my PhD Thesis
Transcription
N° d'ordre : 7774

UNIVERSITÉ PARIS-SUD, ORSAY

Thesis presented to obtain the degree of DOCTEUR EN SCIENCES of the Université Paris XI, Orsay
Speciality: Automatic Control and Signal Processing

by Mohamed SAHMOUDI

PROCESSUS ALPHA-STABLES POUR LA SÉPARATION ET L'ESTIMATION ROBUSTES DES SIGNAUX NON-GAUSSIENS ET/OU NON-STATIONNAIRES
(Alpha-Stable Processes for the Robust Separation and Estimation of Non-Gaussian and/or Non-Stationary Signals)

Defended on 13 December 2004 before the jury composed of:

Reviewers: Eric Moreau (Professor, Université de Toulon); Jean-Yves Tourneret (Professor, INP, Toulouse)
Examiners: Jean-Pierre Delmas (Professor, INT, Evry); Ali Mohammad-Djafari (Research Director, LSS, CNRS); Jean-Christophe Pesquet (Professor, Université Marne-la-Vallée)
Advisor: Karim Abed-Meraim (Lecturer-Researcher, Telecom Paris)
Director: Messaoud Benidir (Professor, Université Paris XI, Orsay)

In Your name and for You, my God, the Omniscient and the Omnipotent. SBD upon you, Ahmad, my ideal model... To my family; to you, "Knikina"... ♥

ACKNOWLEDGMENTS

I love the road travelled as much as the arrival at the goal. I regard this manuscript as a doctoral dissertation, but also as a fine story to tell, a story of ideas and of people whom I hope to make a little known to you. I am deeply grateful to the researchers who shared with me not only the results of their work but also its human context. Doing justice to everyone who contributed to this thesis is particularly delicate: within this limited space I could not mention all the researchers who are part of this story and who deserve to find their names here. I hope they will forgive me. I could never have completed this work without help. In chronological order: I first thank my whole family, who bore every moral and material difficulty to support me throughout my higher education.
My father Mohammadine, my mother Habiba, my elder sister Fatima and her little Fouad, my younger sister Saida and my little brother Hafid: I keep them warm in my heart, and they know how much they mean to me... I am also particularly indebted to my uncle Elhoucine Sahmoudi, whose trust and support greatly helped me begin my graduate studies. I thank M. Paul Deheuvels, director of the LSTA statistics laboratory of the Université Pierre et Marie Curie (Paris 6), and all the teachers of the Paris VI DEA in statistics, for the quality of their teaching and supervision. I am likewise indebted to Hervé Monod, researcher at the INRA Biometry laboratory in Jouy-en-Josas, for supervising my DEA internship and enabling me to carry out fine applications of theory in his field of agronomy. I also thank Professor Y. Kutoyants of the Université du Mans for supervising my theoretical DEA dissertation. I salute the whole "gang" of Antony, my friends of the Jean-Zay university residence, who were my second family for four years; in particular Abdelillah Sahmoudi, Zirari and his little family, Ajinou, Sabri, Elhattab, Rekik, Eljazouli, Halmi, Sbai, Brich, Bajdouri, ... I also thank my thesis director, Prof. Messaoud Benidir, Professor at the Université Paris-Sud in Orsay, for giving me the opportunity to enter the fascinating world of signal processing. His trust and support helped me greatly in completing this work; he guided it wisely while leaving me great freedom. I wish to acknowledge publicly all that I owe to Dr. Karim Abed-Meraim, my scientific advisor, who prompted, developed, and then accompanied my first steps in signal processing with great patience and extraordinary pedagogy.
The moral, material, and intellectual support of my advisor Dr. K. Abed-Meraim was essential. Not only did he provide indispensable help in advancing my research, but when I was in a delicate family or financial situation he knew, with unfailing instinct, how to spot my lapses and, with his wise counsel, helped me back onto the path. A great professor and an inexhaustible source of new ideas, he will remain my mentor... My thanks also go to M. Henri Maitre, head of the TSI department of ENST, for accepting me into his department; to the members of the TREX Electronique of the École Polytechnique, who welcomed me as a teaching assistant, especially Yvan Bonnassieux and Stéphane Mallat, with whom I shared the great pleasure of teaching at the X; and to the members of CEREMADE at the Université Dauphine, who welcomed me as an ATER, especially M. Bellec, C. Pardoux, and C. Robert, with whom I shared my taste for teaching statistics and mathematics. I sincerely thank Professors Eric Moreau and Jean-Yves Tourneret for accepting the heavy task of reviewers despite the very short time I left them; their constructive questions and remarks were invaluable and allowed me to improve several parts of this manuscript significantly. I would also like to thank Professors Jean-Pierre Delmas, Ali Mohammad-Djafari, and Jean-Christophe Pesquet for agreeing to judge these years of work by sitting on my thesis committee, and for the interest they took in it. Other researchers answered my requests for cooperation with extraordinary generosity: B. Boashash (Australia), B. Barkat (Singapore), A. Belouchrani (Algeria), and M. Taqqu (Boston, USA) during their sabbatical stays in the TSI department of ENST; L.J. Stanković (Montenegro), A. Hero (USA), J.-F.
Cardoso (France), and J. Chambers (UK) on various occasions. They shared their erudition with me; I admire and appreciate not only their professional competence, but also the ingenuity they deployed to explain certain technical concepts to me, and the courtesy with which they spared my scientific self-esteem. I also thank Philippe Ciblat of Comelec-ENST, and Marc Lavielle and Estelle Kuhn of the statistical modelling team at Paris-Sud, for the many scientific discussions we had at our meetings within the MathSTIC project. To the members of LSS at Supélec and of TSI at Telecom Paris, and particularly to Naji, Gazzah, Snoussi, Sayadi, Belkacemi, Djeddi, Khanfouci, Mohammadpour, Hallouli, Thomas, Mouhouche, Djalil, Trung, Berriche, Souidene, and Robert, I express my deepest gratitude; the cosmopolitan atmosphere of the two laboratories made me want to pursue research without borders... The pleasure I took in writing this report owes much to the kindness of many people, whom I cannot thank enough... I keep the best for last: my sweet and tender Nassera. You restored my confidence when I needed it most, you allowed me to continue this work without ever giving up, and you bore with great wisdom and patience my working on weekends and coming home late at night. For all that and much more, I can never thank you enough... a big thank-you to you, "Knikina"!

The author, Mohamed Sahmoudi

Table of Contents

Dedication
Acknowledgments
Table of Contents
List of Figures
List of Tables
Abstract
Author's Publications
Notations and Abbreviations

1 Introduction
  1.1 Motivations
    1.1.1 Non-Gaussianity
    1.1.2 Non-stationarity
    1.1.3 Robustness
  1.2 Problem Statement
    1.2.1 Separation of impulsive sources with infinite variance
    1.2.2 Estimation of multicomponent FM signals in an impulsive environment
  1.3 Objectives and Contributions
  1.4 Organization of the Document

I Tools for the Processing of Non-Gaussian and/or Non-Stationary Signals

2 Non-Gaussian Heavy-Tailed Distributions
  2.1 Brief History
  2.2 Univariate Stable Laws
    2.2.1 Infinitely divisible laws
    2.2.2 Two equivalent definitions of α-stable distributions
    2.2.3 Stability of some common laws
    2.2.4 Properties of stable laws
    2.2.5 Fractional lower-order moments
    2.2.6 Simulation of stable laws
  2.3 Statistical Inference for Stable Laws
    2.3.1 Variance tests
    2.3.2 Estimation of the parameters of α-stable laws
  2.4 Multivariate Stable Laws
    2.4.1 Definition and properties
    2.4.2 Moments of multivariate stable laws
    2.4.3 α-sub-Gaussian random vectors
  2.5 Dependence Measures for α-Stable Random Variables
    2.5.1 Covariation
    2.5.2 Covariation metric
    2.5.3 Covariation coefficient
    2.5.4 Codifference
    2.5.5 Symmetric covariation coefficient
    2.5.6 Estimation of covariation coefficients
  2.6 Analytic Representation of α-Stable PDFs
    2.6.1 Power series expansion
    2.6.2 Asymptotic expansion
    2.6.3 Approximation by a finite mixture
  2.7 Other Heavy-Tailed Distributions
    2.7.1 Generalized Gaussian law
    2.7.2 Normal inverse Gaussian law
    2.7.3 Student's t law
  2.8 Conclusion

3 Robust Estimation
  3.1 Robustness
  3.2 M-Estimation
    3.2.1 Minimax M-estimate of location
    3.2.2 Influence function
    3.2.3 M-estimation of a deterministic signal parameter
    3.2.4 Theoretical performance
    3.2.5 Minimax optimal cost function
  3.3 Concluding Remarks

4 Time–Frequency Concepts
  4.1 Need for Time–Frequency Representation
  4.2 Nonstationarity and FM Signals
  4.3 The STFT, SPEC, WVD, and Quadratic TFD
  4.4 Reduced Interference Distributions
  4.5 The WVD and Ambiguity Function
  4.6 Relationships Among Dual Domains
  4.7 Time–Frequency Signal Synthesis
  4.8 IF Estimation
  4.9 Engineering Applications of Time–Frequency Methods
  4.10 Concluding Remarks

II Blind Separation of Impulsive Sources with Infinite Variances

5 State of the Art of BSS
  5.1 Introduction
    5.1.1 What is blind source separation (BSS)?
    5.1.2 Brief history of BSS
    5.1.3 Statistical information for BSS
  5.2 Linear Instantaneous Mixtures
    5.2.1 Separability and indeterminacies
    5.2.2 How to find the independent components
  5.3 Basic BSS Methods
    5.3.1 BSS by minimization of mutual information
    5.3.2 BSS by maximization of non-Gaussianity
    5.3.3 BSS by maximum likelihood estimation
    5.3.4 BSS by algebraic tensorial methods
    5.3.5 BSS by non-linear decorrelation
    5.3.6 BSS using geometrical concepts
    5.3.7 Source separation using a Bayesian framework
    5.3.8 BSS using time structure
  5.4 BSS of Impulsive Heavy-Tailed Sources
    5.4.1 Why heavy-tailed α-stable distributions?
    5.4.2 Existing BSS methods for heavy-tailed signals
  5.5 Conclusion & Future Research

6 Minimum Dispersion Approach
  6.1 Introduction
    6.1.1 The failure of second- and higher-order methods
    6.1.2 Fractional lower-order statistics (FLOS) theory
  6.2 Source Separation Procedure
    6.2.1 Whitening by the normalized covariance matrix
    6.2.2 Minimum dispersion criterion
    6.2.3 Separation algorithm: Jacobi implementation
  6.3 Performance Evaluation & Comparison
    6.3.1 Generalized rejection level index
    6.3.2 Experimental results
  6.4 Concluding Remarks

7 Sub- and Super-Additivity based Contrast Functions
  7.1 BSS Using Contrast Functions
  7.2 On Contrast Functions
  7.3 Orthogonality Constraint
  7.4 Sub-Additivity based Contrast Functions
    7.4.1 Lp-norm contrast functions, p ≥ 1
    7.4.2 Alpha-stable scale contrast function
  7.5 Super-Additivity based Contrast Functions
    7.5.1 Dispersion contrast function
  7.6 Jacobi-Gradient Algorithm for Prewhitened BSS
  7.7 Concluding Remarks

8 Normalized HOS-based Approaches
  8.1 Introduction
  8.2 Normalized Statistics of Heavy-Tailed Mixtures
    8.2.1 Normalized moments
    8.2.2 Normalized second- and fourth-order cumulants
  8.3 Normalized Tensorial BSS Methods
    8.3.1 Separation algorithms
    8.3.2 Performance evaluation & comparison
  8.4 Normalized Non-linear Decorrelation BSS Methods
    8.4.1 Robust composite criterion for source separation
    8.4.2 Iterative quasi-Newton implementation
    8.4.3 Performance evaluation & comparison
  8.5 Concluding Remarks

9 A Semi-Parametric Maximum Likelihood Approach
  9.1 The Likelihood of the BSS Model
    9.1.1 Derivation of the likelihood
    9.1.2 Source density estimation
    9.1.3 Optimization via the EM algorithm
  9.2 Semi-Parametric Source Separation
    9.2.1 Noisy linear instantaneous mixtures
    9.2.2 The proposed approach
    9.2.3 Density estimation by B-spline approximations
    9.2.4 The SAEM algorithm
  9.3 Performance Evaluation & Comparison
    9.3.1 Some existing BSS methods
    9.3.2 Parametric versus semi-parametric approaches
    9.3.3 Computer simulation experiments
  9.4 Concluding Remarks

III Separation and Estimation of Multicomponent FM Signals Affected by Heavy-Tailed Noise

10 State of the Art
  10.1 Modern Spectral Analysis Approaches
  10.2 Time–Frequency Analysis Approaches
    10.2.1 IF estimation using time–frequency methods
    10.2.2 Analysis of noisy multicomponent signals
  10.3 Robust Time–Frequency Analysis
  10.4 Concluding Remarks

11 Robust Parametric Approaches
  11.1 Introduction and Problem Statement
  11.2 Polynomial-Phase Transform of FM Signals
  11.3 IF Estimation Procedure for FM Signals
  11.4 Robust Subspace Estimation
    11.4.1 TRUNC-MUSIC algorithm
    11.4.2 FLOS-MUSIC algorithm
    11.4.3 ROCOV-MUSIC algorithm
  11.5 Performance Evaluation & Comparison
    11.5.1 Mixture of sinusoidal components
    11.5.2 Mixture of two chirps
  11.6 Concluding Remarks

12 Robust Time–Frequency Approaches
  12.1 Introduction and Problem Statement
  12.2 Failure of Standard TFDs in Impulsive Noise
    12.2.1 Effect of impulsive spike noise on TFDs
    12.2.2 Effect of impulsive α-stable noise on TFDs
    12.2.3 The need for robust TFDs in a Gaussian environment
  12.3 Pre-processing Techniques based Approach
    12.3.1 Exponential compressor filter
    12.3.2 Huber filter
  12.4 Robust Time–Frequency Approach
    12.4.1 Optimal TFD kernel in α-stable noise
    12.4.2 A new robust quadratic time–frequency distribution
  12.5 IF Estimation & Component Separation
  12.6 Performance Evaluation & Comparison
  12.7 Concluding Remarks

13 Conclusions and Perspectives
  13.1 General Conclusion
  13.2 Perspectives

Bibliographic References

List of Figures

1.1 Realizations of a Gaussian signal and of an α-stable signal. Figures (c) and (d): when the sample size is relatively small, the realizations of the Gaussian law and of the α-stable law look similar.
    Figures (a) and (b): when the sample size is relatively large, the two realizations differ clearly.
1.2 Examples of non-stationary signals. (a–c) show real-life signals plotted using the B-distribution: (a) a whale signal, (b) an electroencephalogram signal, and (c) a bat signal.
2.1 Realizations of α-stable signals for different values of α.
2.2 α-stable probability densities for different values of α.
2.3 Tails of the α-stable probability density for different values of α.
2.4 Probability density of the NIG(α, 0, 1, 0) law for different values of α.
4.1 (a) Time-domain and (b) frequency-domain representations of an LFM signal, clearly showing the inherent limitation of classical representations of a non-stationary signal.
4.2 A TF representation of the LFM signal of Fig. 4.1.
4.3 Examples of nonstationary signals. An engineering application is shown in (a) for a linear FM signal (plotted using the Wigner–Ville distribution). Real-life applications are shown in (b–d) for a whale signal, an electroencephalogram signal, and a bat signal, respectively (all plotted using the B distribution).
4.4 Quadratic representations corresponding to the WVD. Wz(t, f), Az(τ, ν), Kz(t, τ) and Dz(ν, f) are respectively the WVD, AF, time–lag signal kernel and Doppler–frequency signal kernel of the analytic signal z(t).
4.5 Dual domains of general signal quadratic representations. γ(t, f), Γ(τ, ν), G(t, τ) and G(ν, f) are the TFD time–frequency, Doppler–lag, time–lag and Doppler–frequency kernels, respectively. ρz(t, f) and Az(τ, ν) are the general quadratic TFD and the GAF of the analytic signal z(t).
5.1 Signal model for the blind source separation problem.
5.2 Order of statistics in blind source separation.
6.1 Extraction of 3 α-stable sources from 3 observations, with α = 0.5 and N = 10000.
6.2 Generalized mean rejection level versus α, with N = 1000.
6.3 Generalized mean rejection level versus the estimation error ∆α.
6.4 Generalized mean rejection level versus sample size N.
6.5 Generalized mean rejection level versus sample size for α = 1.5.
6.6 Generalized mean rejection level versus the additive noise power for α = 1.5.
8.1 Generalized mean rejection level versus the noise power.
8.2 Generalized mean rejection level versus the sample size.
8.3 Generalized mean rejection level versus the sample size.
8.4 Mean rejection level versus the noise power with T = 1000.
9.1 Consistency of different BSS algorithms. The sample sizes were 1000 for case (1) and 5000 for case (2).
9.2 The performance index versus noise level.
9.3 The performance index versus sample size.
11.1 The MSE versus the noise dispersion in dB, N = 1000.
11.2 The MSE versus the sample size, γ = 0.1.
11.3 The MSE versus the sample size, γ = 0.1.
11.4 The MSE versus the noise dispersion in dB, N = 1000.
12.1 The nonlinear law of the compressor used in the pre-processing stage.
12.2 Compression of a linear FM signal in impulsive noise using different values of β.
12.3 The standard MBD of the multi-component test signal.
12.4 The Robust-MBD of the multi-component test signal.
12.5 The NMSE versus sample size: a comparative study.
12.6 NMSE of IF estimates corresponding to the HAF, r-PWVD and R-MBD for a noisy two-component chirp signal.
12.7 Normalized MSE of the various phase parameters versus sample size, γ = 0.1.
12.8 Normalized MSE of the various phase parameters versus noise dispersion in dB, N = 1000.

List of Tables

2.1 Mean and variance of α-stable laws for different values of α.
2.2 Graphical variance test using the empirical variance.
2.3 Graphical test of a distribution's tail by the "log-log" method.
2.4 Optimal values of K as a function of n and α.
2.5 Approximation of the SαS PDF by a Gaussian mixture model, with refinement of the approximation by the EM algorithm.
4.1 Some common TFDs and their kernels.
6.1 The principal steps of the proposed minimum dispersion (MD) algorithm.
8.1 The principal steps of the proposed Robust-JADE algorithm.
8.2 The principal steps of the proposed Robust-EASI algorithm.
11.1 The proposed frequency estimation TRUNC-MUSIC algorithm.
11.2 The proposed robust frequency estimation FLOS-MUSIC algorithm.
11.3 The proposed robust covariance estimation ROCOV algorithm.
11.4 The proposed frequency estimation ROCOV-MUSIC algorithm.
12.1 Computation procedure of the Robust-MBD.
12.2 Component separation procedure for the proposed algorithm.
ABSTRACT

The main objective of this thesis is to develop new robust techniques for processing non-Gaussian and/or non-stationary signals in impulsive environments. More precisely, the work lies at the crossroads of the following two problems:

I- Blind separation of linear mixtures of impulsive sources. This problem has been little studied in certain statistically hard cases. Indeed, when the sources are modelled by α-stable laws, classical methods no longer apply, because the probability density has no explicit analytic expression and the moments of order 2 and higher are infinite. For this case we introduce four original approaches:
- An approach based on a minimum dispersion criterion, which minimizes the sum of the dispersions of the whitened observations. The pre-whitening step relies on a new normalized covariance matrix that we introduce.
- A second approach based on the idea of normalized statistics, introduced to adapt existing methods based on second- or higher-order statistics.
- A third approach using contrast functions, under an orthogonality constraint, built from sub- or super-additive functionals. In particular, we propose a criterion that minimizes the sum of the Lp norms (p ≥ 1) of the observations in order to separate sources that may have infinite variance.
- A fourth, semi-parametric approach, in which we cast the source separation problem as an estimation problem via the maximum likelihood principle.
  We then combine a stochastic version of the EM algorithm with a logspline approximation of the α-stable PDFs, so as to estimate the PDF and the mixing matrix simultaneously.

II- Estimation of non-stationary multicomponent FM signals in an impulsive environment. The literature remains relatively sparse for the multicomponent case in the presence of impulsive α-stable noise. To contribute to this problem, we propose both parametric methods and non-parametric methods based on time-frequency analysis:
- Parametric methods: We first reduce the problem to that of estimating harmonic signals buried in impulsive noise, by means of a polynomial transform of the signal. A high-resolution (MUSIC-type) method is then applied to the transformed signal to estimate the phase parameters. Three cases are considered and compared: (i) direct application of the MUSIC algorithm to the truncated harmonic signal; (ii) application of MUSIC to a robust estimate of the covariance function of the harmonic signal; and (iii) application of MUSIC to the generalized covariation of the signal.
- Non-parametric methods: In a first approach, we apply Huber's minimax robustness procedure against impulsive noise as a pre-processing stage, using two different techniques: (i) amplitude compression by a nonlinear filter of the form |x|^β, 0 < β < 1, and (ii) amplitude truncation (clipping) of the signal. We then represent the signal in the time-frequency plane using quadratic transforms suited to the multicomponent case, and apply a separation algorithm to extract the components and estimate their instantaneous frequencies.
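The two robustifying pre-processing filters just mentioned can be sketched in a few lines. This is a minimal illustration, not the thesis's implementation; the function names and the numerical example are ours:

```python
import numpy as np

def compress(x, beta=0.5):
    """Amplitude compressor y = sign(x) * |x|^beta, 0 < beta < 1:
    large impulsive spikes are shrunk while small samples change little."""
    return np.sign(x) * np.abs(x) ** beta

def clip(x, k):
    """Amplitude truncation (Huber-style clipping): samples whose magnitude
    exceeds k are limited to +/- k; samples below k are untouched."""
    return np.clip(x, -k, k)

# Example: a unit-amplitude sinusoid corrupted by one large impulsive spike.
t = np.arange(8)
x = np.sin(2 * np.pi * t / 8)
x[3] += 100.0                  # impulsive outlier

y_c = compress(x, beta=0.5)    # spike of ~100 shrunk to ~10
y_k = clip(x, k=1.0)           # spike limited to 1
```

For complex-valued FM signals the same idea applies as y = x · |x|^(β−1), which compresses the amplitude while preserving the phase, so the instantaneous-frequency information survives the filtering.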
  In a second approach, by contrast, we apply robust M-estimation directly to the quadratic time-frequency transform, defining a transform that is robust both to impulsive noise and to the cross-terms of a multicomponent signal.

Finally, a numerical study completes the theoretical results and compares our approaches with other methods from the literature.

Publications

1- Journal Articles

1. M. Sahmoudi, H. Monod, D. Makowski and D. Wallach, "Optimal experimental designs for estimating model parameters, applied to yield response to nitrogen models," Agronomie, vol. 22, pp. 229–238, 2002.
2. M. Sahmoudi, K. Abed-Meraim and M. Benidir, "Blind Separation of Impulsive alpha-stable Sources Using a Minimum Dispersion Criterion," IEEE Signal Processing Letters, vol. 12, no. 4, April 2005.
3. M. Sahmoudi and K. Abed-Meraim, "Blind Separation of Instantaneous Mixtures of Impulsive α-Stable Sources based on Fractional Lower-Order Statistics," submitted to IEEE Transactions on Signal Processing.
4. M. Sahmoudi and K. Abed-Meraim, "Blind Separation of Heavy-Tailed Sources Using Normalized Statistics," submitted to IEEE Transactions on Signal Processing.
5. M. Sahmoudi, K. Abed-Meraim and B. Barkat, "Robust Estimation of Multicomponent Non-Stationary FM Signals in Heavy-Tailed Noise," submitted to IEEE Transactions on Signal Processing.

2- Conference Articles

1. M. Benidir, A. Ouldali and M. Sahmoudi, "Performance Analysis of the HAF-Estimator for Time-Varying Amplitude Phase-Modulated Signals," in Proc. CA 2002, IASTED International Conference on Control and Applications, Cancun, Mexico, May 20-22, 2002.
2. M. Sahmoudi, K. Abed-Meraim and M.
Benidir, "Blind Separation of Alpha-Stable Sources: A New Fractional Lower-Order Moments (FLOM) Approach," in Proc. ISSPIT'02, IEEE International Symposium on Signal Processing and Information Technology, Marrakech, Morocco, December 18-21, 2002.
3. M. Sahmoudi, K. Abed-Meraim and M. Benidir, "Estimation de Signaux Chirp Multicomposantes Affectés par un Bruit Impulsif Alpha-stable" (Estimation of Multicomponent Chirp Signals Affected by Impulsive Alpha-Stable Noise), in Proc. GRETSI, Paris, France, September 2003.
4. M. Sahmoudi, K. Abed-Meraim and M. Benidir, "Blind Separation of Instantaneous Mixtures of Impulsive alpha-stable Sources," in Proc. IEEE International Symposium on Signal and Image Processing and Analysis, Rome, Italy, September 2003.
5. M. Sahmoudi, K. Abed-Meraim, N. Linh-Trung, V. Sucic, F. Tupin and B. Boashash, "An Image and Time-frequency Processing Method for Blind Separation of Non-stationary Sources," in Proc. Journées d'Étude sur les Méthodes pour les Signaux Complexes en Traitement d'Image, INRIA Rocquencourt, Paris, France, December 9-10, 2003.
6. M. Sahmoudi and K. Abed-Meraim, "Multicomponent Chirp Interference Estimation for Communication Systems in Impulsive alpha-stable Noise Environment," in Proc. IEEE International Symposium on Control, Communications and Signal Processing, Hammamet, Tunisia, March 2004.
7. M. Sahmoudi and K. Abed-Meraim, "Robust IF Estimation of Multicomponent FM Signals Affected by Heavy-Tailed Noise Using TFD," Int. Colloquium on Modelization, Stochastics and Statistics (MSS-2004), Algiers, Algeria, April 2004.
8. M. Sahmoudi, K. Abed-Meraim and B. Barkat, "IF Estimation of Multicomponent Chirp Signals in Impulsive alpha-stable Noise Environments Using Parametric and Non-Parametric Approaches," in Proc. EUSIPCO 2004, 12th European Signal Processing Conference, Vienna, Austria, September 2004.
9. M. Sahmoudi, K. Abed-Meraim and M.
Benidir, "Blind Separation of Heavy-Tailed Signals Using Normalized Statistics," in Proc. ICA 2004, 5th International Conference on Independent Component Analysis and Blind Source Separation, Granada, Spain, September 22-24, 2004.
10. M. Sahmoudi and K. Abed-Meraim, "Robust Blind Separation Algorithms for Heavy-Tailed Sources," to appear in Proc. ISSPIT 2004, Fourth IEEE Symposium on Signal Processing and Information Technology, Rome, Italy, December 18-21, 2004.
11. M. Sahmoudi, K. Abed-Meraim, M. Lavielle, E. Kuhn and Ph. Ciblat, "Blind Source Separation Using a Semi-Parametric Approach with Application to Heavy-Tailed Signals," submitted to EUSIPCO 2005, Turkey, September 2005.
12. M. Sahmoudi and K. Abed-Meraim, "A Robust Time-frequency Distribution for the Analysis of Multicomponent Non-stationary FM Signals Affected by Impulsive α-stable Noise," submitted to SSP 2005, Bordeaux, France, July 2005.
13. M. Sahmoudi and K. Abed-Meraim, "Blind Sources Separation Using Contrast Functions based on Some sub- and super-Additive Functionals," submitted to ISSPA 2005, Sydney, Australia, September 2005.

Notations and Abbreviations

Throughout this document, the following standard notations and abbreviations are used:

diag(a1, ..., an)   diagonal matrix with diagonal elements a1, ..., an
i.i.d.              independent and identically distributed
v.a.                random variable
v.a.r.              real random variable
BD                  B distribution
EEG                 electroencephalogram
EVD                 eigenvalue decomposition
FM                  frequency-modulated
FT                  Fourier transform
FLOS                fractional lower-order statistics
HOS                 higher-order statistics
IF                  instantaneous frequency
IFT                 inverse Fourier transform
LFM                 linear frequency-modulated
MBD                 modified B distribution
PDF                 probability density function
SNR                 signal-to-noise ratio
SOS                 second-order statistics
TFSP                time-frequency signal processing
TF                  time-frequency
TFD                 time-frequency distribution
WVD                 Wigner-Ville distribution
M. Sahmoudi © Processus Alpha-Stables pour la Séparation et l'Estimation Robustes des Signaux non-Gaussiens et/ou non-Stationnaires

Chapter 1
Introduction

This introductory chapter has a twofold purpose: first, to specify the scope of the thesis and the two problems it set out to solve; second, to present the main contributions of this work, pointing out the common thread linking its two parts.

1.1 Motivations

This work finds its origin and motivation in the growing need to characterize, analyze, and process signals that are non-stationary [Suppappola(2003)] and/or non-Gaussian [Wegman et al.(1989)], [Kassam(1995)]. The development of signal processing methods has produced a body of techniques whose main objective is to shed light on a given application. As real-world situations grow more complex (transmission breaks [Tourneret(1998)], impulsive phenomena [Nikias et Shao(1995)], sensor failures [Kassam(1995)], non-stationary channels [Ikram et al.(1998)], non-stationary signals, Doppler effects, the need for ever finer measurement instruments, and so on), signal processing tools become more specialized and less flexible: to adapt to a particular situation, the study and analysis procedures must be modified frequently. In professional practice, the signal processing practitioner often finds himself far removed from the strict theoretical framework within which certain signal processing tools or methods are guaranteed to work. He is confronted with missing, erroneous, incomplete, or truncated data; the normality assumption does not hold; the stationarity assumption does not hold; and so on.
Faced with these difficulties, he generally has only his own experience to rely on and, guided also by intuition, tries to "fashion" empirical tools suited to the problem at hand. The signal processing practitioner therefore often finds himself having to choose among several keys to open a lock, none of which fits the lock exactly. To guide him, mathematical statistics has established the properties of this or that method within a well-specified context, generally described by a given probabilistic model. Such modeling is only a somewhat simplified representation of the reality of the phenomenon under study. Indeed, resorting to the normal law is sometimes no more than an act of faith, or an admission that the "true" probabilistic mechanism generating the observations cannot be found. Moreover, non-stationary systems are encountered almost permanently in nature, owing to the dynamics and rapid evolution of the systems under study. Several eras can be distinguished in the chronological evolution of signal processing methods. The present one cannot be summarized so simply, so we prefer to speak of the era of "statistical methods based on few assumptions". Applications of signal processing thus lead quite naturally to the study of non-Gaussian signals, of non-stationary signals, and of the robustness of processing methods for these two classes of signals.

1.1.1 Non-Gaussianity

Indeed, for a very long time the development of statistical methods and the study of their properties were based essentially on the Gaussianity¹ of the family of laws. This is clearly apparent, for example, throughout the approach of R. Fisher and in the method of least squares. Nevertheless, the choice of a statistical model governed by the normal law is more an act of faith than the outcome of rigorous reflection.
On the theoretical side, the recent development of mathematical statistics is dominated by the search for solutions in contexts where the validity of a model is not guaranteed, and where only limited assumptions are made about the probability law. On the practical side, in many communication problems, such as transmission over the power grid, HF communications, or underwater communications [Grigoriu(1995)], [Kassam(1995)], [Nikias et Shao(1995)], the classical assumption of Gaussian noise, justified via the central limit theorem, is no longer valid. Indeed, in such systems, noises with a low probability of occurrence but very high amplitudes, said to be impulsive in nature, or discontinuities in the noise behavior (change-point problems), arise and can no longer be represented by Gaussian laws. Such phenomena can in fact be modeled by non-Gaussian distributions with algebraically decaying tails, i.e., decaying as x^(-α) with 0 < α < 2 [Nikias et Shao(1995)], [Ilow(1995)], [Kuruoglu(1998)], thus exhibiting the same behavior as the α-stable distributions.

¹ In reference to Carl Friedrich Gauss, born in 1777 in Brunswick, Germany. He very quickly became a renowned astronomer and mathematician, and is still regarded today as one of the greatest mathematicians of all time, on a par with Archimedes and Newton. His contributions to science, and to statistics in particular, are of the highest importance; he is credited notably with the method of least squares and the development of the normal law for measurement-error problems.
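This heavy-tailed behavior is easy to reproduce numerically. The sketch below draws symmetric α-stable samples with the Chambers-Mallows-Stuck generator, a standard simulation method, though not one prescribed by this chapter; the function name and parameter values are illustrative only.

```python
import numpy as np

def sample_sas(alpha, size, seed=None):
    """Symmetric alpha-stable (SaS) samples with unit dispersion,
    via the Chambers-Mallows-Stuck transformation."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)  # uniform angle
    w = rng.exponential(1.0, size)                # standard exponential
    if alpha == 1.0:
        return np.tan(u)                          # Cauchy special case
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

x_sas = sample_sas(1.2, 100_000, seed=0)          # heavy-tailed, alpha = 1.2
x_g = np.random.default_rng(0).normal(size=100_000)

# Empirical tail probabilities P(|X| > 5): a few percent for the
# alpha-stable draw, essentially zero for the Gaussian one.
print(np.mean(np.abs(x_sas) > 5), np.mean(np.abs(x_g) > 5))
```

The algebraic x^(-α) tail is what makes the rare, very large excursions of the stable draw so much more frequent than any Gaussian excursion of comparable size.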
This is why we are interested in general distribution models that certainly include the Gaussian models, but also heavy-tailed laws. As an illustration, the Gaussian model is well suited to band-limited data, whereas for wide-band data an infinite-variance stable model should be used, as shown in Figure 1.1.

[Figure 1.1 shows realizations of a Gaussian signal G(t) and of a SαS α-stable signal with α = 1.2, plotted against time t, for sample sizes of 200 (panels (a), (b)) and 40 (panels (c), (d)).]

Fig. 1.1: Realizations of a Gaussian signal and of an α-stable signal.
• Panels (c) and (d): when the sample size is relatively small, the realizations of the Gaussian and α-stable laws look alike.
• Panels (a) and (b): when the sample size is relatively large, the two realizations clearly differ.

1.1.2 Non-stationarity

Of all the tools available in signal processing, spectral analysis is certainly one of the most important. The reasons for its pre-eminence are obviously to be found in the relative universality of the central concept on which it rests: frequency. Whether in fields dealing with physical waves (acoustics, vibrations, geophysics, optics, ...) or relying on certain periodicities of events (economics, biology, astronomy, ...), a frequency-domain description often underpins a deeper understanding of the phenomena at play, providing an indispensable complement to the purely temporal description (sensor output or sequence of events), which generally comes first in the analysis.
Add to this that the frequency approach also lends itself to spatial processing (acoustic imaging, radio astronomy, ...), and it is easy to understand why so many studies have been, and continue to be, devoted to spectral analysis. We thus have at our disposal today an arsenal of methods whose properties, at least for the simplest and most robust (and therefore most thoroughly tested), are well known. To these methods are added batteries of algorithms, software, procedures, even instruments, all of which secure spectral analysis a prominent place in the daily life of laboratories. Yet it is the experience of that same daily life that forces us to set limits of validity and, above all, to raise objections of principle to the classical notion.

Non-stationarity is a non-property. To define it, we first explain what stationarity is [Flandrin(1993)]. The notion of stationarity is naturally related to those of steady state and temporal stability, and the definition used in signal theory formalizes these ideas in a certain way. Deterministic signals may be called stationary if they can be decomposed into a sum of everlasting sinusoidal waves (the physicist's Fourier modes). Stationary random signals are those for which no time origin exists; consequently, their statistical properties (their moments) do not vary over time. Non-stationary is everything that is not stationary: transients, i.e., when a steady state has not yet been reached (for example, in a car, the acceleration phase before reaching a stable speed), and abrupt changes, i.e., sudden and untimely modifications of amplitude (for example, in a car, an engine failure or sharp braking).
The class of non-stationary signals includes a wide variety of signals, such as the subclass of frequency-modulated signals, called FM signals [Amin(1992)], [Cohen(1995)], and in particular polynomial-phase signals, frequently encountered in telecommunications, notably in radar and sonar signals [Ouldali(1999)], [Boashash(2002)]. For the processing of non-stationary signals, beyond the "classical" spectral methods suited to stationary situations, the eighties saw the development of a large number of "modern" approaches that all share one feature: the explicit inclusion of time as a description parameter. In a spectral analysis context, this naturally led to the concept of time-frequency analysis and its associated representations and/or models. The intensification of work on the subject, flourishing in often divergent directions, has certainly made our task of selecting and then using the existing methods rather difficult. Figure 1.1.2 shows some real signals in the time-frequency plane. In contrast to the two classical representations of a signal (temporal and frequential), the evolution of frequency over time is clearly visible, hence the need for such representations in the analysis of non-stationary signals.

1.1.3 Robustness

The nineteenth century saw a long debate on the treatment of outliers, and references to this particular departure from the basic assumptions appeared very early in the statistical literature. The term "robust" was first used in a paper by G. Box [Box(1953)] on variance estimation in the non-Gaussian case, in the sense of resistance to a deviation from the normal law.
Subsequently, many authors studied the properties of alternatives to the classical estimators in the setting of contaminated laws, or mixtures of laws: a law P is said to be contaminated by a law Q at rate ε, ε ∈ [0, 1], if the law of the observations is (1 − ε)P + εQ. Various definitions of the concept of robustness have been put forward in the statistical literature [Launer et Wilkinson(1979)], [Huber(1981)], [Leroy(1987)]. When P denotes the law of the statistical model, a procedure has been called robust if:
– it has high absolute efficiency for all alternatives to P;
– it has high absolute efficiency over a well-specified set of laws;
– it is not very sensitive to the abandonment of the statistical assumptions on which it is based;
– the law of the statistic underlying the procedure "varies little" when P undergoes small alterations.
These different definitions do not provide an exhaustive view of the question, but they all spring from the same spirit. Thus P. Huber writes in [Huber(1972)]: "robustness is a kind of insurance: I am willing to pay a 5 to 10% loss of efficiency relative to the ideal model in order to protect myself against the bad effects of small deviations from it; I shall of course be happy if my statistical procedure performs well under large deviations, but I do not really care, since drawing inference from such a wrong model has little concrete meaning." In conclusion, we retain the following definition of the concept of robustness: a statistical procedure is robust if its performance is little affected by small modifications of the statistical assumptions on which it is based, such as the law P modeling the observations. This is the approach of [Huber(1981)] and [Hampel et al.(1986)].
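The ε-contamination model just defined is easy to experiment with. The sketch below contaminates a standard normal P with a wide normal Q at rate ε = 0.05 and compares a classical scale estimate with a robust one; the parameter values are illustrative, and the MAD estimator is our own choice of robust alternative, not a method prescribed by this chapter.

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps = 10_000, 0.05
# Observations follow (1 - eps) * P + eps * Q, with P = N(0, 1), Q = N(0, 100^2)
outlier = rng.random(n) < eps
x = np.where(outlier, rng.normal(0.0, 100.0, n), rng.normal(0.0, 1.0, n))

# Classical scale: grossly inflated by the 5% contamination.
# Robust scale (normalized median absolute deviation): still close to 1.
mad_scale = 1.4826 * np.median(np.abs(x - np.median(x)))
print(f"sample std = {x.std():.1f}, MAD scale = {mad_scale:.2f}")
```

Huber's insurance metaphor above is exactly this trade: the robust estimator pays a small efficiency premium at the exact Gaussian model in exchange for stability under contamination.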
This definition of robustness will be made precise, in each of the two parts of this thesis, on two points:
– which performance measures of the procedure should be retained?
– what counts as a small modification of the base model?
More generally, this problem consists in working not within a single probability law, but within a large class of laws. This approach makes it possible to answer many questions for which classical signal processing offers solutions only in a Gaussian context [Lecoutre et Tassi(1980)]: for example, the separation of source components in an impulsive non-Gaussian environment [Sahmoudi et al.(2004a)], [Zhang et Kassam(2004)], the estimation of the parameters of a possibly non-stationary signal buried in non-Gaussian noise [Friedmann et al.(2000)], [Sahmoudi et al.(2004b)], and multiuser detection in a non-Gaussian environment [Poor et Tanda(2002)].

1.2 Problem Statement

Reducing the effect of additive noise and separating mixtures of sources are two fundamental and recurring problems in most signal and image processing applications [Kay(1998b)], [Hyvarinen et al.(2001)]. They are, moreover, two central theoretical problems in statistics, whether for estimation or detection purposes [Kay(1998a)], [Kay(1998b)]. The case where the signal, be it noise or source, is impulsive in nature proves particularly interesting, both theoretically, as in the statistical inference of α-stable processes [Samorodnitsky et Taqqu(1994)], and practically, as in reducing the effect of atmospheric noise in HF communications and the effect of outliers on the statistical processing of an observed signal [Nikias et Shao(1995)], [Kassam(1995)].
It is also a case that is poorly or little studied in the signal processing literature, compared with the standard case where the signal is assumed Gaussian. The goal of this work is to develop estimation and separation techniques for environments exhibiting impulsive phenomena, characterized by processes with slowly decaying, also called heavy-tailed, distributions, and in particular the α-stable distributions [Nikias et Shao(1995)], [Adler et al.(1998)].

1.2.1 Separation of impulsive sources with infinite variance

Blind source separation is a multisensor signal (or image) processing technique in which a sequence of observations x(t), t = 1, ..., T, is assumed to follow the model

x(t) = A s(t) + b(t),   t = 1, ..., T    (1.1)

where A is a full-rank m × n matrix, s(t) is an n-vector of sources with independent components, and b(t) represents possible additive noise. Blind source separation, also known as independent component analysis, is the problem of recovering statistically independent source signals s(t) from their observed mixtures x(t) received on the sensor array, without prior knowledge of the structure of the mixtures or of the source signals [Hyvarinen et al.(2001)]. Source separation arises in a wide range of applications, such as target localization and tracking in radar and sonar, speaker separation (the so-called "cocktail party" problem), detection and separation in multiple-access communication systems, and independent component analysis of biomedical signals (e.g., EEG, ECG and fMRI) [Cichocki et Amari(2002)]. This problem has been studied extensively, and many solutions have been proposed.
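Model (1.1) is straightforward to simulate. In the following sketch the dimensions, seed, and Laplace source law are arbitrary illustrative choices; the last lines only check that the observations are consistent with the model when A is known, since recovering s(t) without knowing A is precisely the blind problem treated in this part.

```python
import numpy as np

rng = np.random.default_rng(2)
T, n, m = 1_000, 2, 3                  # snapshots, sources, sensors
A = rng.normal(size=(m, n))            # mixing matrix, full rank a.s.
S = rng.laplace(size=(n, T))           # independent source signals s(t)
B = 0.01 * rng.normal(size=(m, T))     # weak additive noise b(t)
X = A @ S + B                          # observations x(t) = A s(t) + b(t)

# Sanity check (non-blind): with A known, least squares recovers the sources.
S_ls = np.linalg.pinv(A) @ X
print(X.shape, np.mean((S_ls - S) ** 2) < 0.1)
```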
These are methods that minimize a separation criterion; some are algebraic and rely on second- and/or higher-order statistics [Cichocki et Amari(2002)], [Hyvarinen et al.(2001)]. Others use optimization tools, such as adaptive or block algorithms based on a sparse decomposition. Still others exploit the statistical independence of the sources through the maximum likelihood principle, or through information theory (the "infomax" principle) [Cichocki et Amari(2002)], [Hyvarinen et al.(2001)]. The separation of an instantaneous linear mixture has reached a certain maturity, but it remains little studied in some statistically difficult cases. When the observations exhibit abrupt changes reflecting significant events modeled by α-stable laws, the classical methods no longer apply or are ill-suited. Indeed, despite their differences, most of these methods use second- and/or higher-order statistics, or the probability density of the sources, which are undefined in the case of α-stable sources. The objective of the first part of this thesis is to propose statistical methods for the separation of impulsive sources following an α-stable model. We focus here on the following two points:
• Impulsive source signals: if the sources s(t) are impulsive, i.e., such that the probabilities of extreme values are not negligible, the source separation model can be made more realistic by using a heavy-tailed distribution, such as the α-stable laws (0 < α < 2), to model the sources.
This is a parametric family of probability distributions flexible enough to capture the statistical characteristics (characteristic exponent, symmetry, dispersion, and location) of the distribution of observations of phenomena with large-scale variations [Kidmose(2001)], [Shereshevski(2002)]. This part deals with the exploitation of fractional lower-order statistics (FLOS) and the adaptation of existing methods to separate α-stable mixtures.
• Generalization: we are also interested in generalizing the use of FLOS to the separation of other classes of sources. This led us to introduce separation techniques fundamentally different from existing ones, which involve only second-order (SOS) or higher-order (HOS) statistics.

1.2.2 Estimation of multicomponent FM signals in an impulsive environment

In this second part of the thesis, we deal with multicomponent signals affected by additive non-Gaussian noise of impulsive nature. An FM signal is said to be multicomponent if its time-frequency representation exhibits multiple ridges in the time-frequency plane. Analytically, a signal is multicomponent if it can be written as a sum of monocomponent signals. The noisy multicomponent FM signal considered in this part is given by the model

x(t) = s(t) + z(t) = Σ_{i=1}^{M} s_i(t) + z(t)    (1.2)-(1.3)

where
– s_i(t) denotes the i-th component of the signal x(t). It has the form s_i(t) = a_i(t) e^{jφ_i(t)} and is assumed to have a single ridge only, i.e., a single continuous curve, in the time-frequency plane.
– a_i(t) denotes the amplitude of the i-th component s_i(t) of the signal x(t).
– φ_i(t) denotes the phase of the i-th component s_i(t) of the signal x(t). When the phase φ_i(t) is a polynomial of degree I, the signal s_i(t) is said to be a polynomial-phase FM signal. In that case

s_i(t) = a_i(t) exp( j Σ_{k=0}^{I} b_{i,k} t^k )    (1.4)

– z(t) denotes the impulsive noise, modeled by heavy-tailed laws. As examples of this kind of probability law, which we will use to validate our approaches, we consider the family of α-stable laws with α < 2 [Samorodnitsky et Taqqu(1994)] and the family of generalized Gaussian probability densities [Kay(1998a)].
FM signals, and in particular polynomial-phase signals (PPS), occur frequently in telecommunications, notably in radar and sonar signals [Cohen(1995)], [Suppappola(2003)]. They model a wide range of non-stationary signals, since their frequency characteristics evolve continuously over time, possibly at high rates of variation [Boashash(2002)]. We are interested in the problem of estimating the instantaneous frequency of each component s_i(t) of the FM signal (12.1), defined by [Boashash(1992a)]

IF_i(t) := (1/2π) dφ_i(t)/dt    (1.5)

Several solutions already exist in the literature for the monocomponent and multicomponent cases in the presence of Gaussian noise [Francos et Friedlander(1995)], [Francos et Porat(1999)], [Ouldali(1999)], [Davy et al.(2002)]. Given their strongly non-stationary character, polynomial-phase signals cannot be handled by techniques developed under the stationarity assumption, such as the periodogram, Prony's method, or MUSIC. Likewise, adaptive techniques based on the assumption of local stationarity of the signal within the
analysis window are not very effective for studying these signals, whose instantaneous frequency can evolve rapidly. Their analysis therefore requires an approach that explicitly takes this non-stationary character into account, which is why joint time-frequency analysis was introduced. Despite the interest that non-stationary FM signals attract in signal processing, and despite the theory developed over the past thirty years, many problems remain open, especially regarding the multicomponent case in non-Gaussian noise. Our research in this second part has therefore proceeded in two directions:
• Multicomponent case: the literature contains few effective analysis methods for the multicomponent case, i.e., signals formed as sums of monocomponent signals. Indeed, the existing techniques are often adaptations or extensions of monocomponent processing methods.
• Impulsive noise: assuming, as most instantaneous frequency estimation algorithms in the literature do, that the noise b(t) is Gaussian can prove disastrous in applications where the noise may be impulsive, or made up of nuisance sources that one does not seek to estimate. An alternative is to model the noise with α-stable distributions, which make it possible to capture a non-Gaussian noise structure [Cappé et al.(2002)]. The main difficulty lies in determining the contribution of each component to the observed signal, and in reducing the effect of the impulsive noise.
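To make the objects of this part concrete, the sketch below builds a two-component linear-FM signal of the form (1.2)-(1.4) and evaluates the instantaneous frequency (1.5) of one component by numerical differentiation of its phase; all parameter values are illustrative.

```python
import numpy as np

fs = 1000.0                                 # sampling frequency (Hz)
t = np.arange(0, 1.0, 1.0 / fs)

# Two monocomponent linear-FM signals s_i(t) = exp(j phi_i(t)), with
# phi_i(t) = 2*pi*(f_i*t + 0.5*k_i*t^2): polynomial phases of degree 2.
phi1 = 2 * np.pi * (50.0 * t + 0.5 * 100.0 * t ** 2)
phi2 = 2 * np.pi * (300.0 * t - 0.5 * 80.0 * t ** 2)
x = np.exp(1j * phi1) + np.exp(1j * phi2)   # multicomponent FM signal (1.2)

# IF_i(t) = (1/2pi) * d(phi_i)/dt : here 50 + 100 t  and  300 - 80 t  Hz.
if1 = np.gradient(phi1, t) / (2 * np.pi)
print(round(if1[0]), round(if1[-1]))        # ~50 Hz at t = 0, ~150 Hz at t = 1
```

In this noise-free form the IF curves are simply read off the phases; the whole difficulty addressed in this part is doing so when only the sum of the components is observed, corrupted by impulsive z(t).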
1.3 Objectives and Contributions

The main objective is to use existing theories and techniques, and to develop new ones, for processing signals of non-Gaussian (impulsive) and/or non-stationary nature. More precisely, this doctoral work sits at the crossroads of the following two broad problems, in the context of an impulsive environment (noise or source signals):

[A] Blind separation of instantaneous linear mixtures of impulsive sources

This problem has been little studied in certain statistically difficult cases. Indeed, when the sources are modeled by α-stable laws, the classical methods no longer apply, because the probability density has no explicit analytical expression and the moments of order greater than or equal to 2 are infinite. For this case, we have introduced four original approaches:
– An approach based on the minimum dispersion criterion, which minimizes the sum of the dispersions of the whitened observations. The pre-whitening step is based on a new normalized covariance matrix that we have introduced.
– A second approach, based on the idea of normalized statistics, proposed to adapt existing methods based on second- or higher-order statistics.
– A third approach using contrast functions, under an orthogonality constraint, based on sub- or super-additive functionals. In particular, we propose a criterion that minimizes the sum of the Lp norms of the observations after a whitening step, in order to separate sources of possibly infinite variance.
– A fourth approach with a semi-parametric structure.
In this method, we formulate the source separation problem as a maximum likelihood estimation problem. We then combine a stochastic version of the EM algorithm² with a log-spline approximation of the α-stable PDFs in order to estimate both the PDF and the mixing matrix.

[B] Estimation of multicomponent FM signals in an impulsive environment

The literature remains relatively sparse for the multicomponent case, and in particular for impulsive α-stable noise. To contribute to solving this problem, we have proposed parametric methods as well as non-parametric methods based on time-frequency analysis.
– Parametric methods: we first reduce the problem to that of estimating harmonic signals buried in impulsive noise, via a polynomial transform of the signal. A high-resolution method (MUSIC) is then applied to the transformed signal to estimate the parameters. Three variants are considered: (i) direct application of the MUSIC algorithm to the amplitude-clipped harmonic signal; (ii) application of MUSIC to a robust estimate of the covariance function of the harmonic signal; and (iii) application of MUSIC to the generalized covariation of the signal.
– Non-parametric methods: in a first approach, we applied Huber's minimax robustness procedure against the effect of impulsive noise as a pre-processing step, using two different techniques: (i) amplitude compression by a nonlinear filter of the form |x|^β, 0 < β < 1, and (ii) clipping of the signal above a large threshold value.
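Both pre-processing devices can be written in a few lines. In the sketch below (function names and test values are illustrative, not from the thesis), each transform preserves the phase of the complex signal and only reworks its amplitude, which is what makes it usable ahead of a time-frequency analysis:

```python
import numpy as np

def compress(x, beta=0.5):
    """Nonlinear amplitude compression |x|^beta (0 < beta < 1),
    phase preserved: the first pre-processing technique above."""
    return np.abs(x) ** beta * np.exp(1j * np.angle(x))

def clip(x, c):
    """Hard clipping of amplitudes above a threshold c,
    phase preserved: the second pre-processing technique above."""
    a = np.minimum(np.abs(x), c)
    return a * np.exp(1j * np.angle(x))

x = np.array([1 + 0j, 0 + 4j, -100 + 0j])   # last sample is an "impulse"
print(np.abs(compress(x)))                  # [ 1.  2. 10.]
print(np.abs(clip(x, 3.0)))                 # [1. 3. 3.]
```

Either way, the rare huge amplitudes that would otherwise dominate a quadratic time-frequency transform are tamed before the transform is computed.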
We then represent the signal in the time-frequency plane using quadratic transforms suited to the multicomponent case, together with an ad hoc algorithm to extract the components and estimate their instantaneous frequencies. In the second approach, by contrast, we combined the M-estimation robustness approach with the quadratic time-frequency transforms to define transforms robust both to the effect of impulsive noise and to the cross-terms of a multicomponent signal. Finally, a numerical study complements the theoretical results and allows our approaches to be compared with other methods existing in the literature. Note also that the two problems treated in this thesis are very rich and attract more and more signal processing specialists; we cite for example a recent contribution on the blind separation of convolutive mixtures of FM signals [Castella et al.(2004)], which is a combination of the two problems addressed in this work.

² The word "algorithm" comes from the Latin pronunciation of the name of Abu Ja'far Muhammad Ibn Musa Al-Khawarizmi, the ninth-century Arab mathematician who lived in Baghdad and was a precursor of algebra [Khawarizmi(ecle)].

1.4 Organization of the Document

Aware that the abstract side of probability and statistics puts off many signal processing practitioners, we have sought to present a lively, clear exposition, illustrated with many examples, figures, and diagrams. This document consists of the present introduction, three parts illustrating the various aspects of our work, and a conclusion. At the beginning of each chapter, we have added an introduction detailing further the context and stakes of the topic treated in that chapter, as well as the work carried out.
Chaque chapitre se termine par une étude de robustesse des contributions, de leurs performances et des éventuels prolongements que l’on pourrait envisager de donner à cette méthode. De plus, les tables de matières accompagnent les trois parties de la thèse. Plus précisement, ce rapport de thése est organisé comme suit : Introduction • Chapitre 1 : Présente les motivations et l’originalité de ce travail de thèse, précise le cadre technique des problèmes posés et résume nos contributions principales. Première partie : Préliminaires • Chapitre 2–4 : Constituée de trois chapitres, réunissent les notions utiles pour la suite de la famille α-stables des distributions de probabilités non-Gaussiennes, d’estimation robuste ainsi que l’outil temps-fréquence pour l’analyse des signaux non-stationnaires. Le lecteur y trouvera toutes les définitions, théorèmes et formules qu’il doit savoir pour la compréhension du manuscrit. Deuxième partie : Contributions novatrices en séparation aveugle de sources impulsives de modèle alpha-stable • Chapitre 5 : Une présentation générale de la séparation aveugle de sources ainsi que les grands principes des méthodes existantes sont rappelés. Par M. Sahmoudi © Processus Alpha-Stables pour la Séparation et l’Estimation Robustes des Signaux non-Gaussiens et/ou non-Stationnaires 14 Introduction la suite, nous précisons le problème que l’on aborde : la séparation aveugle d’un mélange instantané linéaire de sources impulsives modélisées par des distributions alpha-stables. • • • • Ensuite, nous introduisons trois nouvelles approches pour le cas des sources impulsives de modèle α-stable : Chapitre 6 : Approches basées sur les moments statistiques fractionnaires d’ordre inférieure. Nous proposons une fonction de contraste basée sur le critère du dispersion minimum. Chapitre 7 : Approche de séparation par des fonctions de contrastes sous contrainte d’orthogonalité. 
Nous proposons dans ce chapitre deux classes de fonctions de contrastes basées sur des fonctionnelles sous- ou sur- additives. Des exemples pratiques de fonctions de contrastes sont introduits pour application aux sources à queue lourde. En particulier, nous proposons la fonction de contraste qui consiste à minimiser la somme des normes Lp des observations. Chapitre 8 : Approche basée sur les statistiques normalisés. Dans ce chapitre nous introduisons des statistiques normalisées dans le but de pouvoir appliquer correctement les méthodes de séparation de sources basées sur l’existence des statistiques d’ordre deux et d’ordre supérieur. Chapitre 9 : Approche semi-paramétrique du principe du maximum de vraisemblance basée sur la combinaison d’une version stochastique de l’algorithme EM et d’une technique d’approximation des densités α-stable par les fonctions log-splines. Troisième partie : Contributions novatrices en séparation et éstimation des signaux FM non-stationnaires afféctés par un bruit impulsif • Chapitre 10 : Nous commençons cette partie par une présentation générale des grandes approches paramétriques et non-paramétriques temps-fréquence existantes dans la littérature. • Chapitre 11 : Dans ce chapitre, nous présentons trois approches paramétriques robustes à l’effet du bruit alpha-stable pour l’estimation des signaux FM à phase polynomiale multi-composantes. • Chapitre 12 : Dans ce chapitre, nous introduisons deux approches non paramétriques basées sur la représentation temps-fréquence des signaux FM non-stationnaires considérés dans un environnement impulsif de modèle alphastable. Conclusion et Perspectives • Chapitre 13 : A la fin de ce manuscrit une conclusion vient résumer les apports essentiels du présent travail ainsi que les directions futures de recherche qu’on envisage. M. 
Sahmoudi © Processus Alpha-Stables pour la Séparation et l'Estimation Robustes des Signaux non-Gaussiens et/ou non-Stationnaires [Fig. 1.2 : Exemples de signaux non-stationnaires. (a)–(c) représentent des signaux d'applications de la vie réelle par la B-distribution : (a) un signal de baleine, (b) un signal d'électroencéphalogramme et (c) un signal de chauve-souris.] Première partie : Outils pour le Traitement des Signaux non-Gaussiens et/ou non-Stationnaires. L'objectif de cette partie est d'introduire un certain nombre de concepts en statistiques et en traitement du signal qui ont servi d'outils de base pour mener à bien le travail de cette thèse, et qui seront fréquemment utilisés par la suite dans ce document. Chapitre 2 : Distributions à Queues Lourdes. La propriété de stabilité, le théorème de la limite centrale et la caractérisation parfaite par les moments d'ordre un (la moyenne) et d'ordre deux (la variance ou la covariance) sont des propriétés qui font de la loi gaussienne une des lois les plus utilisées en modélisation statistique.
Cependant, bien que les calculs d'inférence statistique soient simples, l'hypothèse de gaussianité s'avère trop restrictive, en particulier dans certains domaines pour lesquels il faut prendre en compte une plus grande variabilité des données. Dans le cadre des distributions non-gaussiennes à variance infinie sont apparues les lois α-stables, dont le moment d'ordre 2 est infini dès que α est strictement inférieur à 2. Ces lois sont utilisées dans de nombreux domaines tels que les télécommunications [Bestravos et al.(1998)], le traitement du signal [Nikias et Shao(1995)] et la finance [Bassi et al.(1998)], [Rachev(2003)], etc. Elles font partie de la classe des lois de probabilité non-gaussiennes à queue lourde¹, qui englobe d'autres modèles existant dans la littérature et qui a attiré l'attention de beaucoup de chercheurs en statistique et en traitement du signal. Le but de ce chapitre n'est pas de faire une description exhaustive des modèles non-gaussiens ; il s'agit seulement d'introduire ceux qui sont particulièrement adaptés à la modélisation des phénomènes impulsifs. On présente plus en détail la famille des distributions α-stables. Le seul fait que les lois stables ont une queue de type lourde, ou bien asymptotiquement parétienne (pour faire référence à la loi de Pareto), ne suffit pas pour justifier leur importance. Il existe deux raisons profondes : la première provient d'un théorème que nous verrons dans ce chapitre, dit théorème central limite généralisé, qui accorde le statut de « lois limites » aux lois α-stables. La deuxième raison provient de la propriété de stabilité qui affirme que toute combinaison linéaire de v.a.r. α-stables indépendantes est aussi de loi α-stable. ¹ Formellement, une v.a.r. a une queue lourde si elle a une queue algébrique : il existe c, α > 0 tels que Pr(|X| > x) ∼ c x^{-α} quand x → ∞.
Après un bref rappel historique sur les lois stables, leurs distributions univariées sont définies et diverses propriétés sont présentées dans un premier temps. Puis sont abordés le problème du test d'une variance finie ou infinie ainsi que l'estimation des deux paramètres caractérisant une loi symétrique α-stable. Dans une seconde section, le cas multivarié est traité. Certains concepts de mesure de dépendance des v.a.r. α-stables, tels que la covariation, le coefficient de covariation, le coefficient de covariation symétrique et la codifférence, sont introduits ainsi que leurs propriétés. On termine l'étude des lois α-stables par la présentation de quelques techniques d'approximation analytique de leur densité de probabilité. Enfin, on présente d'autres modèles non-gaussiens à queue algébrique largement utilisés pour la modélisation des signaux impulsifs. 2.1 Bref Historique Au cours des développements historiques en astronomie au 18-ème siècle, Gauss a introduit sa méthode d'estimation par le critère des moindres carrés et insista sur l'importance de la loi qui porte actuellement son nom [Gauss(1963)]. Suivant les développements de la théorie des séries de Fourier, Laplace et Poisson tentent de trouver l'expression analytique de la transformée de Fourier (TF) d'une densité de probabilité (PDF) et lancent alors la théorie des fonctions caractéristiques sur la bonne voie. Laplace, en particulier, a souligné le fait que la densité de Gauss et sa TF ont la même expression analytique. Son étudiant Cauchy étend l'analyse de Laplace et considère la TF d'une fonction de « Gauss généralisée » de la forme f_n(x) = (1/π) ∫₀^∞ exp(−c t^n) cos(tx) dt, en remplaçant 2 par n. Il n'a pas réussi à résoudre le problème mais, quand il a considéré le cas n = 1, autre que la loi de Gauss, il a obtenu la fameuse loi de Cauchy f₁(x) = c / (π(c² + x²)).
En remplaçant l’entier naturel n par le réel α on obtient la fameuse famille fα des densités α-stables. Cependant, à l’époque on ne savait pas qu’il s’agit d’une densité de probabilité et c’est seulement après les travaux de Pólya et Bernstein que la famille fα est devenu officiellement une classe de PDF pour 0 < α ≤ 2 [Janicki et Weron(1994)]. En 1925, le mathématicien Français Lévy, en étudiant le théorème limite centrale, confirme que lorsqu’on relâche la condition de variance finie, la loi limite est une loi stable [Lévy(1925)]. Motivé par ce dernier résultat Lévy établit la TF de toutes les distributions α-stable, ce qui lui attribue l’originalité de la théorie des lois stables. Plus tard en 1937, Lévy a introduit une nouvelle approche pour le traitement des lois stables qui est celle des distributions infiniment divisibles. D’autres mathématiciens ont contribué plus tard à l’étude approfondie des lois stables, notablement de Doblin (1939) en utilisant les fonctions à variations regulière, de Gnedenko et Kolmogorov et de [Zolotarev(1966)]. Quelques années plus tard, [Fama et Roll(1968)] donnent les premières tabulations des lois symétriques α-stables (SαS), ce qui va permettre de concevoir les premiers estimateurs de ces lois. Plus tard, les efforts des statisticiens sont focalisés sur l’estimation de l’exposant caractéristique α qui caractérise la loi et qui détermine si la loi est de variance finie ou infinie. [Fama et Roll(1971)] ont utilisé les quantiles pour esM. Sahmoudi © Processus Alpha-Stables pour la Séparation et l’Estimation Robustes des Signaux non-Gaussiens et/ou non-Stationnaires 2.1 Bref Historique 21 timer le paramètre α ce qui permis aux premiers tests d’apparaı̂tre du modèle i.i.d. α-stable. 
De nouvelles techniques d’estimation basées sur la fonction caractéristique vont apparaı̂tre dans les années 80 comme par exemple la méthode de [Koutrouvelis(1980)] qui semble être la meilleure méthode selon plusieurs études faites par [Akgiray et Lamoureux(1989)] et [Walter(1994)]. Simultanément, des générateurs de variables aléatoires stables sont conçus par [Chambers et al.(1976)], dont les algorithmes permettent une amélioration des possibilités de simulation des situations réelles comme par exemple sur les marchés financiers ou le bruit télephonique. Suivit par les travaux de Paulauskas dans le cas multivatriés, [Cambanis et Miller(1981)] ont établit la théorie des processus linéaires de lois stables, [Samorodnitsky et Taqqu(1994)] ont développé la régression linéaire et non-linéaire des distributions α-stables et l’étude des processus stochastiques stables dans [Janicki et Weron(1994)]. Malgrés cette longue histoire de recherche scientifique, les lois α-stables n’attirent que peu d’attention des chercheurs en sciences appliquées : – En Astronomie : La première application des distributions α-stables est apparue avant Lévy dans le domaine de l’astronomie, quand Holtsmark a montré que la force gravitationnelle exercé par le système stellaire sur un point de l’univers a une distribution α-stable d’indice α = 3/2. – En Finance : Si on regarde par exemple les courbes boursières représentant l’évolution du prix d’un titre au cours du temps, des périodes hautes s’altérnent à des périodes basses et ainsi de suite. De plus, des fluctuations et des périodes irrégulières peuvent être observées. Mandelbrot s’appuie alors sur la loi de Pareto pour mettre en évidence un nouveau modèle de variation des prix, appelé lois α-stables. [Mandelbrot(1963)] confirme que son modèle décrit de façon réaliste la variation des prix pratiqués sur certaines bourses des valeurs. Par la suite, [Fama(1965)] valide le modèle des lois α-stables sur le prix du marché des actions. 
A la fin des années 80, plusieurs travaux semblent rejeter le modèle i.i.d. α-stable en se retournant vers la remise en question de l’hypothèse d’indépendance ce qui a conduit à la découverte des lois d’échelle ou lois à longue dépendance. – En Télécommunications : Les premiers travaux effectués pour l’application des lois α-stables en traitement du signal ont vu le jour durant les années 70 par trois chercheurs des laboratoires de BELL (Chambers, Mallow et Stuck) en prouvant que le modèle α-stable est bien adéquat pour modéliser le bruit des lignes téléphoniques. Ils ont conduit une série de travaux qui ont abouti a plusieurs résultats de références comme le critère de dispersion minimum , filtrage de Kalman des processus α-stable et l’analyse de plusieurs algorithmes d’estimation et de détection dans un bruit non-gaussien [Stuck et Kleiner(1974), Stuck(1977)]. En 1993, Shao et Nikias ont publié dans IEEE Magazine un article qui a initialisé la méthodologie de traitement du signal dans un environnement α-stable. Plus tard, l’intérêt à ce thème devient publique et plus de 120 articles de revue et de conférence sont apparus en plusieurs applications de ce modèle. D’autres applications sont beaucoup plus récentes, en internet par exemple le temps d’apparition d’une M. Sahmoudi © Processus Alpha-Stables pour la Séparation et l’Estimation Robustes des Signaux non-Gaussiens et/ou non-Stationnaires 22 Distributions Non-Gaussiennes à Queues Lourdes page web est très variable, ce qui rappelle certains modèles à variance infinie. Dans ce context, [Adler et al.(1998)] donnent divers exemples d’application des lois à queues lourdes et en particulier les distributions α-stables. Par ailleurs, en 1999 une conférence internationale sur le sujet ”Applications of Heavy-Tailed Distributions in Statistics, Engineering and Economics” était organisée. Quelques mois plus tard, durant la conférence ”IEEE Higher Order Statistics Workshop”, une session spéciale était consacré au sujet. 
En 2000, la conférence ICASSP aussi consacre une session spéciale au sujet. Récemment en 2002, un numéro spécial du journal ”Signal Processing” est dédié aux modèles à queue lourdes et leurs applications en radar, images, video et en analyse des données télétrafiques (No. 82, 2002). – D’une Manière Générale : Notons toutefois que même si le modèle i.i.d. α-stable n’est pas toujours approprié, il représente un bon compromis entre exactitude de modélisation et compléxité d’inférence statistique. Plusieurs livres sont consacrés à ces lois : [Zolotarev(1986)] qui a étudié les lois αstables dans le contexte univarié ; [Samorodnitsky et Taqqu(1994)] qui ont étudié de manière approfondie beaucoup de propriétés de ces lois dans le cas univarié comme dans le cas multivarié, [Nikias et Shao(1995)] qui ont appliqué ces lois dans le domaine du traitement du signal et [Nolan(2004)] pour une étude de point du vue implémantation et modélisation des données. En dépit de l’intérêt que représente cette famille de distributions, il reste bien beaucoup de questions à creuser surtout dans le cas multivarié. Notre travail de recherche s’est alors axé dans le traitement des signaux impulsifs modélisés par des lois α-stables. 2.2 Lois Stables Univariées 2.2.1 Lois indéfiniment divisibles Avant de définir les lois α-stables, nous allons introduire une famille de lois plus générale : les lois indéfiniment divisibles. C’est à partir de ces lois que sera précisée la forme de la fonction caractéristique des lois stables. L’importance de telles lois réside dans la solution du problème suivant : Déterminer la classe des distributions qui s’expriment comme limite d’une somme de n variables aléatoires réelles (v.a.r.) indépendantes et identiquement distribuées (i.i.d.) ? Pour résoudre le problème, introduisons alors la définition suivante. Définition 2.1. Une v.a.r. 
X a une distribution indéfiniment divisible si et seulement si, ∀n, il existe X1, · · · , Xn indépendantes et de même loi telles que X =d X1 + · · · + Xn, où =d signifie l'égalité en distribution. Il faut noter que les v.a.r. Xi n'ont pas forcément la même loi que X mais elles appartiennent à la même classe de distributions. La classe des v.a.r. indéfiniment divisibles permet de résoudre le problème ci-dessus. En effet, on a le théorème suivant. Théorème 2.1. Une v.a.r. X est la limite d'une somme de v.a.r. i.i.d. si et seulement si X est indéfiniment divisible. Pour la démonstration, voir [Shiryayev(1984), page 336]. Remarque 2.1. Une des caractérisations des lois indéfiniment divisibles est que leur fonction caractéristique peut s'exprimer comme puissance n-ème d'une autre fonction caractéristique. Théorème 2.2 (Lévy-Khinchin). Si X a une distribution indéfiniment divisible, alors sa fonction caractéristique s'écrit Φ_X(t) = exp{ iµt + ∫_{−∞}^{+∞} (e^{itx} − 1 − it sin x) x^{−2} M(dx) }, où µ est un réel et M est une mesure qui attribue une masse finie à tout intervalle fini et telle que les deux intégrales suivantes M⁺(x) = ∫_x^{+∞} y^{−2} M(dy) et M⁻(−x) = ∫_{−∞}^{−x} y^{−2} M(dy) sont convergentes pour tout x > 0. Pour la démonstration, voir [Feller(1971), page 554]. Pour se rapprocher du théorème de la limite centrale et afin d'obtenir une forme explicite de la fonction caractéristique, nous allons introduire la famille des distributions α-stables. 2.2.2 Deux définitions équivalentes des distributions α-stables. Définition 2.2 (Propriété de Stabilité). La distribution d'une v.a.r. X est stable si, pour toute suite a_k ; k ∈ IN*, de nombres réels et toute famille X1, · · · , Xk i.i.d.
de même loi que X, il existe ck > 0 et bk, deux réels, tels que a1 X1 + · · · + ak Xk =d ck X + bk. Lorsque bk = 0, on parle de distribution strictement stable. Théorème 2.3. Pour toute v.a. stable X, il existe une constante α, 0 < α ≤ 2, telle que la constante ck vérifie : ck^α = a1^α + · · · + ak^α. Le nombre α est appelé exposant caractéristique ou bien indice de stabilité. Dans le cas k = 2, la démonstration est détaillée dans [Samorodnitsky et Taqqu(1994)]. La généralisation au cas k ∈ IN* est évidente. Proposition 2.1. Si X est stable, alors X est indéfiniment divisible. Preuve. On considère les v.a. Yj = (Xj − bn/n)/an, j = 1, · · · , n. Les v.a. Yj sont indépendantes car les Xj le sont. On peut écrire Y1 + · · · + Yn = (X1 + · · · + Xn)/an − bn/an ; comme X1 + · · · + Xn =d an X + bn, on en déduit Y1 + · · · + Yn =d X. ¥ Théorème 2.4 (Théorème Central Limite Généralisé). Sans l'hypothèse de variance finie, pour toute suite de variables aléatoires i.i.d. X1, · · · , Xn, toute suite (an) de nombres réels positifs et toute suite (bn) de nombres réels, si la somme normalisée (X1 + · · · + Xn − bn)/an converge en distribution, alors sa limite est une variable stable. La démonstration est détaillée dans [Shiryayev(1984), page 338]. On peut également définir les distributions α-stables à partir de leur fonction caractéristique. Définition 2.3 (Fonction caractéristique des lois stables, Lévy-Khinchin). Si X a une distribution stable, alors sa fonction caractéristique s'écrit : Φ(t) = exp{iat − γ|t|^α [1 + iβ sign(t) ω(t, α)]} (2.1) où ω(t, α) = tan(απ/2) si α ≠ 1, et ω(t, α) = (2/π) log|t| si α = 1. (2.2) Une loi stable, notée Sα(a, β, γ), est caractérisée par quatre paramètres : – α : l'exposant caractéristique, 0 < α ≤ 2. Il caractérise les queues de distribution en mesurant leur épaisseur.
C’est pourquoi on parle des distributions α-stable à queues lourdes ou à queue épaisse . Quand α est proche de 2, la probabilité d’observer des valeurs de la variable aléatoire loin de la position centrale est faible. Une valeur proche de 0 de l’indice α signifie que la masse de la queue a une probabilité considérable. La valeur α = 2 correspond à la loi normale (loi de Gauss) pour toute valeur de β, alors que α = 1, β = 0 correspond à la loi de Cauchy ; – a : paramètre de position. Il mesure la tendance centrale de la distribution. Lorsque α > 1, a représente la moyenne et si 0 < α < 1, alors a représente la médiane ; – γ : la dispersion, mesure la dispersion de la distribution autour du paramètre de position a. Lorsque α = 2, la variance existe et γ = 12 V ar(X) ; M. Sahmoudi © Processus Alpha-Stables pour la Séparation et l’Estimation Robustes des Signaux non-Gaussiens et/ou non-Stationnaires 2.2 Lois Stables Univariées 25 – β : paramètre de symétrie, −1 ≤ β ≤ 1. Si β = 0, la loi est symétrique par rapport au paramètre de position a, de fonction caractéristique φα (t) = exp{iat−γ|t|α }. Dans ce cas la loi de probabilité est dite α-stable symétrique ou tout simplement SαS. Les distributions α-stable symétrique représente une sous classe importante des distributions α-stable. Par exemple la loi de Gauss et la loi de Cauchy sont des lois SαS. Par convention, une loi α-stable est dite standard si a = 0 et γ = 1. Enfin, reste à noter aussi qu’il est assez courant dans la littérature de remplacer la dispersion γ par σ α et d’appeler σ paramètre d’échelle. Pour donner une comparaison à la loi de Gauss, nous présentons dans la figure 2.1 des réalisations de variables aléatoires i.i.d. symétriques α-stables d’exposants α = 0.1, 0.5, 0.8, 1, 1.5 et une réalisation gaussienne. On remarque que plus α est petit, plus la variable est impulsive. 
[Fig. 2.1 : Réalisations de signaux α-stables pour différentes valeurs de α (α = 0.1, 0.5, 0.8, 1, 1.5 et 2).] 2.2.3 Stabilité de quelques lois usuelles. Proposition 2.2 (Loi de Gauss). La loi de Gauss N(m, σ²) est une loi indéfiniment divisible et α-stable de paramètre α = 2. Preuve. – Indéfiniment divisible : sa fonction caractéristique s'écrit Φ(t) = exp{imt − σ²t²/2} = [exp{i(m/n)t − (σ²/n)t²/2}]^n, comme puissance n-ème de la fonction caractéristique d'une loi normale N(m/n, σ²/n). – Stabilité : la loi N(m, σ²) est une loi S2(m, β, σ²/2). Réciproquement, une loi S2(a, β, γ) est une loi normale N(a, 2γ). ¥ Proposition 2.3 (Loi de Cauchy). La loi de Cauchy C(a) est une loi indéfiniment divisible et α-stable de paramètre α = 1. Preuve. – Indéfiniment divisible : sa fonction caractéristique s'écrit Φ(t) = exp(−a|t|) = [exp(−(a/n)|t|)]^n, comme puissance n-ème de la fonction caractéristique d'une loi de Cauchy C(a/n). – Stabilité : la loi de Cauchy généralisée, de densité f(x) = (1/π) γ/(γ² + (x − m)²), est une loi S1(m, 0, γ). ¥ Proposition 2.4 (Loi de Poisson). La loi de Poisson P(λ) est une loi indéfiniment divisible mais n'est pas stable. Preuve. – Indéfiniment divisible : la fonction caractéristique de P(λ) s'écrit Φ(t) = exp{λ(e^{it} − 1)} = [exp{(λ/n)(e^{it} − 1)}]^n, comme puissance n-ème de la fonction caractéristique d'une loi de Poisson P(λ/n). – P(λ) n'est pas stable : nous proposons une démonstration par l'absurde. On considère deux v.a. de Poisson X1 et X2 ; si elles sont stables, alors il existe a > 0 et b tels que X1 + X2 =d aX1 + b.
L'égalité des moyennes et des variances, IE(X1 + X2) = IE(aX1 + b) et Var(X1 + X2) = Var(aX1 + b), donne 2λ = aλ + b et 2λ = a²λ, d'où a = √2 et b = (2 − √2)λ. Or les v.a. X1 et X2 ne prennent que des valeurs dans IN, donc X1 + X2 aussi ; cela entraîne une contradiction car √2 X1 + (2 − √2)λ n'est pas forcément à valeurs dans IN. ¥ 2.2.4 Propriétés des lois stables. Dans cette partie, les propriétés les plus importantes des lois α-stables seront présentées. [A]- Densité de probabilité. Pour les v.a. α-stables, il n'existe pas d'expression explicite de la densité de probabilité (PDF) dans le cas général. Cependant, on peut obtenir une expression de la PDF sous forme d'intégrale à l'aide de la transformée de Fourier inverse de la fonction caractéristique : f(x; α, β) = (1/2π) ∫_{−∞}^{+∞} exp(−itx) Φα(t) dt = (1/π) ∫₀^{+∞} exp(−t^α) cos[xt + βt^α ω(t, α)] dt. Quand la distribution représentée par cette densité est symétrique (β = 0) autour de zéro (a = 0), la fonction caractéristique est une fonction réelle et paire, ce qui permet de simplifier l'expression de la densité de probabilité : f(x; α, 0) = (1/π) ∫₀^{+∞} exp(−γ|t|^α) cos(tx) dt. Proposition 2.5 (Propriétés de la densité). 1. La densité de probabilité vérifie : f(x; α, β) = f(−x; α, −β). 2. La densité de probabilité d'une distribution α-stable est une fonction bornée. 3. La densité de probabilité d'une distribution α-stable est de classe C^∞. Pour la démonstration, voir [Zolotarev(1986)]. La forme explicite de la densité des lois α-stables n'existe que dans les trois cas importants suivants : 1. La loi de Gauss S2(a, 0, γ) : α = 2, β = 0 =⇒ f(x; 2, 0) = (1/√(4πγ)) exp{−(x − a)²/(4γ)} 2.
La loi de Cauchy S1(a, 0, γ) : α = 1, β = 0 =⇒ f(x; 1, 0) = γ / (π(γ² + (x − a)²)) 3. La loi de Lévy S_{1/2}(a, 1, γ) : α = 1/2, β = 1 =⇒ f(x; 1/2, 1) = √(γ/(2π)) (x − a)^{−3/2} exp{−γ/(2(x − a))}, qui est concentrée sur [a, ∞). [Fig. 2.2 : Densités de probabilité α-stables pour différentes valeurs de α (α = 0.5, 1, 1.5, 2).] [B]- Propriétés algébriques. Proposition 2.6. Soient X1 ∼ Sα(a1, β1, γ1) et X2 ∼ Sα(a2, β2, γ2) deux v.a. α-stables indépendantes ; alors X1 + X2 ∼ Sα(a, β, γ) avec a = a1 + a2, β = (β1γ1 + β2γ2)/(γ1 + γ2) et γ = γ1 + γ2. Proposition 2.7. Soit X ∼ Sα(a, β, γ) une v.a. α-stable et c une constante réelle ; alors X + c ∼ Sα(a + c, β, γ). Proposition 2.8. Soit X ∼ Sα(a, β, γ) une v.a. α-stable et h une constante réelle non nulle ; alors hX ∼ Sα(ha, sign(h)β, |h|^α γ) si α ≠ 1, et hX ∼ S1(ha − (2/π) γβ h ln|h|, sign(h)β, |h| γ) si α = 1. Pour la démonstration, voir [Samorodnitsky et Taqqu(1994)]. [C]- Comportement à queues lourdes. Définition 2.4. La loi de probabilité d'une v.a.r. est dite à queue lourde d'indice α s'il existe un nombre α ∈ ]0, 2[ et une fonction h à variation lente, c'est-à-dire lim_{x→+∞} h(bx)/h(x) = 1 pour tout b ∈ IR⁺, tels que : IP(X ≥ x) = x^{−α} h(x). (2.3) Proposition 2.9. Soit X une v.a.r. de loi Sα(a, β, γ) avec 0 < α < 2 ; alors on a les deux résultats suivants : lim_{t→+∞} t^α IP(X > t) = Cα ((1 + β)/2) γ et lim_{t→+∞} t^α IP(X < −t) = Cα ((1 − β)/2) γ (2.4) où Cα est une constante qui ne dépend que de α : Cα = (∫₀^∞ x^{−α} sin x dx)^{−1}, soit Cα = 2/π si α = 1 et Cα = (1 − α)/(Γ(2 − α) cos(πα/2)) si α ≠ 1. Pour la démonstration, voir [Samorodnitsky et Taqqu(1994), page 16]. D'après la propriété (2.4), par passage à la limite quand t tend vers +∞, on remarque que les lois α-stables sont asymptotiquement à queue lourde.
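La représentation intégrale de la densité SαS donnée plus haut se prête à une évaluation numérique directe. En voici une esquisse Python/NumPy (le nom de fonction et les paramètres de quadrature sont des choix illustratifs), vérifiée sur les deux cas fermés de Cauchy (α = 1) et de Gauss (α = 2) :

```python
import numpy as np

def densite_sas(x, alpha, gamma=1.0, tmax=60.0, n=200001):
    """Densité SαS f(x; α, 0) = (1/π) ∫_0^∞ exp(−γ t^α) cos(tx) dt,
    évaluée par la règle des trapèzes (tmax et n sont des choix illustratifs)."""
    t = np.linspace(0.0, tmax, n)
    y = np.exp(-gamma * t ** alpha) * np.cos(t * x)
    dt = t[1] - t[0]
    return dt * (y.sum() - 0.5 * (y[0] + y[-1])) / np.pi

# Vérification sur le cas fermé de Cauchy : f(x; 1, 0) = γ/(π(γ² + x²))
x = 0.7
approx = densite_sas(x, alpha=1.0)
exact = 1.0 / (np.pi * (1.0 + x ** 2))   # γ = 1
```

La même fonction, appelée avec α = 2, doit redonner la densité gaussienne (1/√(4πγ)) exp{−x²/(4γ)} ; pour les α intermédiaires, sans forme fermée, c'est précisément ce type de quadrature qui sert en pratique.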
Pour une meilleure illustration des densités α-stables, nous avons présenté les courbes de leurs densités de probabilité et de leurs queues, pour différentes valeurs de α, dans la figure 2.2 et la figure 2.3 respectivement. Ces figures montrent l'effet de l'exposant caractéristique α. Nous remarquons que plus α est petit, plus la densité est impulsive et sa queue est lourde. [D]- Propriété de mélange. Théorème 2.5 (Théorème de mélange d'échelles). Soit x ∼ Sαx(0, 0, γx) avec 0 < αx < 2 et soit 0 < αz < αx. Soit y une v.a. positive, totalement asymétrique (« skewed »), de distribution alpha-stable S_{αz/αx}(0, 1, (cos(παz/(2αx)))^{αx/αz}) et indépendante de x. Alors z = y^{1/αx} x ∼ Sαz(0, 0, γx). (2.5) [Fig. 2.3 : Les queues de la densité de probabilité α-stable pour différentes valeurs de α (α = 0.5, 1.0, 1.5, 2.0).] Pour la démonstration, voir [Samorodnitsky et Taqqu(1994)] et [Feller(1971)]. Ce théorème nous permet d'écrire une v.a. SαS comme produit de deux v.a. α-stables dont l'une est totalement asymétrique. Corollaire 2.1. Soient x un vecteur de loi normale N(0, 2γx) et y une v.a. positive de loi α-stable, y ∼ S_{αz/2}(0, 1, (cos(παz/4))^{2/αz}), indépendante de x. Alors z = y^{1/2} x ∼ Sαz(0, 0, γx). (2.6) Ce cas spécial du théorème 2.5 montre qu'une v.a. SαS peut être représentée comme produit d'une v.a. gaussienne et d'une v.a. α-stable positive. Cette propriété montre que les lois SαS sont des distributions gaussiennes conditionnelles [Papoulis(1991)]. 2.2.5 Moments fractionnaires d'ordre inférieur. [A]- Moments fractionnaires d'ordre positif. Même si les moments du second ordre d'une v.a.
SαS avec 0 < α < 2 n'existent pas, les moments d'ordre inférieur à α existent et s'appellent les moments fractionnaires d'ordre inférieur (FLOM). La proposition suivante donne l'expression des FLOM en fonction de la dispersion γ et de l'exposant caractéristique α. Proposition 2.10. Soit X une v.a. Sα(0, β, γ), de paramètre de position nul et de dispersion γ. Alors : – si α = 2 : ∀p ≥ 0, IE|X|^p < +∞ ; – si α < 2 : IE|X|^p = C(α, β, p) γ^{p/α} si 0 < p < α, et IE|X|^p = +∞ si p ≥ α, (2.7) où C(α, β, p) = [2^{p−1} Γ(1 − p/α) / (p ∫₀^{+∞} u^{−p−1} sin²u du)] (1 + β² tan²(απ/2))^{p/(2α)} cos[(p/α) arctan(β tan(απ/2))] et Γ(.) représente la fonction gamma. Ce résultat important a été démontré par Zolotarev en utilisant la transformée de Mellin-Stieltjes [Zolotarev(1986)]. Dans [Cambanis et Miller(1981)], le même résultat a été retrouvé en utilisant une propriété de la fonction caractéristique. Un résultat similaire est vrai dans le cas des v.a. stables complexes [Masry et Cambanis(1984)]. [B]- Moments fractionnaires d'ordre négatif. Dans [Ma et Nikias(1995a)], les auteurs ont démontré que les v.a.r. SαS ont aussi des moments finis d'ordre négatif ! Ce résultat surprenant pour les lois α-stables symétriques SαS est présenté dans la proposition suivante. Proposition 2.11. Soit X une v.a.r. SαS de paramètre de position nul et de dispersion γ. Alors la formule unifiée pour ses moments d'ordre positif et d'ordre négatif est IE(|X|^p) = C(p, α) γ^{p/α} pour tout −1 < p < α, (2.8) avec C(p, α) = 2^{p+1} Γ((1 + p)/2) Γ(−p/α) / (α √π Γ(−p/2)). 2.2.6 Simulation des lois stables. [A]- Sources de codes. Pour simuler les lois stables, Chambers et al. ont publié le premier programme en langage FORTRAN dans [Chambers et al.(1976)]. Le même code, amélioré par Chambers et J. Nolan, est publié dans le livre [Samorodnitsky et Taqqu(1994)].
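La constante C(p, α) de la proposition 2.11 peut se vérifier numériquement dans le cas gaussien α = 2, où γ = σ²/2 et où IE|X|^p est connu en forme fermée ; esquisse minimale (le nom `C` est purement illustratif) :

```python
import math

def C(p, alpha):
    """Constante C(p, α) de la proposition 2.11 :
    C(p, α) = 2^{p+1} Γ((1+p)/2) Γ(−p/α) / (α √π Γ(−p/2)), pour −1 < p < α."""
    return (2.0 ** (p + 1) * math.gamma((1.0 + p) / 2.0) * math.gamma(-p / alpha)
            / (alpha * math.sqrt(math.pi) * math.gamma(-p / 2.0)))

# Cohérence dans le cas gaussien α = 2 : pour X ~ N(0, σ²), de dispersion
# γ = σ²/2, on sait que IE|X|^p = σ^p 2^{p/2} Γ((p+1)/2) / √π.
p, sigma = 0.8, 1.7
gamma_disp = sigma ** 2 / 2.0
flom = C(p, 2.0) * gamma_disp ** (p / 2.0)
gauss = sigma ** p * 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)
```

Pour α = 2, les deux facteurs Γ(−p/2) se simplifient et on retrouve exactement le moment absolu gaussien, ce qui confirme la cohérence de la formule unifiée (2.8).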
Il existe aussi une fonction rstab dans la bibliothèque du logiciel S-PLUS. Pour un programme MATLAB, on peut consulter la page web du professeur John NOLAN.

[B]- Quelques exemples

Nous avons simulé 5000 réalisations de lois SαS pour différentes valeurs de α. Le tableau suivant (Tableau 2.1) représente la moyenne et la variance empiriques des 5000 réalisations. Ces résultats confirment l'équation sur le calcul des moments. En effet, lorsque α décroît vers 1, la variance diverge et lorsque α devient plus petit que 1, c'est la moyenne qui commence à diverger.

    Valeur de α    IE(X)      Var(X)
    0.5            5324.87    3323423.23
    0.9            27.12      3312198.76
    1.0            −0.48      2171.12
    1.2            0.01       152.13
    1.5            0.03       36.76
    1.7            0.02       6.27
    2.0            0.02       2.12

Tab. 2.1 – La moyenne et la variance des lois α-stables pour différentes valeurs de α

2.3 Inférence Statistique des Lois Stables

La première tâche du traiteur de signal est consacrée à l'étude de la modélisation des données par des lois de probabilité. En particulier, dans cette section nous étudierons l'adéquation de la famille des lois α-stables pour cette modélisation [Nolan(2004)]. Plusieurs niveaux de tests et de validation sont possibles pour différentes classes de signaux (e.g., images biomédicales, images astronomiques, signaux EEG, etc.). Dans un premier temps, on pourra tester si la distribution des données est à queue lourde en utilisant l'histogramme de la loi normale. Si l'hypothèse de normalité est violée, on testera si la variance des données est infinie en utilisant le test de convergence des variances [Adler et al.(1998)]. Enfin, on pourra conclure si les données sont dans le domaine d'attraction d'une loi stable par l'estimation de l'exposant caractéristique α directement à partir des données, en utilisant la méthode dite ”stabilized p-p plots” [Michael(1983)].
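Pour fixer les idées, l'algorithme de Chambers, Mallows et Stuck cité ci-dessus tient en quelques lignes dans le cas symétrique. L'esquisse Python ci-dessous (les noms rsas et c_flom sont des choix d'illustration, non issus du manuscrit) simule des lois SαS standard (γ = 1) et confronte un moment fractionnaire empirique à la formule (2.8) :

```python
import math
import numpy as np

def c_flom(p, alpha):
    """Constante C(p, alpha) de la proposition 2.11 (valable pour -1 < p < alpha)."""
    return (2.0 ** (p + 1) * math.gamma(-p / alpha) * math.gamma((1 + p) / 2)
            / (alpha * math.sqrt(math.pi) * math.gamma(-p / 2)))

def rsas(alpha, n, rng):
    """n realisations SalphaS standard (gamma = 1) par la methode de
    Chambers-Mallows-Stuck, cas symetrique (0 < alpha <= 2)."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, n)   # U ~ Uniforme(-pi/2, pi/2)
    w = rng.exponential(1.0, n)                 # W ~ Exp(1), independante de U
    if alpha == 1.0:
        return np.tan(u)                        # cas de Cauchy
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

rng = np.random.default_rng(1)

# Pour alpha = 2, S_2(0, 1, 0) = N(0, 2) : la variance empirique est proche de 2.
print(np.var(rsas(2.0, 400_000, rng)))

# Pour alpha = 1 (Cauchy, gamma = 1), le moment fractionnaire IE|X|^p avec
# p = 0.25 < alpha doit approcher C(p, alpha) * gamma^(p/alpha).
x = rsas(1.0, 400_000, rng)
print(np.mean(np.abs(x) ** 0.25), c_flom(0.25, 1.0))
```

Les moyennes et variances empiriques divergentes du tableau 2.1 se reproduisent de la même façon en faisant varier α.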
2.3.1 Tests de la variance

Nous allons présenter deux méthodes graphiques pour tester si la distribution de nos observations est à variance finie ou infinie.

[A]- Test graphique de la convergence de la variance empirique

La stratégie qui semble la plus simple pour tester si la variance est finie ou pas est de faire augmenter la taille de l'échantillon et de calculer la variance empirique correspondante. Plus précisément, on propose l'algorithme résumé dans le tableau 2.2. Si les observations ont une loi à variance finie, lorsqu'on fait augmenter la taille N des observations, la variance doit converger vers une valeur finie. Dans le cas contraire, si les observations proviennent d'une loi à variance infinie, un comportement de divergence doit être observé.

[B]- Test graphique de la queue

L'idée principale de ce deuxième test est basée sur le comportement asymptotique ”queue lourde” des lois α-stables :

    lim_{t→+∞} t^α IP(X > t) = C_α ((1+β)/2) γ.

Alors, cela implique

    d log F̄(x) / d log x ∼ −α, x → +∞    (2.9)

où F̄(x) = IP(X > x) est le complémentaire de la fonction de répartition F. Nous résumons l'algorithme dans le tableau 2.3.

Test graphique de la variance
Step 1. Calcul de la moyenne empirique X̄ = (1/N) Σ_{i=1}^N X_i
Step 2. Calcul de la variance empirique σ̂²_N = (1/N) Σ_{i=1}^N (X_i − X̄)²
Step 3. Visualisation de la courbe (N, σ̂²_N) pour des N assez grands.

Tab. 2.2 – Test graphique de la variance en utilisant la variance empirique.

Test graphique log-log de la queue
Step 1. Calcul du logarithme de la queue q(x) = log((1/N) Σ_{i=1}^N 1I_{|X_i|>x})
Step 2. Visualisation de la courbe (log x, q(x)) pour des x assez grands.

Tab. 2.3 – Test graphique de la queue d'une distribution par la méthode dite ”log-log”.
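Les deux tests graphiques des tableaux 2.2 et 2.3 s'implémentent directement. Esquisse Python illustrative (les noms variance_courbe et pente_queue sont hypothétiques), appliquée à un échantillon de Cauchy (α = 1) dont la pente log-log de la queue doit approcher −α = −1 :

```python
import numpy as np

def variance_courbe(x, points=20):
    """Variance empirique en fonction de la taille N (tableau 2.2)."""
    tailles = np.linspace(len(x) // points, len(x), points, dtype=int)
    return [(int(n), float(np.var(x[:n]))) for n in tailles]

def pente_queue(x, x1, x2):
    """Pente de q(x) = log((1/N) somme des 1{|X_i| > x}) entre log x1 et
    log x2 (tableau 2.3) ; doit approcher -alpha pour une loi alpha-stable."""
    q1 = np.log(np.mean(np.abs(x) > x1))
    q2 = np.log(np.mean(np.abs(x) > x2))
    return (q2 - q1) / (np.log(x2) - np.log(x1))

rng = np.random.default_rng(3)
ech = rng.standard_cauchy(500_000)        # echantillon Cauchy : alpha = 1
print(pente_queue(ech, 10.0, 100.0))      # attendu : proche de -1
```

Sur des données gaussiennes, au contraire, la pente log-log décroche vers −∞ dans la queue et la courbe (N, σ̂²_N) se stabilise au lieu de diverger.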
Si la variance de la loi de distribution des données est finie, la pente de la courbe doit converger vers une valeur finie [Adler et al.(1998)].

2.3.2 Estimation des paramètres des lois α-stables

La plupart des algorithmes de traitement du signal utilisant des lois α-stables exigent l'estimation a priori des paramètres de la distribution α-stable, et en particulier de l'exposant caractéristique α. D'où l'importance d'avoir des techniques efficaces d'estimation des paramètres de la loi. Pour une loi α-stable symétrique SαS, les paramètres de la distribution à estimer sont l'exposant caractéristique α et la dispersion γ. De nombreuses méthodes ont été proposées dans la littérature : maximum de vraisemblance [DuMouchel(1973), Bodenschatz et Nikias(1999)], utilisation des fractiles de la distribution [Fama et Roll(1968)], utilisation de la fonction caractéristique [Koutrouvelis(1980)], utilisation des moments fractionnaires d'ordre inférieur positifs et négatifs [Ma et Nikias(1995b)], utilisation des moments logarithmiques de la loi SαS [Ma et Nikias(1995b)], utilisation de la fonction de répartition dans [Maymon et al.(2000)] et généralisation des méthodes existantes au cas d'une loi α-stable non symétrique [Kuruoglu(2001)]. Dans cette partie, nous allons discuter la méthode du maximum de vraisemblance et détailler la méthode basée sur la fonction caractéristique.

[A]- Méthode du maximum de vraisemblance

Cette approche largement utilisée en statistique souffre d'une difficulté majeure dans le cas des distributions α-stables, à savoir le manque d'expression analytique de la PDF. Malgré cela, [DuMouchel(1973)] a développé une telle approche dans ce contexte.
D'autres chercheurs ont utilisé des techniques de Monte Carlo ou des approximations pour approcher les intégrales de l'expression de la densité [Nolan(2004)]. Cependant, toutes ces méthodes nécessitent une grande complexité de calcul. De plus, il n'existe aucune étude de convergence de cette approche dans la littérature.

[B]- Méthode de régression basée sur la fonction caractéristique

Pour une v.a.r. SαS, l'expression de la fonction caractéristique est donnée par

    ϕ_X(t) = exp{−γ |t|^α}.

Ce qui entraîne

    |ϕ_X(t)|² = exp{−2γ |t|^α}
    log[−log |ϕ_X(t)|²] = log 2γ + α log |t|.

On pose y_k = log[−log |ϕ_X(t_k)|²], λ = log 2γ et ω_k = log |t_k| ; l'égalité précédente implique que y_k = λ + α ω_k. Si on pose

    ŷ_k = log(−log |ϕ̂_X(t_k)|²)

où

    |ϕ̂_X(t_k)|² = (1/n²) [ (Σ_{i=1}^n cos(t_k x_i))² + (Σ_{i=1}^n sin(t_k x_i))² ],

on peut alors proposer le modèle linéaire suivant

    Ŷ = λ + αW + ε.

Or la partie imaginaire de la fonction caractéristique est nulle (loi symétrique) ; on a alors l'estimateur de la fonction caractéristique donné par :

    |ϕ̂_X(t_k)|² = ( (1/n) Σ_{i=1}^n cos(t_k x_i) )².    (2.10)

En ce qui concerne le choix des t_k, ainsi que le choix de K par rapport à n, on suit la démarche décrite dans [Koutrouvelis(1980)], c'est-à-dire : quel que soit k ∈ [1, K], t_k = πk/25 et le paramètre K est choisi suivant le tableau 2.4 ci-dessous.

    α :        0.3   0.5   0.7   0.9   1.1   1.3   1.5   1.7   1.9
    n = 200 :  134   124   118    86    68    56    28    22    18
    n = 800 :   22    16    14    11    11    11     9     9    10
    n = 1600 :  30    24    20    24    18    15    10    10    10

Tab. 2.4 – Valeurs optimales de K en fonction de n et de α

– Estimation du paramètre α : par régression linéaire, en choisissant les ω_k tels que Σ_{k=1}^K ω_k = 0, on obtient

    α̂ = Σ_{k=1}^K ω_k ŷ_k / Σ_{k=1}^K ω_k²    (2.11)

– Estimation de la dispersion γ : de même, par régression linéaire et le choix des ω_k tels que Σ_{k=1}^K ω_k = 0, on obtient

    γ̂ = (1/2) exp( (1/K) Σ_{k=1}^K ŷ_k )    (2.12)
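À titre d'illustration, la régression ci-dessus s'esquisse en Python comme suit (le nom estime_alpha_gamma est hypothétique ; K est fixé à 10 plutôt que choisi via le tableau 2.4), testée sur un échantillon de Cauchy (α = 1, γ = 1) :

```python
import numpy as np

def estime_alpha_gamma(x, K=10):
    """Estime (alpha, gamma) d'une loi SalphaS par regression sur la fonction
    caracteristique empirique (methode de type Koutrouvelis, t_k = pi k / 25)."""
    n = len(x)
    t = np.pi * np.arange(1, K + 1) / 25.0
    # |phi_chapeau(t_k)|^2, partie imaginaire supposee nulle (loi symetrique)
    phi2 = (np.cos(np.outer(t, x)).mean(axis=1)) ** 2
    y = np.log(-np.log(phi2))              # y_k = log(2 gamma) + alpha log t_k
    w = np.log(t)
    w0 = w - w.mean()                      # centrage : somme des omega_k nulle
    alpha_hat = np.sum(w0 * y) / np.sum(w0 ** 2)
    lam = y.mean() - alpha_hat * w.mean()  # ordonnee a l'origine = log(2 gamma)
    gamma_hat = 0.5 * np.exp(lam)
    return alpha_hat, gamma_hat

rng = np.random.default_rng(5)
a, g = estime_alpha_gamma(rng.standard_cauchy(100_000))
print(a, g)   # attendu : proches de 1 et 1
```

Pour une Cauchy standard, ϕ_X(t) = e^{−|t|}, donc ŷ_k ≈ log 2 + log t_k : la pente estimée vaut bien α = 1 et l'ordonnée à l'origine redonne γ = 1.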
2.4 Lois Stables Multivariées

2.4.1 Définition et propriétés

Définition 2.5. Le vecteur aléatoire X = (X_1, · · · , X_d) est dit α-stable dans IR^d si pour toute suite de nombres positifs a_1, · · · , a_k, il existe un nombre positif c_k et un vecteur D^{(k)} ∈ IR^d tels que

    a_1 X^{(1)} + · · · + a_k X^{(k)} =_d c_k X + D^{(k)}    (2.13)

où X^{(1)}, · · · , X^{(k)} sont des copies indépendantes de X et =_d désigne l'égalité en distribution [Samorodnitsky et Taqqu(1994)]. Lorsque D^{(k)} est le vecteur nul, on parle de loi strictement alpha-stable.

Proposition 2.12. Si X est un vecteur α-stable, alors toute combinaison linéaire des composantes de X est une v.a.r. α-stable.

Preuve
Soit Y = Σ_{i=1}^n λ_i X_i une combinaison linéaire des composantes de X. Considérons Y_1, · · · , Y_k des copies de Y. Alors

    Y_1 + · · · + Y_k =_d Σ_{i=1}^n λ_i X_i^{(1)} + · · · + Σ_{i=1}^n λ_i X_i^{(k)}
                     = Σ_{i=1}^n λ_i (X_i^{(1)} + · · · + X_i^{(k)})
                     =_d Σ_{i=1}^n λ_i (c_k X_i + D_i^{(k)})
                     = c_k Σ_{i=1}^n λ_i X_i + Σ_{i=1}^n λ_i D_i^{(k)}
                     = c_k Y + b_k. ¥

Contrairement au cas mono-variable, la fonction caractéristique d'un vecteur stable multi-variable n'a pas d'expression explicite en t.

Définition 2.6. Une fonction caractéristique d'une v.a. de dimension n est dite α-stable si elle s'écrit sous la forme

    Φ(t) = exp(j t^T a − t^T A t), si α = 2 ;
    Φ(t) = exp( j t^T a − ∫_{S^{n−1}} |t^T s|^α µ(ds) + j β_α(t) ), si 0 < α < 2,    (2.14)

où

    β_α(t) = tan(απ/2) ∫_{S^{n−1}} |t^T s|^α sign(t^T s) µ(ds), si α ≠ 1, 0 < α < 2 ;
    β_α(t) = ∫_{S^{n−1}} t^T s log |t^T s| µ(ds), si α = 1,    (2.15)

– S^{n−1} est la sphère unité de dimension n,
– a, t ∈ IR^n,
– µ(.) est la mesure spectrale de la sphère unité²,
– A est une matrice symétrique, semi-définie positive.

Notons que le cas α = 2 correspond à une distribution gaussienne multivariée de moyenne a et de matrice de covariance 2A.
Notons aussi qu'à l'exception de ce dernier cas α = 2, les distributions stables multivariées sont déterminées par le vecteur a ∈ IR^n, un scalaire 0 < α < 2 et une mesure finie µ(dS^{n−1}) sur la sphère unité S^{n−1}.

Définition 2.7. Un vecteur x est dit de distribution α-stable symétrique (SαS) si x est un vecteur α-stable et si les distributions de −x et x sont identiques.

Théorème 2.6. Soit x un vecteur α-stable, on a les résultats suivants.
1. Si toute combinaison linéaire des composantes de x a une loi symétrique α-stable, alors x est un vecteur SαS.
2. Si toute combinaison linéaire des composantes de x a une distribution α-stable, avec un indice de stabilité α ≥ 1, alors x est un vecteur α-stable.

² C'est une mesure sur l'ensemble des boréliens de la sphère unité.

La démonstration est détaillée dans [Samorodnitsky et Taqqu(1994), page 59].

Proposition 2.13. Soit A une matrice de type m × n et x un vecteur SαS de dimension n, alors y = Ax est un vecteur SαS de dimension m.

Preuve
D'après le théorème 2.6, il suffit de montrer que toute combinaison linéaire des composantes de y a une distribution SαS. En effet, soient b_1, · · · , b_m, m réels ; on a

    Σ_{j=1}^m b_j Y_j = b^t y = b^t Ax = Σ_{j=1}^n ( Σ_{i=1}^m b_i a_{ij} ) X_j.

Or le vecteur x a une distribution SαS, donc la dernière combinaison ci-dessus a une distribution SαS. Par conséquent, le vecteur y est un vecteur SαS. ¥

Remarque 2.2. (Blanchiment des SαS) Il est montré dans plusieurs ouvrages de traitement statistique du signal qu'on peut blanchir tout vecteur de distribution gaussienne. Précisément, si x est un vecteur gaussien alors on peut l'écrire sous la forme x = Ay où A est une matrice constante et y est un vecteur gaussien à composantes indépendantes.
Cependant, dans le cas des lois stables, la représentation de deux variables stables de même exposant caractéristique α, 0 < α < 2, comme combinaison linéaire d'un nombre fini de variables stables indépendantes est impossible en général [Schilder(1970)]. Ce résultat remarquable nous impose de faire attention lors de la généralisation de certaines propriétés des lois gaussiennes au cas des lois stables au sens de Lévy.

2.4.2 Moments des lois stables multivariées

Le calcul des moments des lois stables multivariées découle de celui des lois stables univariées.

Théorème 2.7.
1. Si X_1, · · · , X_n sont des v.a.r. α-stables et indépendantes, alors IE(|X_1|^{p_1} · · · |X_n|^{p_n}) < ∞ si et seulement si p_i < α, i = 1, · · · , n.
2. Si X_1, · · · , X_n sont des v.a.r. dépendantes et conjointement α-stables, alors IE(|X_1|^{p_1} · · · |X_n|^{p_n}) < ∞ si et seulement si 0 < p_1 + · · · + p_n < α.

Cette condition est très faible et souvent réalisée dans la pratique. Pour plus de détails voir [Miller(1978)]. Dans sa forme générale, une distribution alpha-stable multi-variée reste difficile à exploiter dans la pratique du traitement du signal. Cependant il existe quelques sous-classes des distributions α-stables multi-variées avec une expression simplifiée de la fonction caractéristique. Une telle classe est celle des distributions sous-gaussiennes dont la description est présentée ci-dessous.

2.4.3 Vecteur aléatoire α-sous-gaussien

Définition 2.8. La fonction caractéristique des distributions α-sous-gaussiennes est donnée par

    Φ(t) =def exp( −(1/2) (t^T R t)^{α/2} )

où R est une matrice définie positive. Cette sous-classe est souvent notée α−SG(R) [Cambanis et Miller(1981)].
Un vecteur aléatoire de distribution α−SG(R) peut se décomposer comme produit (ou mélange) d'un vecteur aléatoire α-stable et d'un vecteur gaussien.

Proposition 2.14. Soit x un vecteur α-stable ; x ∈ α−SG(R), alors

    x = η^{1/2} y

avec η une variable aléatoire positive α/2-stable et y un vecteur gaussien de moyenne nulle et de covariance R. De plus, η et y sont indépendantes. Pour la démonstration voir [Cambanis et Miller(1981)].

Dans la proposition précédente, η ∼ S_{α/2}(a = 0, β = 1, γ = (cos(πα/4))^{2/α}). Alors on peut la voir comme extension du résultat de mélange des SαS. Comme on peut le voir de la formulation de la définition par la fonction caractéristique ci-dessus, les paramètres β et γ ne sont plus indépendants. Leurs valeurs peuvent être déterminées en utilisant la fonction caractéristique et la mesure spectrale. En effet, contrairement aux distributions alpha-stables monovariables qui forment une classe paramétrique, les distributions stables multi-variées forment une classe non-paramétrique. Pour plus de détails sur cette classe, le lecteur intéressé peut consulter [Samorodnitsky et Taqqu(1994)].

2.5 Mesure de Dépendance des v.a.r. α-Stables

Le coefficient de corrélation est la mesure classique de dépendance (à l'ordre 2) entre deux v.a.r. X_1 et X_2 de variances finies. Cependant, pour les lois alpha-stables, les moments d'ordre p avec p ≥ α et 0 < α < 2 sont infinis, et en particulier la variance. Par conséquent, le coefficient de corrélation n'est plus valable en tant que mesure de dépendance. Dans ce cas, d'autres mesures existent dans la littérature utilisant les moments fractionnaires d'ordre inférieur à α, comme la covariation, la codifférence et le coefficient de covariation symétrique, et d'autres basées sur les rangs ou sur les densités de probabilité.
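La décomposition produit de la proposition 2.14 ci-dessus se prête à une vérification numérique dans le cas α = 1 : η est alors 1/2-stable positive de dispersion γ = (cos(π/4))² = 1/2, c'est-à-dire de loi de Lévy, simulable par η = 1/(4g²) avec g gaussien standard. Esquisse Python (la matrice R est un exemple hypothétique ; noter que la convention de signe du paramètre β varie selon les auteurs) :

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400_000

# Matrice R definie positive (exemple hypothetique), avec R_11 = R_22 = 4.
R = np.array([[4.0, 2.0],
              [2.0, 4.0]])
L = np.linalg.cholesky(R)

# Cas alpha = 1 : eta est 1/2-stable positive, c.-a-d. de loi de Levy,
# simulable par eta = 1/(4 g^2) avec g ~ N(0, 1).
g = rng.standard_normal(n)
eta = 1.0 / (4.0 * g ** 2)
y = rng.standard_normal((n, 2)) @ L.T       # y ~ N(0, R)
x = np.sqrt(eta)[:, None] * y               # x ~ 1-SG(R) (vecteur sous-gaussien)

# Chaque marginale est une Cauchy d'echelle sqrt(R_ii)/2 = 1, pour laquelle
# IP(|x_i| > 1) = 1/2 exactement.
print(np.mean(np.abs(x[:, 0]) > 1.0), np.mean(np.abs(x[:, 1]) > 1.0))
```

Les fréquences empiriques confirment la valeur théorique 1/2, ce qui valide à la fois la décomposition et la fonction caractéristique de la définition 2.8 pour α = 1.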
Dans cette section, nous allons présenter les plus connues et les plus utilisées, pour mettre en évidence certaines particularités surprenantes des lois stables concernant la structure de dépendance.

2.5.1 Covariation

Nous supposons dans cette partie que 1 < α ≤ 2.

Définition 2.9. Soit (X_1, X_2) un vecteur SαS avec α strictement supérieur à 1, la covariation de X_1 sur X_2 est définie par la quantité

    [X_1, X_2]_α = ∫_{S^1} x_1 x_2^{<α−1>} dµ_{S^1}(x_1, x_2)    (2.16)

où S^1 est la sphère unité, µ_{S^1} est la mesure spectrale et <.> désigne la notation suivante : x^{<a>} = sign(x) |x|^a.

Nous présentons une autre définition de la covariation, équivalente à la précédente, qui permettra de démontrer facilement plusieurs propriétés.

Proposition 2.15. Soit (X_1, X_2) un vecteur SαS avec α strictement supérieur à 1, la covariation de X_1 sur X_2 peut s'écrire

    [X_1, X_2]_α = (1/α) ∂γ(θ_1, θ_2)/∂θ_1 |_{θ_1=0, θ_2=1}    (2.17)

où γ(θ_1, θ_2) est le paramètre de dispersion de la variable aléatoire θ_1 X_1 + θ_2 X_2.

Preuve
La démonstration se fait aisément en se rappelant que

    γ(θ_1, θ_2) = ∫_{S^1} |θ_1 x_1 + θ_2 x_2|^α dµ_{S^1}(x_1, x_2).

Proposition 2.16.
1. Dans le cas gaussien (α = 2), la covariation est identique à la moitié de la covariance :

    (X, Y) ∼ SαS =⇒ [X, Y]_2 = (1/2) Cov(X, Y)

2. Si X et Y sont deux v.a.r. indépendantes et conjointement SαS, alors [X, Y]_α = 0.
3. La covariation [X, Y]_α est linéaire en X, ou bien linéaire à gauche, c'est-à-dire que si (X_1, X_2, Y) est un vecteur SαS alors

    [a_1 X_1 + a_2 X_2, Y]_α = a_1 [X_1, Y]_α + a_2 [X_2, Y]_α

pour toutes constantes réelles a_1 et a_2.
4.
En général, [X, Y]_α n'est pas linéaire à droite, c'est-à-dire par rapport à Y, mais elle possède la propriété de pseudo-linéarité suivante : si (X, Y_1, Y_2) est un vecteur SαS et que Y_1 et Y_2 sont indépendantes, alors

    [X, b_1 Y_1 + b_2 Y_2]_α = b_1^{<α−1>} [X, Y_1]_α + b_2^{<α−1>} [X, Y_2]_α

pour toutes constantes réelles b_1 et b_2.

Les démonstrations sont détaillées dans [Samorodnitsky et Taqqu(1994)].

2.5.2 Métrique de covariation

Définition 2.10. Soit X une v.a.r. SαS de dispersion γ et de paramètre de location a = 0. La norme de X est définie par

    ||X||_α = γ si 0 < α < 1 ; ||X||_α = γ^{1/α} si 1 ≤ α ≤ 2.    (2.18)

Alors, la norme ||X||_α est une quantité liée directement à la dispersion γ et détermine la distribution de X via la fonction caractéristique.

Définition 2.11. Si X et Y sont deux v.a.r. conjointement α-stables, la distance entre X et Y est définie par

    d_α(X, Y) = ||X − Y||_α    (2.19)

En combinant les deux équations (2.19) et (2.7), on peut facilement remarquer que la distance d_α mesure le p-ème moment de la différence des deux v.a.r. Dans le cas α = 2, cette distance est identique à la moitié de la variance de la différence des deux v.a.r. Notons aussi que la convergence en distance d_α est équivalente à la convergence en probabilité [Cambanis et Miller(1981)].

Il est connu dans la théorie des statistiques d'ordre deux que l'espace des v.a.r. d'un processus aléatoire à variance finie est un espace de Hilbert. Cependant, ce n'est pas le cas pour les v.a.r. α-stables, mais il existe un résultat similaire. En effet, si l'on considère un processus α-stable X(t), t ∈ T, alors l'ensemble des combinaisons linéaires des variables aléatoires X(t) forme un espace linéaire noté l(X(t), t ∈ T). Dans cet espace, toutes les v.a.r. sont conjointement α-stables de même exposant caractéristique α [Cambanis et Miller(1981)]. Le théorème suivant précise la structure de l'espace linéaire des v.a.r. SαS.
Théorème 2.8.
– Pour tout 0 < α ≤ 2, la distance d_α définit une métrique sur l'espace l(X(t), t ∈ T).
– Particulièrement, pour 1 ≤ α ≤ 2, ||.||_α est une norme définie sur l'espace l(X(t), t ∈ T).

Preuve
Il suffit de vérifier les trois axiomes d'une norme.
1. Soit X une v.a.r. SαS, alors

    ||X||_α = 0 ⇐⇒ γ_X = 0 =⇒ ϕ_X(t) = 1 =⇒ X = 0 p.s.

2. Soit λ un scalaire réel et X une v.a.r. SαS de dispersion γ. D'après la proposition 2.8, la dispersion de λX est |λ|^α γ. On a donc

    ||λX||_α = (|λ|^α γ)^{1/α} = |λ| γ^{1/α} = |λ| ||X||_α

3. Si X_1 et X_2 sont deux v.a.r. conjointement SαS de mesure spectrale µ, alors

    ||X_1 + X_2||_α = γ_{X_1+X_2}^{1/α}
                    = ( ∫_S |x_1 + x_2|^α µ(dS) )^{1/α}
                    ≤ ( ∫_S |x_1|^α µ(dS) )^{1/α} + ( ∫_S |x_2|^α µ(dS) )^{1/α}
                    = γ_{X_1}^{1/α} + γ_{X_2}^{1/α} = ||X_1||_α + ||X_2||_α.

Alors ||.||_α définit bien une norme sur l'espace vectoriel des vecteurs SαS. ¥

La difficulté fondamentale en traitement des signaux alpha-stables par les statistiques fractionnaires d'ordre inférieur est que la théorie des espaces de Hilbert n'est pas valide dans ce cas : l'espace linéaire des processus alpha-stables est un espace de Banach pour 1 ≤ α ≤ 2, mais seulement un espace métrique pour 0 < α < 1.

2.5.3 Coefficient de covariation

Dans cette partie, (X, Y) est un vecteur SαS avec α > 1.

Définition 2.12. Le coefficient de covariation de X sur Y est défini par

    λ_{X,Y} =def [X, Y]_α / [Y, Y]_α    (2.20)

où [X, Y]_α est la covariation entre X et Y.

Ces définitions de covariation et de coefficient de covariation ne sont pas très faciles à utiliser en pratique puisqu'elles utilisent la mesure spectrale.
Heureusement, on peut connecter la covariation et le coefficient de covariation avec les moments fractionnaires d'ordre strictement inférieur à α.

Théorème 2.9. Soient X et Y deux v.a.r. conjointement SαS avec 1 < α ≤ 2. Notons la dispersion de Y par γ_Y, alors
– Covariation :

    [X, Y]_α = ( IE(X Y^{<p−1>}) / IE(|Y|^p) ) γ_Y, 1 ≤ p < α    (2.21)

– Coefficient de covariation :

    λ_{X,Y} = IE(X Y^{<p−1>}) / IE(|Y|^p), 1 ≤ p < α    (2.22)

Les moments fractionnaires d'ordre inférieur dépendent de la loi α-stable, qui dépend directement de α. Cela implique que la covariation et le coefficient de covariation dépendent de α.

Proposition 2.17. Soit X une v.a.r. α-stable d'exposant caractéristique α et de dispersion γ. Alors la dispersion de X peut être exprimée sous la forme

    γ_X = ∫_{S^1} |x|^α dµ_{S^1}(x) = [X, X]_α    (2.23)

Preuve
Il suffit de combiner les deux équations (2.16) et (2.21). ¥

Proposition 2.18.
1. Soit (X, Y) un vecteur SαS, alors on a λ_{aX,bY} = (a/b) λ_{X,Y} pour tout couple (a, b) ∈ IR × IR*.
2. Soit (X, Y, Z) un vecteur SαS, alors on a λ_{X+Y,Z} = λ_{X,Z} + λ_{Y,Z}.
3. Le coefficient de covariation entre X et Y n'est pas symétrique et n'est pas borné.

Preuve
1. D'après la définition du coefficient de covariation, on a

    λ_{aX,bY} = [aX, bY]_α / [bY, bY]_α = a b^{<α−1>} [X, Y]_α / (|b|^α [Y, Y]_α) = (a/b) λ_{X,Y}.

2. D'après la linéarité à gauche de la covariation, on peut écrire

    λ_{X+Y,Z} = [X + Y, Z]_α / [Z, Z]_α = ([X, Z]_α + [Y, Z]_α) / [Z, Z]_α = λ_{X,Z} + λ_{Y,Z}.

3. Il suffit de prendre X = cY, avec c ≠ ±1, et de voir que λ_{X,Y} = c et que λ_{Y,X} = 1/c. Alors λ_{X,Y} ≠ λ_{Y,X}. Par le même exemple, on peut conclure que le coefficient de covariation λ_{X,Y} = c n'est pas borné. ¥

2.5.4 Codifférence

Nous supposons dans cette partie que 0 < α ≤ 2.
Comme le coefficient de covariation, la codifférence est une autre quantité qui permet de mesurer la dépendance entre deux v.a.r. SαS.

Définition 2.13. La codifférence entre X et Y est définie par

    τ_{X,Y} = ||X||_α^α + ||Y||_α^α − ||X − Y||_α^α    (2.24)

où ||.||_α est la norme de covariation introduite précédemment.

Proposition 2.19.
1. La codifférence est symétrique : τ_{X,Y} = τ_{Y,X}.
2. Si α = 2, comme le coefficient de covariation, la codifférence est liée à la covariance : τ_{X,Y} = Cov(X, Y).

Preuve
1. Pour montrer la symétrie de la codifférence, il suffit de montrer que ||X − Y||_α^α = ||Y − X||_α^α. Or ||.||_α est une norme et donc, pour toute v.a.r. X SαS, on a ||X||_α^α = ||−X||_α^α, ce qui achève la preuve.
2. On a vu que [X, Y]_2 = (1/2) Cov(X, Y) et que ||X||_α^α = [X, X]_α. Ce qui entraîne que

    ||X||_2² = (1/2) Cov(X, X) = (1/2) Var(X)

et donc

    τ_{X,Y} = (1/2) Var(X) + (1/2) Var(Y) − (1/2) Var(X − Y).

Or Var(X − Y) = Var(X) + Var(Y) − 2 Cov(X, Y), ce qui donne le résultat souhaité, soit τ_{X,Y} = Cov(X, Y). ¥

2.5.5 Coefficient de covariation symétrique

Définition 2.14. (Garel et al., 2004) Soit (X, Y) un couple aléatoire réel SαS. Le coefficient de covariation symétrique entre X et Y est défini par

    Corr_α(X, Y) = λ_{X,Y} λ_{Y,X} = [X, Y]_α [Y, X]_α / ([X, X]_α [Y, Y]_α)

On obtient alors le résultat suivant, qui corrige les inconvénients du coefficient de covariation.

Proposition 2.20. Soit (X, Y) un couple aléatoire réel SαS. Nous avons les propriétés suivantes :
1. Corr_α(X, Y) = Corr_α(Y, X) et |Corr_α(X, Y)| ≤ 1.
2. Si X et Y sont deux v.a.r. SαS indépendantes, alors Corr_α(X, Y) = 0.

2.5.6 Estimation des coefficients de covariation

Proposition 2.21. (Samorodnitsky et Taqqu, 1994 ; d'Estampes, 2003) Soit (X, Y) un couple aléatoire réel SαS où α > 1. Nous avons, pour tout 1 ≤ p < α,

    λ_{X,Y} = [X, Y]_α / [Y, Y]_α = IE(X Y^{<p−1>}) / IE|Y|^p.
Soit (X_1, ..., X_n) (resp. (Y_1, ..., Y_n)) un n-échantillon de même loi que X (resp. Y). En prenant p = 1 dans l'équation précédente, on peut construire un estimateur de λ_{X,Y}, à savoir

    λ̂_{X,Y} = Σ_{i=1}^n X_i sign(Y_i) / Σ_{i=1}^n |Y_i|.

Pour estimer Corr_α(X, Y), nous utilisons alors la quantité suivante

    Corr̂_α(X, Y) = ( Σ_{i=1}^n X_i sign(Y_i) / Σ_{i=1}^n |Y_i| ) ( Σ_{i=1}^n Y_i sign(X_i) / Σ_{i=1}^n |X_i| )

qui est le produit de l'estimateur du coefficient de covariation λ_{X,Y} par l'estimateur du coefficient de covariation λ_{Y,X}.

2.6 Représentation Analytique des PDF α-Stables

2.6.1 Développement en séries entières

À l'exception des trois lois particulières, loi de Gauss, loi de Cauchy et loi de Lévy, la PDF des distributions α-stables n'a pas d'expression analytique exacte. Cependant, il existe un développement en série entière de celle-ci. Par exemple, le développement en série entière de la PDF d'une distribution α-stable standard SαS est donné par [Samorodnitsky et Taqqu(1994)] :

    f_α(x) = (1/π) Σ_{k=1}^∞ ((−1)^{k−1} / k!) sin(kαπ/2) Γ(αk + 1) |x|^{−αk−1}, si 0 < α < 1 ;
    f_α(x) = (1/(πα)) Σ_{k=0}^∞ ((−1)^k / (2k)!) Γ((2k+1)/α) x^{2k}, si 1 ≤ α ≤ 2.    (2.25)

Vu que ces sommes regroupent un nombre infini de termes, il est difficile de les utiliser dans la pratique.

2.6.2 Développement asymptotique

Pour les distributions SαS avec α > 1, il existe un développement asymptotique de la densité de probabilité, proposé dans [Bergstrom(1952)] :

    f_α(x) = (1/(πα)) Σ_{k=0}^n ((−1)^k / (2k)!) Γ((2k+1)/α) x^{2k} + O(|x|^{2n+1}) quand |x| → 0    (2.26)

et

    f_α(x) = −(1/π) Σ_{k=1}^n ((−1)^k / k!) sin(kαπ/2) Γ(αk + 1) |x|^{−αk−1} + O(|x|^{−α(n+1)−1}) quand |x| → ∞.    (2.27)

Le calcul de la série asymptotique pour de larges valeurs de n pose des problèmes de calcul au niveau de la fonction gamma.
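L'évaluation numérique de la somme partielle (2.27) illustre à la fois son intérêt et la difficulté mentionnée. Esquisse Python (la fonction queue_sas est un nom d'illustration), comparée à la densité de Cauchy exacte, pour laquelle la série se resomme exactement en 1/(π(1+x²)) :

```python
import math

def queue_sas(x, alpha, n):
    """Somme partielle du developpement asymptotique (2.27) de la densite
    SalphaS standard (gamma = 1), valable pour |x| grand."""
    s = 0.0
    for k in range(1, n + 1):
        s += ((-1) ** k * math.sin(k * alpha * math.pi / 2)
              * math.gamma(alpha * k + 1) / math.factorial(k)
              * abs(x) ** (-alpha * k - 1))
    return -s / math.pi

# Cas de Cauchy (alpha = 1, gamma = 1) : densite exacte 1/(pi (1 + x^2)).
x = 3.0
print(queue_sas(x, 1.0, 25), 1.0 / (math.pi * (1 + x ** 2)))

# Pour alpha > 1, Gamma(alpha k + 1)/k! croit tres vite avec k : les termes
# finissent par diverger (serie asymptotique), d'ou les problemes numeriques
# au niveau de la fonction gamma evoques ci-dessus.
print(math.gamma(1.9 * 30 + 1) / math.factorial(30))
```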
Ces difficultés peuvent être réduites en suivant la procédure proposée dans [Nikias et Shao(1995), page 17].

2.6.3 Approximation par un mélange fini

[A]- Approximation par un mélange fini de gaussiennes

Dans cette section, nous considérons une v.a.r. gaussienne X, une v.a.r. Y α-stable et la v.a. Z = Y^{1/2} X de loi α-stable selon le corollaire 2.1, et nous présentons la méthode d'approximation des PDF SαS par un mélange de gaussiennes introduite par [Kuruoglu(1998)]. On peut déduire l'expression de la densité de Z par la propriété de marginalisation des PDF :

    f_Z(z) = ∫_{−∞}^{+∞} f_{Z|V}(z|v) f_V(v) J(z, v) dv    (2.28)

où f_Z(.) et f_V(.) représentent les densités de Z et de V = Y^{1/2} respectivement, et J(z, v) représente le jacobien de Z par rapport à V. Or X est une v.a.r. gaussienne, alors pour une réalisation V = v, f_{Z|V} est conditionnellement distribuée selon la loi gaussienne. On peut alors réexprimer l'équation (2.28) sous la forme

    f_Z(z) = (1/√(2π)) ∫_{−∞}^{+∞} exp(−z²/(2γv²)) f_V(v) v^{−1} dv    (2.29)

Cette densité est appelée mélange d'échelles de la loi normale et la fonction h(v) = f_V(v) est dite fonction de mélange. La fonction de mélange est la densité de la v.a.r. V = Y^{1/2}, dont l'expression est obtenue grâce au résultat suivant.

Théorème 2.10. Soit V = T(Y), où T représente une transformation inversible, alors

    f_V(v) = f_Y(T^{−1}(v)) |dT^{−1}(v)/dv|    (2.30)

ou simplement

    f_V(v) = f_Y(y) |dy/dv|    (2.31)

Pour le cas spécial que nous avons considéré ici, cette relation se réduit à

    f_V(v) = 2v f_Y(v²).    (2.32)

Notons que la décomposition en mélange d'échelles gaussiennes est une propriété bien étudiée dans la littérature [Andrews(1974)].
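Le cas α = 1 rend cette construction entièrement explicite : la fonction de mélange est alors une densité de Lévy (stable positive d'indice 1/2), connue en forme fermée, et la discrétisation de l'intégrale de mélange redonne la densité de Cauchy. Esquisse Python (le nom cauchy_par_melange est hypothétique ; on discrétise ici le mélange continu plutôt que le mélange fini à N composantes de [Kuruoglu(1998)]) :

```python
import numpy as np

def cauchy_par_melange(z, n_comp=4000):
    """Approxime la densite de Cauchy standard par un melange d'echelles
    gaussiennes discretise : f(z) = integrale de N(z; 0, v) f_L(v) dv, ou
    f_L est la densite de Levy(1) : (2 pi)^(-1/2) v^(-3/2) exp(-1/(2v))."""
    u = np.linspace(-12.0, 12.0, n_comp)       # grille uniforme en u = log v
    v = np.exp(u)
    du = u[1] - u[0]
    f_levy = (2 * np.pi) ** -0.5 * v ** -1.5 * np.exp(-0.5 / v)
    poids = f_levy * v * du                    # dv = v du (changement de variable)
    z = np.atleast_1d(np.asarray(z, dtype=float))
    gauss = np.exp(-0.5 * z[:, None] ** 2 / v) / np.sqrt(2 * np.pi * v)
    return gauss @ poids

for zz in (0.0, 1.0, 3.0):
    print(zz, cauchy_par_melange(zz)[0], 1 / (np.pi * (1 + zz ** 2)))
```

L'accord avec 1/(π(1+z²)) est excellent, ce qui illustre numériquement le corollaire 2.1 et le principe de l'approximation (2.33) ci-après.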
L'échantillonnage de f_Z(z) de l'équation (2.28) sur un ensemble fini de N points permet d'obtenir une approximation de la PDF SαS par un mélange fini de densités gaussiennes :

    f_{(α,a,0,γ)}(z) ≈ Σ_{j=1}^N (1/v_j) exp(−(z−a)²/(2γv_j²)) f_V(v_j) / ( √(2πγ) Σ_{j=1}^N f_V(v_j) )    (2.33)

Pour une bonne approximation, on doit prendre N assez grand, ce qui va rendre le calcul assez complexe. Pour réduire cette complexité, [Kuruoglu(1998)] propose d'utiliser un certain nombre de composantes, puis de raffiner l'approximation en utilisant l'algorithme EM [Dempster(1977)]. Cette procédure permet alors d'estimer la densité SαS ; nous en résumons les étapes essentielles dans le tableau 2.5 suivant.

[B]- Approximation par un mélange fini de Pearson

Pour le cas d'une densité α-stable de paramètres β = +1 et α < 1, une approximation par un mélange fini de densités de Pearson, qui sont des PDF α-stables d'indice α = 1/2, a été proposée récemment dans [Kuruoglu(2003)]. Notons que l'auteur suit la même démarche que pour le cas du mélange de gaussiennes ci-dessus, en prenant α_x = 1/2 au lieu de α_x = 2 dans le théorème de mélange d'échelles des lois α-stables (théorème 2.5).

2.7 Autres Distributions à Queues Lourdes

Dans cette section, nous introduisons d'autres classes de distributions à queues lourdes. La première est celle des lois gaussiennes généralisées (GG) et la deuxième classe est celle des lois appelées lois normales inverses gaussiennes (NIG).

2.7.1 Loi gaussienne généralisée

Une généralisation des lois de Gauss et de Laplace est donnée par le modèle des lois gaussiennes généralisées.

Estimation de la PDF SαS
Step 1. Initialisation : étant donné les paramètres de la PDF SαS désirée, on génère la fonction caractéristique ϕ_Y(.) d'une v.a.r.
Y stable positive de paramètres (α/2, β = −1, a = 0, γ = (cos(πα/4))^{2/α}).
Step 2. Évaluation de la PDF stable positive f_Y en N points : en appliquant la FFT (transformée de Fourier rapide) à la fonction caractéristique ϕ_Y(.) générée dans l'étape précédente, où N représente le nombre de composantes gaussiennes dans le mélange.
Step 3. Évaluation de la fonction de mélange f_V : c'est la densité de la v.a.r. V = Y^{1/2}, donnée par

    f_V(v) = 2v f_Y(v²)    (2.34)

Step 4. Approximation analytique de la PDF SαS, par substitution de l'équation (2.34) dans l'équation (2.33) :

    f_{(α,a,0,γ)}(z) = Σ_{j=1}^N exp(−(z−a)²/(2γv_j²)) f_Y(v_j²) / ( √(2πγ) Σ_{j=1}^N v_j f_Y(v_j²) )    (2.35)

Step 5. Affinage de l'approximation par l'algorithme EM : nous cherchons à estimer un mélange de gaussiennes de la forme

    f_{(α,a,0,γ)}(z) = Σ_{j=1}^N p_j G(z|j)    (2.36)

où les p_j sont les fréquences de pondération telles que Σ_{j=1}^N p_j = 1 et les G(z|j) sont des PDF gaussiennes. On considère M observations (z_m, m = 1, · · · , M) comme variables cachées et on applique l'algorithme EM, qui consiste à initialiser l'algorithme par une première estimation de G(z_m|j) et p_j puis à alterner les deux étapes ”Expectation” et ”Maximisation” [Dempster(1977)].

Tab. 2.5 – Approximation de la PDF SαS par le modèle de mélange de gaussiennes et affinage de l'approximation par l'algorithme EM.

La distribution de ce modèle est décrite par une densité de type exponentielle de la forme :

    f_α(x) = c exp(−|x/σ|^α)    (2.37)

où c = α/(2σΓ(1/α)) et Γ(.) est la fonction gamma. Le paramètre σ > 0 représente le paramètre d'échelle de la distribution et α > 0 est le paramètre qui caractérise l'impulsivité. Notons que pour α = 2, f_α(x) est gaussienne, alors que α = 1 correspond à la loi de Laplace. Conceptuellement, plus α est petit, plus la distribution est impulsive.
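La constante c de la densité (2.37) assure la normalisation : on vérifie en effet que 2c ∫_0^∞ exp(−(x/σ)^α) dx = (2cσ/α) Γ(1/α) = 1. Esquisse Python de contrôle numérique (le nom gg_pdf est hypothétique) :

```python
import math

def gg_pdf(x, alpha, sigma):
    """Densite gaussienne generalisee (2.37) : c exp(-|x/sigma|^alpha),
    avec c = alpha / (2 sigma Gamma(1/alpha))."""
    c = alpha / (2.0 * sigma * math.gamma(1.0 / alpha))
    return c * math.exp(-abs(x / sigma) ** alpha)

# Verification numerique de la normalisation pour differents alpha
# (integration de Riemann sur [-200, 200], pas 0.01).
for alpha in (0.5, 1.0, 2.0, 8.0):
    s = sum(gg_pdf(-200.0 + k * 0.01, alpha, 1.0) * 0.01 for k in range(40_000))
    print(alpha, s)   # proche de 1 dans chaque cas
```

Pour α = 2 on retrouve la densité de N(0, σ²/2) et pour α = 1 celle de Laplace, conformément au texte ci-dessus.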
This class of PDFs has been in use for a long time; the earliest references known to the author are [Subbotin(1923)] and [Frechet(1924)]. Owing to their mathematical tractability, these laws are widely exploited in signal processing applications [Kay(1998a)] to model processes observed in a variety of domains, including speech, audio and video signals, images, turbulence, and multi-user systems [Zoubir et Brcich(2002)]. Note that the moments of such r.v.'s are finite and analytically computable, in contrast with other heavy-tailed PDFs such as the α-stable laws presented at the beginning of this chapter.

Proposition 2.22. The moment of order k is given by:

IE(X^k) = 0 if k is odd,
IE(X^k) = (2cσ^{k+1}/α) Γ((k+1)/α) if k is even.    (2.38)

Proof
– If k is odd, IE(X^k) = 0, since the function x^k exp(−|x/σ|^α) is odd.
– If k is even, we have:

IE(X^k) = ∫_{−∞}^{+∞} c x^k exp(−|x/σ|^α) dx
        = 2 ∫_{0}^{+∞} c x^k exp(−(x/σ)^α) dx
        = (2cσ^{k+1}/α) ∫_{0}^{+∞} y^{(k+1)/α − 1} exp(−y) dy    [substituting y = (x/σ)^α]
        = (2cσ^{k+1}/α) Γ((k+1)/α).  ∎

In the following proposition, we describe the behavior of the generalized Gaussian law for different values of α.

Proposition 2.23. (Behavior of the GG law for different α)
– If α = 2, the generalized Gaussian law is the standard Gauss law.
– If α > 2, the tail of the generalized Gaussian law is lighter than that of the standard Gauss law, i.e., the PDF tends to 0 faster than the Gaussian PDF.
– As α tends to +∞, the generalized Gaussian law converges to the uniform law.
– If 0 < α < 2, the tail of the generalized Gaussian law is impulsive in nature.
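As a sanity check (our own sketch, not part of the original text), the closed-form even moments of Proposition 2.22 can be compared against direct numerical integration of the density (2.37); the helper names `gg_pdf` and `gg_moment` are ours. Note that (2cσ^{k+1}/α)Γ((k+1)/α) simplifies to σ^k Γ((k+1)/α)/Γ(1/α) once c = α/(2σΓ(1/α)) is substituted.

```python
import numpy as np
from scipy.special import gamma
from scipy.integrate import quad

def gg_pdf(x, alpha, sigma):
    """Generalized Gaussian density, eq. (2.37)."""
    c = alpha / (2 * sigma * gamma(1 / alpha))
    return c * np.exp(-np.abs(x / sigma) ** alpha)

def gg_moment(k, alpha, sigma):
    """Closed-form moment of even order k, eq. (2.38) after substituting c."""
    return sigma ** k * gamma((k + 1) / alpha) / gamma(1 / alpha)

alpha, sigma, k = 1.5, 2.0, 4
# exploit the symmetry of the density: integrate over [0, inf) and double
num, _ = quad(lambda x: x ** k * gg_pdf(x, alpha, sigma), 0, np.inf)
num *= 2
```

For α = 2 and σ = 1, the formula gives IE(X²) = Γ(3/2)/Γ(1/2) = 1/2, consistent with the fact that f_2 is then a N(0, 1/2) density.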
Despite the relative success of the generalized Gaussian family, it has some limitations. Indeed, when α < 1, the "peaky shape" of the distribution is inappropriate for certain practical noise situations. Moreover, the minimization of the cost function based on the Lp norm remains a major problem. One may also note the exponential decay of the tail, in contrast with the algebraic tail behavior of the impulsive processes encountered in many applications [Nikias et Shao(1995)].

2.7.2 Normal inverse Gaussian law

The family of normal inverse Gaussian (NIG) laws is a subclass of the generalized hyperbolic distributions. The pioneering work on the NIG laws was done by Barndorff-Nielsen in 1977 and 1995; more recent references exist in the literature, e.g. [Barndorff(1998)]. In contrast with the SαS laws, the NIG density has an explicit expression.

[A]- Definition

Définition 2.15. A r.v. X follows a NIG law if its probability density has the form

f_X(x) = (αδ/π) exp( δ√(α²−β²) − βµ ) · K₁( α√(δ² + (x−µ)²) ) / √(δ² + (x−µ)²) · exp(βx)    (2.39)

where µ ∈ IR, δ > 0, 0 ≤ |β| ≤ α, and K₁ is the modified Bessel function of the second kind of order 1. A r.v. X with a NIG law is denoted X ∼ NIG(α, β, δ, µ).

A NIG law is thus parameterized by four parameters α, β, δ and µ, which have the same interpretation as those of the α-stable laws: α governs the tail behavior (the smaller α, the heavier the tail); β is a skewness parameter (β = 0 gives a symmetric density, β > 0 a density skewed to the right, β < 0 a density skewed to the left); δ is a scale parameter and µ is a location parameter. To illustrate the shape of the NIG laws, the PDF is plotted for several values of α in Figure 2.4.

Définition 2.16. The characteristic function of a NIG r.v.
is given by [Barndorff-Nielsen(1997)]:

ϕ_X(t) = exp{ δ√(α²−β²) − δ√(α²−(β+jt)²) + jµt }    (2.40)

Fig. 2.4: The probability density of the NIG(α, 0, 1, 0) law for different values of α (α = 2, 1, 0.5, 0.001).

[B]- Properties

– Infinitely divisible: given the exponential form of the NIG characteristic function, it can easily be written as a power of another NIG characteristic function; consequently, the NIG laws are infinitely divisible. This property means that if X₁, …, X_N are independent r.v.'s with X_i ∼ NIG(α, β, δ_i, µ_i), then the sum S = Σ_{i=1}^{N} X_i also follows a NIG law. Moreover, S ∼ NIG(α, β, δ, µ) with δ = Σ_{i=1}^{N} δ_i and µ = Σ_{i=1}^{N} µ_i. This property is similar to the one characterizing the α-stable class. However, the NIG distribution is not stable; one way to see this is that the parameter δ diverges for a normalized infinite sum of NIG r.v.'s.

– Contains the Gauss and Cauchy laws: from the form of the characteristic function, one easily sees that the Cauchy and Gauss laws appear as special cases of the NIG laws. Indeed, the Gauss law is the limiting case β = 0, α → ∞ with σ² = δ/α, and NIG(0, 0, δ, µ) corresponds to the Cauchy law.

– Asymptotic tail behavior: in [Hanssen et Oigard(2001)], the authors showed that the asymptotic behavior of the NIG PDF is
given by

f(x) ∝ |x|^{−3/2} exp(βx − α|x|)  as |x| → ∞, for α ≠ 0,
f(x) ∝ |x|^{−2}  as |x| → ∞, for α → 0.    (2.41)

For α ≠ 0, the asymptotic behavior of the NIG PDFs thus combines an algebraic and an exponential decay, the exponential term being determined by the two parameters α and β. When α → 0, the NIG PDF approaches the Cauchy PDF, and hence its asymptotic tail behavior approaches that of the Cauchy tail.

2.7.3 Student's t law

Définition 2.17. (William Sealy Gosset, 1908) Introduced by Gosset in 1908, the PDF of the Student t distribution is parameterized as

T_α(x) = c ( 1 + x²/α )^{−(α+1)/2}    (2.42)

where

c = Γ((α+1)/2) / ( √(απ) Γ(α/2) )

It is a density symmetric about the vertical axis.

Définition 2.18. (Ronald Aylmer Fisher, 1925) Fisher took an interest in Gosset's work. He wrote to him in 1912 to propose a geometric proof of the Student law, and introduced the notion of degrees of freedom. In particular, he published an article in 1925 in which he defined the Student law as the ratio of two independent r.v.'s U and Y following, respectively, a N(0, 1) law and a χ²(α) law:

T_α := U / √(Y/α) = √α U / √Y    (2.43)

The quotient T_α is said to follow a Student t law (or simply: Student law)³ with α degrees of freedom.

Proposition 2.24. (Properties of the Student law)
– Expectation: from the expression of the PDF above, one can deduce that a Student-t r.v. is centered, with zero mean.
– Variance: for α ≤ 2, the Student law admits no finite variance. If α > 2, computing the variance gives α/(α−2).

³ Student was the pseudonym chosen by the statistician William Sealy Gosset (1876–1937).
He was one of the first statisticians of the corporate world, devoting his career to the food industry, within which he was always recognized both as an industrialist and as a scientist. Closely associated with academia, he contributed widely to the scientific development of his period.

– Algebraic tail: it is easy to see, from the form of the PDF, that the Student law has an algebraic tail of index α. The smaller α becomes, the heavier the tail.
– Extreme cases:
1. As α → ∞, the Student distribution is equivalent to the Gauss distribution.
2. As α → 0, the distribution becomes very impulsive.
– Special case: for α = 1, the model is that of Cauchy.

The family of Student t laws was first introduced into signal processing by Hall in 1966, as an empirical model for atmospheric noise in radio communications [Hall(1966)]. Well before that, however, the model appears in the mathematical statistics literature, indexed by an integer k instead of the real α; Hall may have generalized it by replacing k with α.

2.8 Conclusion

Despite the important role of the α-stable laws in modeling signals with heavy-tailed probability densities, they have limitations; let us rather say that they open several questions about how to overcome the difficulties encountered in statistical inference in the absence of an explicit expression of the density and in the absence of second- and higher-order moments.
To contribute to the resolution of certain questions related to the separation of impulsive sources with α-stable distributions, and to the estimation of a signal buried in impulsive noise following an α-stable model, we propose new approaches in the following chapters. Specifically, we will use lower-order moments, introduce normalized statistics, and approximate the probability density by the family of log-spline functions, so as to be able to handle observations drawn from α-stable laws.

Chapitre 3

Robust Estimation

The robust minimax approach is an alternative to conventional maximum likelihood (ML) that overcomes the sensitivity of the ML estimate and improves efficiency in environments with an unknown heavy-tailed distribution [Huber(1981)]. In this chapter, we provide a brief background on the fundamental concepts of robust estimation that will be used in the subsequent second part of this thesis.

3.1 Robustness

The term "robust" was coined in statistics by G.E.P. Box in 1953. Various definitions of greater or lesser mathematical rigor are possible for the term. However, in general, referring to a statistical estimator, it means "insensitive to small departures from the idealized assumptions for which the estimator is optimized" [Hampel et al.(1986), Huber(1972)]. The word "small" can have two different interpretations, both important: either fractionally small departures for all data points, or else fractionally large departures for a small number of data points. It is the latter interpretation, leading to the notion of outliers, that is generally the most stressful for statistical procedures. Roughly speaking, robustness means insensitivity to gross measurement errors and to errors in the specification of parametric models. For example, consider the estimation of the mean from 100 measurements.
Assume that all measurements but one are distributed between -1 and 1, while the remaining measurement has the value 1000. Using the simple estimator of the mean given by the sample average, the estimator returns a value close to 10. Thus the single, probably erroneous, measurement of 1000 has a very strong influence on the estimate. The problem here is that the average corresponds to minimizing the squared distance of the measurements from the estimate; the square function implies that measurements far away dominate.

To obtain good estimators in the presence of outliers, statisticians have developed various sorts of robust statistical estimators. Many, if not most, can be grouped into one of three categories.

• M-estimates follow from maximum-likelihood arguments and are usually the most relevant class for model fitting, that is, for the estimation of parameters. We therefore consider these estimates in some detail below.

• L-estimates are linear combinations of order statistics. These are most applicable to the estimation of central value and central tendency. Two typical L-estimates give the general idea: (i) the median, and (ii) Tukey's trimean, defined as the weighted average of the first, second, and third quartile points of a distribution, with weights 1/4, 1/2, and 1/4, respectively.

• R-estimates are based on rank tests. For example, the equality or inequality of two distributions can be assessed by the Wilcoxon test, which computes the mean rank of one distribution within a combined sample of both distributions. The Kolmogorov–Smirnov statistic and the Spearman rank-order correlation coefficient are R-estimates in essence, if not always by formal definition [Huber(1981)].

Some other kinds of robust techniques can be found in the field of optimal control and filtering, rather than in the mathematical statistics literature.
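The 100-measurement example above is easy to reproduce numerically (an illustrative sketch, not from the original text): the sample average is dragged toward the outlier, while an L-estimate such as the median is essentially unaffected.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 99)       # 99 well-behaved measurements in [-1, 1]
x = np.append(x, 1000.0)         # one gross outlier

mean_est = x.mean()              # pulled to roughly 10 by the single outlier
median_est = np.median(x)        # stays inside [-1, 1]
```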
3.2 M-Estimation

Huber (1964) proposed a generalization of the least-squares principle for constructing estimators of (principally) location parameters. Suppose, in the basic model, that the sample comes from a distribution with distribution function F(x − θ); it is the location parameter θ that we wish to estimate. We might estimate θ by T_n = T_n(x₁, x₂, …, x_n) chosen to minimize

Σ_{j=1}^{n} ρ(x_j − T_n)    (3.1)

where ρ is some real-valued, non-constant function. As special cases, we note that ρ(t) = t² yields the sample mean and ρ(t) = |t| yields the sample median, whilst ρ(t) = −log f(t) yields the maximum likelihood estimator (where f(x) is the density function under the basic model when θ = 0). If ρ is continuous with derivative ψ, we equivalently estimate θ by the T_n satisfying

Σ_{j=1}^{n} ψ(x_j − T_n) = 0.    (3.2)

Such an estimator is called a maximum-likelihood-type estimator, or M-estimator. If ρ is convex, then (3.1) and (3.2) are equivalent; otherwise, (3.2) is still very useful in searching for the solution of (3.1). Usually we restrict attention to convex ρ, so that ψ is monotone and T_n unique. Under quite general conditions, T_n can be shown to have desirable properties as an estimator: if ρ is convex, T_n is unique, translation invariant, consistent, and asymptotically normal [Huber(1972)].

The choice of ρ that leads to an optimal robust estimator of θ is now discussed. One particular estimator with desirable robustness properties arises from the Huber function

ρ_H(t) = t²/2           if |t| ≤ k,
ρ_H(t) = k|t| − k²/2    if |t| > k    (3.3)

for a suitable choice of k. It turns out that the estimator T_n is equivalent to the sample mean of a sample in which all observations x_j such that |x_j − T_n| > k are replaced by T_n − k or T_n + k, whichever is closer.
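The Huber location estimate defined by (3.2)–(3.3) can be computed by iteratively reweighted averaging, since ψ_H(r) = r·min(1, k/|r|); this is a standard numerical scheme, not one prescribed by the text, and the function name `huber_location` is ours.

```python
import numpy as np

def huber_location(x, k=1.345, tol=1e-8, max_iter=100):
    """Solve sum(psi_H(x_i - theta)) = 0 by iteratively reweighted
    averaging, with psi_H(r) = r * min(1, k/|r|)."""
    theta = np.median(x)                          # robust starting point
    for _ in range(max_iter):
        r = x - theta
        w = np.minimum(1.0, k / np.maximum(np.abs(r), 1e-12))
        theta_new = np.sum(w * x) / np.sum(w)     # weighted mean step
        if abs(theta_new - theta) < tol:
            break
        theta = theta_new
    return theta

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, 200)
x[:10] = 50.0                                     # 5% gross outliers
est = huber_location(x)                           # stays near the true location 0
```

The choice k = 1.345 is the classical tuning giving about 95% efficiency at the nominal Gaussian model.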
Another M-estimator, with

ρ(t) = t²/2    if |t| ≤ η,
ρ(t) = η²/2    if |t| > η    (3.4)

can be similarly interpreted as a trimmed mean: T_n is now the sample mean of those observations x_j satisfying |x_j − T_n| < η. This extends the modified trimming above from the rejection of a single extreme value to the rejection of all sample values whose residuals about T_n are sufficiently large in absolute value. See [Huber(1981)] for details.

Standard cost functions

– Normal criterion: for a Gaussian distribution, ML estimation leads to

ρ(x) = x²/2;    ψ(x) = x    (3.5)

and Σ_{i=1}^{N} (x_i − θ) = 0 yields θ̂ = x̄ = (1/N) Σ_{i=1}^{N} x_i.

– Double-exponential criterion: for a double-exponential (Laplace) distribution, the score function is given by

ρ(x) = |x|;    ψ(x) = −1 if x < 0, +1 if x > 0    (3.6)

and Σ_{i=1}^{N} ψ(x_i − θ) = 0 yields θ̂ = the sample median.

– Maximum likelihood criterion: the choice ρ(x) = −log f(x), where f represents the observation PDF, gives the ordinary maximum likelihood estimate.

When the basic model involves a scale parameter, so that the distribution function is of the form F[(x − θ)/σ], modified forms of the M-estimator have been proposed. The estimator of θ is then a solution T_n of an equation of the type

Σ_{j=1}^{n} ψ[(x_j − T_n)/σ̂] = 0    (3.7)

where the scale estimate σ̂ is robust for σ and is obtained either independently by some suitable scheme, or simultaneously with θ by joint solution of (3.7).

3.2.1 Minimax M-estimate of location

In this section, we consider robust estimation in a minimax sense, based on Huber's minimax M-estimator [Huber(1981)]. Huber considered the robust location estimation problem. Suppose we have one-dimensional (1-D) i.i.d. observations x₁, x₂, …, x_n. The observations belong to some sample space X, which is a subset of the real line IR.
A parametric model consists of a family of probability distributions F_θ (or, equivalently, a family of PDFs f_θ) on the sample space, where the unknown parameter θ belongs to some parameter space Θ. When estimating location in the model X = IR, F_θ(x) = F(x − θ), the M-estimator is determined by a ψ-function of the type ψ(x, θ) = ψ(x − θ), i.e., the M-estimate of the location parameter θ is given by the solution of the equation

Σ_{i=1}^{n} ψ(x_i − θ) = 0.    (3.8)

Assume that the sample distribution belongs to the set of ε-contaminated Gaussian models given by

P_ε = { (1 − ε) N(0, ν²) + εH : H is a symmetric distribution }    (3.9)

where 0 < ε < 1 is fixed and ν² is the variance of the nominal Gaussian distribution. It can be shown that, under mild regularity conditions, the asymptotic variance of an M-estimator of the location θ defined by (3.8) at a distribution F ∈ P_ε is given by [Huber(1981)]

V(ψ; F) = ∫ ψ² dF / ( ∫ ψ′ dF )²    (3.10)

Huber's idea was to minimize the maximal asymptotic variance over P_ε, that is, to find an M-estimator ψ₀ that satisfies

sup_{F∈P_ε} V(ψ₀; F) = inf_ψ sup_{F∈P_ε} V(ψ; F).    (3.11)

This is achieved by finding the least favorable distribution F₀, i.e., the distribution that minimizes the Fisher information

I(F) = ∫ ( F″/F′ )² dF    (3.12)

over all F ∈ P_ε. Then ψ₀ = −F₀″/F₀′ defines the maximum likelihood estimator for this least favorable distribution. Using the above concepts of minimax robustness, Huber showed that the Fisher information is minimized by

f₀(x) = ((1−ε)/√(2πν²)) exp( −x²/(2ν²) )        for |x| ≤ kν²,
f₀(x) = ((1−ε)/√(2πν²)) exp( k²ν²/2 − k|x| )    for |x| > kν²    (3.13)

where k, ε and ν are connected through

φ(kν)/(kν) − Q(kν) = ε/(2(1−ε))    (3.14)

where

φ(x) := (1/√(2π)) e^{−x²/2}

and
Q(t) := (1/√(2π)) ∫_{t}^{∞} e^{−x²/2} dx.

The corresponding minimax M-estimator is then determined by the Huber penalty function and its derivative, given by

ρ_H(x) = x²/2           if |x| ≤ K,        ψ_H(x) = x           if |x| ≤ K,
ρ_H(x) = K|x| − K²/2    if |x| > K;        ψ_H(x) = K sign(x)   if |x| > K    (3.15)

and

Σ_{i=1}^{N} ψ_H(x_i − θ) = 0 is solved by numerical methods.

These are the ρ and ψ functions associated with a density which is "normal" in the middle and has "double exponential" tails. The constant K regulates the degree of robustness; good choices for K are between 1 and 2 times the standard deviation of the observations. The corresponding M-estimator is the minimax solution.

3.2.2 Influence Function

The influence function (IF), introduced in [Hampel et al.(1986)], is an important tool for studying robust estimators. It measures the influence of a vanishingly small contamination of the underlying distribution on the estimator. It is assumed that the estimator can be defined as a functional T operating on the empirical distribution function F_n, T = T(F_n), and that the estimator is consistent as n → +∞, i.e., T(F) = lim_{n→+∞} T(F_n), where F is the underlying distribution. The influence function is defined as

IF(x; T, F) = lim_{t→0} ( T[(1 − t)F + tΔ_x] − T(F) ) / t    (3.16)

where Δ_x is the distribution that puts a unit mass at x. Roughly speaking, the influence function IF(x; T, F) is the first derivative of the statistic T at the underlying distribution F and at the coordinate x. The influence function measures the effect of a deviation from the assumed distribution on a descriptive statistic T; in other words, it measures robustness.
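As a concrete numerical illustration of (3.16) (our own sketch, not from the text), the IF of the sample median at F = N(0, 1) can be approximated with a small contamination weight t and compared with the known closed form IF(x; med, Φ) = sign(x)/(2φ(0)).

```python
import numpy as np
from scipy import stats, optimize

def median_of_contaminated(x0, t):
    """Median of (1 - t) * N(0, 1) + t * delta_{x0}:
    root of the contaminated CDF minus 1/2."""
    g = lambda u: (1 - t) * stats.norm.cdf(u) + t * (u >= x0) - 0.5
    return optimize.brentq(g, -10.0, 10.0)

x0, t = 2.0, 1e-6
if_num = median_of_contaminated(x0, t) / t        # T(F) = 0, so no subtraction needed
if_theory = np.sign(x0) / (2 * stats.norm.pdf(0.0))
```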
The utility of the influence function is that it allows us to calculate the asymptotic covariance of M-estimates using the formula [Huber(1981)]

Cov{T(F_n), T(F)} = ∫ IF(x; T, F) IF(x; T, F)^T dF    (3.17)

One can then proceed to calculate IF(x; T, F) and Cov{T(F_n), T(F)} for the given signal model.

3.2.3 M-Estimation of a deterministic signal parameter

Consider the general signal-in-noise model given in (12.1):

x(t) = s(t, θ) + z(t)

where z(t) is i.i.d. noise and the signal s(t) is parameterized by θ = (θ₁, …, θ_M)^T, with (·)^T denoting transposition. The aim is to estimate θ from N observations x(t), t = 1, …, N. Given the noise density f(z), one obtains the ML solution as

θ̂ = arg min_θ Σ_{t=1}^{N} ρ{ x(t) − s(t, θ) }    (3.18)

where ρ(x) = −log f(x). Alternatively, we can solve the M coupled equations

Σ_{t=1}^{N} ψ{ x(t) − s(t, θ) } ∂s(t, θ)/∂θ = 0    (3.19)

where ψ(x) = −f′(x)/f(x) is the location score function of f(x). It is clear that without a priori knowledge of f(x), the estimation of θ cannot be optimal. Huber considered estimation in the presence of outliers or impulsive noise and proposed the concept of M-estimation [Huber(1981)]. In the M-estimation framework, −log f(x) is replaced by a similarly behaved function ρ(x), chosen to confer robustness on the estimator under deviations from a nominal density. Thus, an M-estimate of θ can be obtained as a solution of the optimization problem given in equation (3.18), or by solving the M coupled equations

Σ_{t=1}^{N} ϕ{ x(t) − s(t, θ) } ∂s(t, θ)/∂θ = 0    (3.20)

where ϕ(x) = ρ′(x). When f(x) is unknown, one is unsure of how close ϕ(x) is to ψ(x).

3.2.4 Theoretical performance

Let F be the distribution of the noise and F_n its empirical counterpart from a sample of size n.
Then an estimate of θ can be defined in terms of a functional T operating on F_n, namely T(F_n), while the true parameters are obtained as T(F). Under some mild conditions, such as IE[ϕ(x)] = 0, M-estimates possess desirable properties such as consistency and asymptotic normality [Hampel et al.(1986), Huber(1981)]. Herein we assume a symmetric noise density and an antisymmetric ϕ to ensure this condition is met.

• Asymptotic covariance: using the influence function concept, it is proved in [Brcich et Zoubir(2002)] that the asymptotic covariance of the estimation errors of θ has the form

Cov{T(F_n), T(F)} = ( E[ϕ²(x)] / E[ϕ′(x)]² ) ( Σ_{n=1}^{N} Λ_n Λ_n^T )^{−1}    (3.21)

where ϕ(x) = ρ′(x) and Λ_n is the gradient of s_n(θ). The only degree of freedom at our disposal for minimizing the asymptotic covariance is therefore the appropriate choice of ϕ(x).

• Asymptotic normality: defining Cov{T(F_n), T(F)} to be the asymptotic covariance, we have

n^{1/2} ( T(F_n) − T(F) ) → N( 0, Cov{T(F_n), T(F)} ) in distribution as n → ∞    (3.22)

• Consistency: let F belong to a family of distributions F; then T(F_n) converges in probability to T(F) as n → ∞, i.e.,

IP{ |T(F_n) − T(F)| > ε } → 0 as n → ∞, F ∈ F    (3.23)

for any ε > 0.

3.2.5 Minimax optimal cost function

Let the noise distribution f be known only incompletely; all that is known is that it belongs to a certain class P. Applying the Cramér–Rao inequality to our M-estimator, under certain regularity assumptions, gives

Cov{T(F_n), T(F)} ≥ A(Λ_n) I(f)^{−1}    (3.24)

where I(f) is the Fisher information and A(Λ_n) is a matrix depending only on Λ_n. The worst distribution is naturally the one for which the right-hand side of (3.24) is maximal, i.e., for which I(f) is minimal. In other words, the robust Huber minimax estimator over P is defined, as in the ML method, by equation (3.18) with the loss function

ρ*(z) = −ln( f*(z) )    (3.25)

where f*(z) is selected such that the information on the parameter contained therein is minimal, i.e.,
as a solution of the problem

f*(z) = arg min_{f∈P} I(f)    (3.26)

where I(f) = ∫ ( f′(z) )² / f(z) dz denotes the Fisher information. We call the M-estimator robust if the loss function ρ is chosen according to (3.26) and (3.25). This approach consists in considering the worst case (among f ∈ P), corresponding to the PDF with the minimum Fisher information value. Solving the worst case ensures robustness (good estimation performance) whenever the considered signal PDF belongs to P. It is emphasized that the robustness property of the estimator depends on how the class P is defined. Thus, in order to obtain the robust minimax estimator, an appropriate class P should first be defined, after which the loss function ρ is given by (3.25) and (3.26).

3.3 Concluding Remarks

M-estimation is an alternative approach for robust estimation that is used to implement sub-optimal estimators which are robust to changes in the underlying distribution. Since impulsive noise is present in communications channels, the M-estimation of signal parameters in the additive noise model is an important issue. The approach to robust estimation taken in the second part of this thesis follows the M-estimation concept of robust statistics, except that the density function is modelled as an α-stable PDF and is estimated from the observations. However, many questions remain open for serious discussion, such as the choice of the so-called score function ϕ in the case of an α-stable noise model. The second part of this thesis investigates these difficulties and proposes some solutions in the context of multicomponent non-stationary FM signals.
Chapitre 4

Time-Frequency Concepts

Time–frequency signal processing (TFSP) represents a set of effective methods, techniques and algorithms used for the analysis and processing of non-stationary signals, as found in a wide range of applications including telecommunications, radar and biomedical engineering. TFSP is a natural extension of both time-domain and frequency-domain processing: it represents signals in a two-dimensional space, and so reveals "complete" information about the signal. Such a representation is intended to provide a distribution of signal energy versus time and frequency simultaneously. More details and advances in TFSP can be found in [Cohen(1995), Flandrin(1998), Hlawatsch(1998), Boashash(2002)]. This chapter, therefore, provides a brief background on the fundamental concepts of TFSP that will be used in the subsequent second part.

4.1 The Need for a Time-Frequency Representation

The two classical representations of a signal s(t) are the time-domain representation and the frequency-domain representation S(f) = FT{s(t)}, where FT stands for the Fourier transform. Each classical representation of the signal s(t) is non-localized with respect to the excluded variable. Consequently, such representations are not suitable for signals with time-varying spectral content (non-stationary signals). For non-stationary signals, an indication of how the frequency content of the signal changes with time is needed. The magnitude spectrum (frequency representation) of a signal gives no indication of how the frequency content of the signal changes with time, which is important information when one deals with FM signals.
Time–frequency signal processing, being a natural extension of both time-domain and frequency-domain processing, preserves and reveals this information about the signal. TFSP is intended to provide a distribution of signal energy versus both time and frequency; for this reason, a TF representation is commonly referred to as a TFD [Boashash(2002)].

In order to see the inherent limitations of the classical representations of a non-stationary signal, consider a linear frequency modulated (LFM) signal of length N = 128 samples and sampling frequency fs = 1 Hz, whose frequency increases linearly from 0.1 to 0.4 Hz. Figure 4.1 shows different representations of this signal. The time representation of the LFM signal gives no indication of the frequency content of the signal; neither does the spectrum of the signal indicate how the spectrum changes with time. This example shows clearly why classical representations are inadequate for non-stationary signals.

Fig. 4.1: (a) Time-domain and (b) frequency-domain representations of an LFM signal. It shows clearly the inherent limitation of classical representations of a non-stationary signal.

To overcome the inadequacies of the classical representations of a non-stationary signal, exposed by the above example, we desire a representation in the two-dimensional (t, f) space. Such a representation is called a TFD. As an illustration, Figure 4.2 shows one particular TF representation of the LFM signal of Figure 4.1, using the Wigner–Ville distribution (WVD).
The representation in Figure 4.2 not only shows the start and stop times and the frequency range of the LFM signal, but also clearly shows the variation of frequency with time. The latter feature, which shows at a glance the frequency at a given time, or the time at which a given frequency is present, is missing from the conventional signal representations in Figure 4.1. The choice of a TFD for a particular signal inevitably depends on the nature of the signal (whether it is mono- or multi-component) and on the properties that the TFD is expected to satisfy. A set of properties a TFD needs to satisfy is reported in [?]. In [Boashash et Sucic(2003)], Boashash et al. give a subset of those properties which are the most important in practical applications.

Fig. 4.2: A TF representation of the LFM signal of Figure 4.1 (Fs = 1 Hz, N = 128).

4.2 Nonstationarity and FM Signals

We now recall some important definitions.

Définition 4.1 (Analytic signal [Boashash(2002)]). Let s(t) be a real FM signal of the general form

s(t) = A(t) · cos[θ(t)],    (4.1)

with the assumption that the spectra of the amplitude A(t) and of the phase θ(t) are separated (non-overlapping) in frequency, i.e., the signal approaches a narrowband condition [Boashash(1992a)].

Let H[·] denote the Hilbert transform of the signal, such that

H[s(t)] := s(t) ⋆ (1/(πt)) = (1/π) p.v.{ ∫_{−∞}^{∞} s(τ)/(t−τ) dτ }

where p.v.{·} is the Cauchy principal value of the improper integral, given in this case by

lim_{δ→0} [ ∫_{−∞}^{t−δ} s(τ)/(t−τ) dτ + ∫_{t+δ}^{∞} s(τ)/(t−τ) dτ ]    (4.2)

A signal z(t) defined as

z(t) := s(t) + jH[s(t)] ≈ A(t) e^{jθ(t)}    (4.3)

is called the analytic signal of the real signal s(t). The approximation is valid under the above narrowband condition.
The definition of the analytic signal is needed in order to define the IF of the signal s(t).

Définition 4.2 (Instantaneous frequency [Boashash(2002)]). Let z(t) be an analytic signal given in the form

z(t) = A_z(t) e^{jθ_z(t)}    (4.4)

The instantaneous frequency of the signal z(t) is then defined as

f_in(t) := (1/(2π)) dθ_z(t)/dt    (4.5)

The IF, f_in(t), provides a measure of the localization in time of "that" frequency at time t. In this sense, a signal is said to be nonstationary if its IF varies in time. We can observe in Figure 4.3 the time-varying behavior of an engineering signal (a linear FM signal, used in radar and military applications) and of real-life signals (whale song, electroencephalogram signal, bat signal). Note that Definition 4.2 is applicable to monocomponent signals only, such as the signal illustrated in Figure 4.3(a). When more than one "ridge" appears in the signal's TF representation, the signal is said to be multicomponent, e.g. the signals in Figure 4.3(b–d). The importance of the IF and its applications are presented by Boashash in [Boashash(1992a), Boashash(1992b), Boashash(1992c)].

Nonstationarity can also be expressed in the usual sense of random processes, as shown in [Boashash et Sucic(2002)]. Let z(t) be a complex signal whose autocorrelation function is defined as

R_z(t, τ) := E{ z(t + τ/2) z*(t − τ/2) }    (4.6)

If R_z(t, τ) depends only on the time lag τ, which is the difference in time between t₁ = t + τ/2 and t₂ = t − τ/2, the signal is said to be wide-sense stationary (we only consider the second-order moment). On the other hand, when this condition is not satisfied, the signal is said to be nonstationary: the autocorrelation function R_z(t, τ) depends on both the time and the time lag.
Fig. 4.3: Examples of nonstationary signals. An engineering application is shown in (a) for a linear FM signal (plotted using the Wigner–Ville distribution). Real-life applications are shown in (b–d) for a whale signal, an electroencephalogram signal, and a bat signal, respectively (all plotted using the B distribution).

Définition 4.3 (Linear FM signal [Boashash(2002)]). Considering the typical FM transmission in communications systems, a narrowband FM signal is commonly defined as [Boashash(1992a)]:

s(t) ≜ A(t) · cos( 2πf_c t + 2π ∫_{−∞}^{t} m(τ) dτ ).    (4.7)

When m(t) is a linear function of t, i.e. m(t) = αt, the signal s(t) is called a linear frequency-modulated (LFM) signal. In addition, if A(t) is a rectangular function, the signal is called a "chirp". A chirp signal with duration T and bandwidth B can be expressed as [Rihaczek(1985)]:

s_chirp(t) = rect_T(t) cos[ 2π( f_c t + (α/2) t² ) ].

The analytic signal associated with s_chirp(t) is then given by

z_LFM(t) = rect_T(t) e^{jθ(t)} = rect_T(t) e^{j2π( f_c t + (α/2) t² )}    (4.8)

and its IF is

f_i^chirp(t) = (1/2π) dθ(t)/dt = f_c + αt.    (4.9)
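The LFM model of Définition 4.3 can be sketched numerically (NumPy assumed; the sampling rate and chirp parameters are arbitrary illustrative choices): the phase-difference IF estimate recovers the linear law f_c + αt of Eq. (4.9) up to a small half-sample bias.

```python
import numpy as np

fs = 1000.0                            # sampling rate (illustrative)
t = np.arange(0, 1, 1 / fs)
fc, alpha = 50.0, 100.0                # start frequency and sweep rate (illustrative)

# Analytic chirp of Eq. (4.8), rectangular amplitude implicit:
theta = 2 * np.pi * (fc * t + 0.5 * alpha * t ** 2)
z = np.exp(1j * theta)

# Discrete IF (Eq. 4.9): should follow fc + alpha*t
f_inst = np.diff(np.unwrap(np.angle(z))) / (2 * np.pi) * fs
expected = fc + alpha * t[:-1]

# the forward difference is biased by alpha/(2*fs), negligible here
assert np.max(np.abs(f_inst - expected)) < alpha / fs + 1e-6
```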
The chirp signal defined in (4.8) is of practical importance. It is the basic signal used in radar applications and can be generated easily [Rihaczek(1985)]. It is also used in military communication applications, where the chirp is sent out as a hostile signal to jam other communications [Proakis(1995), Milstein(1988), Amin(1997)]. In this thesis, we will refer to the chirp signal as an LFM signal (i.e. the rectangular amplitude is implicit). Based on the above concepts of analytic signal and instantaneous frequency for nonstationary signals, we now see how they lead to the fundamentals of TFSP.

4.3 The STFT, SPEC, WVD, and Quadratic TFD

To study the spectral properties of a signal at time t, an intuitive approach is to first take a slice of the signal by applying a moving window centered at time t, and then calculate the magnitude spectrum of the windowed signal. Consider a signal s(τ) and a real, even window h(τ), whose FTs are S(f) and H(f) respectively. To obtain a localized spectrum of s(τ) at time τ = t, multiply the signal by the window h(τ) centred at time τ = t, obtaining

s_h(t, τ) = s(τ) h(τ − t),    (4.10)

and then take the FT w.r.t. τ, obtaining

S_h(t, f) = FT_{τ→f} { s(τ) h(τ − t) }.    (4.11)

S_h(t, f) is called the short-time Fourier transform (STFT). The squared magnitude of the STFT, denoted by ρ^spec(t, f), is called the spectrogram (SPEC) [Boashash(1992c), Cohen(1995)]. It is mathematically expressed as

ρ^spec(t, f) = |S_h(t, f)|² = | ∫_{−∞}^{∞} s(τ) h(τ − t) e^{−j2πfτ} dτ |².    (4.12)

By varying t, one can obtain the spectral density as a function of t. The SPEC is a simple, popular and robust method for the analysis of nonstationary signals. It is a proper energy distribution in the sense that it is positive.
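A hedged sketch of Eqs. (4.10)–(4.12) in plain NumPy (window length, hop size and tone frequency are illustrative assumptions): slide the window, FFT each slice, and square the magnitude. For a pure tone, every frame of the spectrogram peaks at the bin of the tone frequency.

```python
import numpy as np

def spectrogram(s, win, hop):
    """Squared-magnitude STFT of Eq. (4.12); one row per window position."""
    frames = [s[i:i + len(win)] * win
              for i in range(0, len(s) - len(win) + 1, hop)]
    return np.abs(np.fft.fft(frames, axis=1)) ** 2

fs = 128.0
t = np.arange(0, 2, 1 / fs)
s = np.cos(2 * np.pi * 20.0 * t)             # a 20 Hz tone
win = np.hanning(64)
S = spectrogram(s, win, hop=16)

# each frame peaks at the FFT bin nearest 20 Hz: bin = f/fs * Nfft = 10
peak_bins = S[:, :32].argmax(axis=1)         # positive frequencies only
assert np.all(peak_bins == round(20.0 / fs * 64))
```

The fixed window length is exactly the time/frequency resolution trade-off discussed next.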
On the other hand, the SPEC has an inherent limitation: its frequency resolution depends on the length (and the type) of the analysis window; too short a window causes a decrease in frequency resolution, while too long a window causes a decrease in time resolution, hence an inherent trade-off between time and frequency resolution in the SPEC for a given window.

It has been argued that, since a signal has a spectral structure at any given time, there should exist a notion of "instantaneous spectrum" which has the physical attributes of an energy density. Based on this argument, the WVD was derived; it is defined for an analytic signal z(t) as [Boashash(1992c)]

W_z(t, f) ≜ ∫_{−∞}^{∞} z(t + τ/2) z*(t − τ/2) e^{−j2πfτ} dτ.    (4.13)

It can be observed from (4.13) that the WVD is the Fourier transform (FT)¹ of K_z(t, τ) from τ to f, where

K_z(t, τ) ≜ z(t + τ/2) z*(t − τ/2)    (4.14)

is called the time-lag signal kernel. The WVD is the most widely studied TFD. It achieves maximum energy concentration in the TF plane about the IF for LFM signals [Cohen(1995)]. However, it is in general non-positive, and it introduces cross-terms when multiple frequency laws (e.g. two LFM components) exist in the signal. A general class of quadratic TFDs can be obtained by smoothing/filtering the WVD in t and f, and is expressed as [Cohen(1966)]

ρ_z(t, f) ≜ ∭_{−∞}^{∞} e^{j2πν(u−t)} Γ(τ, ν) z(u + τ/2) z*(u − τ/2) e^{−j2πfτ} dν du dτ    (4.15)

where Γ(τ, ν) is a two-dimensional function in the Doppler-lag domain (τ, ν), called the TFD Doppler-lag kernel. The kernel determines the TFD and its properties; a TFD with certain desired properties can be obtained by properly constraining the function Γ(τ, ν). Table 4.1 lists some common TFDs and their corresponding Doppler-lag kernels.
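The WVD of Eq. (4.13) can be discretized directly from the time-lag kernel (4.14); the sketch below assumes NumPy and an arbitrary analytic tone. Note the factor-2 frequency scaling inherent to the discrete kernel z[n+τ]z*[n−τ]: a tone at normalized frequency f lands at FFT bin 2fN.

```python
import numpy as np

def wvd(z):
    """Discrete WVD of an analytic signal (Eq. 4.13): FFT over the lag
    variable of the time-lag kernel K_z(t, tau) = z(t+tau) z*(t-tau).
    Returns W of shape (N, N); row t is the spectrum at time t."""
    N = len(z)
    K = np.zeros((N, N), dtype=complex)
    for n in range(N):
        lag_max = min(n, N - 1 - n)          # lags that stay inside the signal
        tau = np.arange(-lag_max, lag_max + 1)
        K[n, tau % N] = z[n + tau] * np.conj(z[n - tau])
    return np.fft.fft(K, axis=1).real

N = 128
t = np.arange(N)
z = np.exp(1j * 2 * np.pi * 0.25 * t)        # tone at normalized f = 0.25

W = wvd(z)
# energy concentrates at bin 2*f*N = 64, as expected for a pure tone
assert W[N // 2].argmax() == int(0.25 * 2 * N)
```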
Equation (4.15) can be simplified as [Boashash(2002)]:

ρ_z(t, f) = γ(t, f) ∗∗_{t,f} W_z(t, f).    (4.16)

The notation ∗∗_{t,f} in (4.16) represents a convolution in both the t and f directions, and γ(t, f) is the time-frequency kernel obtained through a dFT operation on Γ(τ, ν) as:

γ(t, f) = ∬_{−∞}^{∞} Γ(τ, ν) e^{−j2πfτ} e^{+j2πtν} dτ dν.

Remark 4.1. Convention of dFT and dIFT operations: a dFT operation, transforming a function of the two variables (t, f) into a function of (τ, ν), consists of one FT operation from t to ν and one IFT operation from f to τ, these FT and IFT operations being interchangeable; conversely, a dIFT operation, transforming a function of the two variables (τ, ν) back to (t, f), consists of one IFT operation from ν to t and one FT operation from τ to f, these IFT and FT operations also being interchangeable.

4.4 Reduced Interference Distributions

The problem of the cross-terms introduced by the WVD when it is applied to a multicomponent signal can be dealt with by selecting a suitable kernel Γ(τ, ν) which minimizes the cross-terms effectively. The TFDs corresponding to such kernels are known as reduced interference distributions (RID). Examples of time-frequency RIDs include the CWD [Choi et Williams(1989)], the BJD [Cohen(1966)], the cone-shaped ZAMD [Zhao et al.(1990)] and the MBD [Hussain(2002)], defined in Table 4.1. The RIDs may be applied in situations where a number of signals of interest are present simultaneously and need to be separated.

¹ Convention of FT and IFT operations: an FT operation transforms a function either from the t to the ν domain, or from the τ to the f domain; inversely, an IFT operation goes either from ν back to t, or from f back to τ.

Tab. 4.1: Some common TFD and their kernels.
– WVD: kernel g(ν, τ) = 1; TFD ρ_z(t, f) = ∫ z(t + τ/2) z*(t − τ/2) e^{−j2πfτ} dτ.

– SPEC: g(ν, τ) = ∫ h(u + τ/2) h*(u − τ/2) e^{−j2πνu} du; ρ_z(t, f) = | ∫ z(τ) h(τ − t) e^{−j2πfτ} dτ |².

– CWD: g(ν, τ) = e^{−ν²τ²/σ}; ρ_z(t, f) = ∬ √(πσ/τ²) e^{−π²σ(u−t)²/τ²} z(u + τ/2) z*(u − τ/2) e^{−j2πfτ} du dτ.

– BJD: g(ν, τ) = sin(πντ)/(πντ); ρ_z(t, f) = ∫_{−∞}^{∞} (1/|τ|) ∫_{t−|τ|/2}^{t+|τ|/2} z(u + τ/2) z*(u − τ/2) du e^{−j2πfτ} dτ.

– ZAMD: g(ν, τ) = h(τ)|τ| sin(πντ)/(πντ); ρ_z(t, f) = ∫_{−∞}^{∞} h(τ) ∫_{t−|τ|/2}^{t+|τ|/2} z(u + τ/2) z*(u − τ/2) du e^{−j2πfτ} dτ.

– MBD: g(ν, τ) = |Γ(α + jπν)|² / Γ²(α), α ∈ ℝ⁺; ρ_z(t, f) = ∫ [ Γ(2α) / (2^{2α−1} Γ²(α) cosh^{2α}(t)) ] ∗_t z(t + τ/2) z*(t − τ/2) e^{−j2πfτ} dτ.

4.5 The WVD and Ambiguity Function

By taking the dFT of the WVD, we obtain the symmetrical AF, also called the Sussman AF:

A_z(τ, ν) = FT_{t→ν} IFT_{f→τ} { W_z(t, f) }
          = ∬_{−∞}^{∞} W_z(t, f) e^{j2πfτ} e^{−j2πνt} dt df
          = ∫_{−∞}^{∞} K_z(t, τ) e^{−j2πνt} dt.    (4.17)

Slightly different definitions of the AF have been used by different authors; however, they are all related to the symmetrical form A_z(τ, ν) [Matz et Hlawatsch(1998b)]. A nonstationary signal can therefore be analyzed either in the time-frequency domain (t, f) or in the ambiguity domain (τ, ν), also called the Doppler-lag domain. There also exists a relationship between the WVD and the AF via the Radon transform [Jain(1989)]: the FT of the Radon-transformed WVD yields the AF in polar coordinates [Ristic(1995)]. The concept of the AF has been used as a very effective tool in the design of radar signals [Boashash(1992c), Cook et Bernfeld(1993)]; this function is a cornerstone of modern radar technology.

4.6 Relationships Among Dual Domains

The relationships between the dual domain pairs, time-frequency and Doppler-lag, and time-lag and Doppler-frequency, can be represented as in Figure 4.4 [Boashash(2002)] through FT and IFT operations with respect to the variables.
Each arrow in Figure 4.4 represents an FT from one variable to the other; the inverse direction represents an IFT operation.

Fig. 4.4: Quadratic representations corresponding to the WVD. W_z(t, f), A_z(τ, ν), K_z(t, τ) and D_z(ν, f) are respectively the WVD, the AF, the time-lag signal kernel and the Doppler-frequency signal kernel of the analytic signal z(t).

Moreover, for the general quadratic class of TFDs in (4.15), the above relationship is illustrated in Figure 4.5 [Boashash(1992c), Boashash(2002)], where A_z(τ, ν) is the GAF.

Fig. 4.5: Dual domains of general quadratic signal representations. γ(t, f), Γ(τ, ν), G(t, τ) and G(ν, f) are the TFD time-frequency, Doppler-lag, time-lag and Doppler-frequency kernels, respectively. ρ_z(t, f) and A_z(τ, ν) are the general quadratic TFD and the GAF of the analytic signal z(t).

Note that there is a strong connection between quadratic TF signal representations and LTV systems [Matz et Hlawatsch(1998b), Hlawatsch et Matz(2000)]. A quadratic time-frequency analysis of an LTV system can be based on the linear relation between the WVD [Mecklenbräuker et Hlawatsch(1997)] or the modified WVD [Hlawatsch et Matz(2000)] of the input and the output. This input-output relationship can, in general, be described in terms of TFDs as [Gaarder(1968), Altes(1980), Flandrin(1988a), Nguyen et al.(2001b)]

E{ ρ_x(t, f) } = ρ_s(t, f) ∗∗_{t,f} Ψ_h(t, f)    (4.18)

where ρ_s(t, f) is a TFD of the input s(t), Ψ_h(t, f) is the scattering function, related to the random LTV channel impulse response h(t, ν), and E{ρ_x(t, f)} is the expected value of a TFD of the output x(t).
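The ambiguity-domain representation of Section 4.5 can be checked numerically: the last line of Eq. (4.17) says the AF is the FT of the time-lag kernel over t. The sketch below assumes NumPy and uses circular indexing for brevity; a stationary tone then has no Doppler spread, so all energy sits on the ν = 0 axis.

```python
import numpy as np

N = 64
n = np.arange(N)
z = np.exp(1j * 2 * np.pi * (8 / N) * n)     # tone on an exact FFT bin

# Time-lag kernel K_z(t, tau) of Eq. (4.14), circular in both indices:
K = np.array([z[(n + tau) % N] * np.conj(z[(n - tau) % N])
              for tau in range(N)])
# AF of Eq. (4.17): FFT over t; rows index tau, columns index Doppler nu
A = np.fft.fft(K, axis=1)

# a stationary tone concentrates entirely at nu = 0, for every lag tau
assert np.allclose(np.abs(A[:, 1:]), 0.0, atol=1e-8)
assert np.allclose(np.abs(A[:, 0]), N)
```

This zero-Doppler concentration is exactly why slowly varying signals occupy a small region around the origin of the (τ, ν) plane.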
4.7 Time–Frequency Signal Synthesis

In contrast to TF signal analysis, whereby analysis algorithms are used to analyze the TV frequency behavior of signals, TF signal synthesis algorithms are used to synthesize, or estimate, signals from their TFD. Mathematically, assuming that z(t) is a signal of interest with ρ_z(t, f) its TFD in the general quadratic class, the synthesis problem can be formulated as follows: find the analytic signal ẑ(t) whose TFD estimate, ρ_ẑ(t, f), best approximates ρ_z(t, f). Consequently, ẑ(t) gives the best estimate of z(t). Seminal to the problem of TF signal synthesis is the algorithm of [Boudreaux-Bartels et Marks(1986)] using the WVD. The basis for the solution is the inversion property of the WVD [Boashash(1992c)]

z(t) = (1/z*(0)) ∫_{−∞}^{∞} W_z(t/2, f) e^{j2πft} df    (4.19)

implying that the signal may be reconstructed to within a complex exponential constant e^{jα} = z*(0)/|z(0)|, given |z(0)| ≠ 0. Other time-frequency synthesis algorithms can be found in [Boashash(1991), McHale et Boudreaux-Bartels(1993), Wood et Barry(1994), Hlawatsch et Krattenthaler(1997), Francos et Porat(1999)].

4.8 IF Estimation

There are two major existing approaches to IF estimation using TFDs. The first is built on the first-order moment of the TFD [Boashash(1991)]. The first-order moment of the WVD yields the IF [White et Boashash(1988), Boashash(1991)], while those of other TFDs yield approximations of the IF [Boashash(1992c)]. However, this approach fails for multicomponent signals because of the presence of cross-terms. The second approach exploits the fact that all TFDs have peaks around the IF laws of the signal components. The peaks of the WVD were used for IF estimation and applied to many problems [Boashash(1992c)]. For better performance at lower SNR, the XWVD was proposed [Boashash et O'Shea(1993)]. Other TFD-based peak estimation algorithms can be found, for example, in [Boashash(1992c),
Stankovic et Katkovnik(1998), Katkovnik et Stankovic(1998)]. Like the first approach, this approach also suffers from the presence of cross-terms in multicomponent signals, which results in poor estimation. Motivated by the desire to design high-resolution RIDs, the B-Distribution (BD) was proposed in [Barkat(2000)] and the MBD was developed in [Hussain(2002)], both with adaptive algorithms for the IF estimation of multicomponent signals.

4.9 Engineering Applications of Time–Frequency Methods

This section looks at existing applications of time-frequency signal processing and describes a representative selection of them, encompassing telecommunications, radar, sonar, power generation, image quality, and biomedical engineering.

– Time-Frequency Methods in Communications: Telecommunications is one of the key industries where time-frequency methods are already playing an important role. [Barbarossa et Scaglione(1999b)] investigate the problem of optimal precoding and channel capacity for transmission over a linear time-varying (LTV) channel in wireless communications, where the multipath channels are underspread with finite Doppler and delay spread. In modern communication systems, a number of users can share the same communication channel via multiple access (MA); common examples are FDMA, TDMA, and CDMA [Rappaport(1996)]. The potential demand for wireless communications, combined with the restricted availability of the radio frequency spectrum, has motivated intense research into bandwidth-efficient multiple-access schemes. Among these schemes, CDMA has received particular attention.
Issues such as designing/assigning the spreading codes in CDMA and multiple-access interference have become major research concerns, and a number of the proposed approaches are based on time-frequency concepts [Crespo et al.(1995), Haas et Belfiore(1997), Joshi et Morris(1998)]. One of the main objectives of third-generation mobile and personal telecommunication systems is to provide a wide range of services with different bit rates [Swarts et al.(1999)]. To this end, a new approach called time-frequency-slicing (TFS) was proposed for multirate access in [Karol et al.(1997)].

– Time-Frequency Methods in Radar: Time-frequency methodologies have already made significant inroads in this field. A baseband Doppler radar return from a helicopter target is an example of a persistent nonstationary signal. A linear time-frequency representation provides a high resolution suitable for preserving the full dynamic range of such complicated signals [Marple S.L.(2001)].

– Time-Frequency Methods in Biomedical Engineering: An example of time-frequency methodology used for the detection of seizures in recorded EEG signals is proposed in [Celka et al.(2001)]. The techniques used are adapted to the case of newborn EEGs, which exhibit some well-defined features in the time-frequency domain that allow an efficient discrimination between abnormal EEGs and background. Another TF approach to newborn EEG seizure detection is described in [H. Hassanpour et Boashash(2003)].

– Other Applications: There are a number of applications that could not be included in the chapters for obvious space reasons.

4.10 Concluding Remarks

Time-frequency signal analysis (TFSA) is a collection of theory and algorithms used for the analysis and processing of nonstationary signals, as found in a wide range of applications.
In this chapter, the main elements of TFSA have been summarized. This concise tutorial introduction to TFSA is accessible to anyone who has taken a first course in signal processing; expert readers can find more detailed references and real-life applications in the signal processing literature, for instance [Boashash(2002), Cohen(1995)].

Deuxième partie : Séparation de Sources Impulsives à Variance Infinie

In the first chapter of this part (Chapter 5), we recall the main principles and existing methods of source separation. Then, in the four following chapters (Chapters 6–9), we present our novel approaches to the blind separation of a linear instantaneous mixture of impulsive alpha-stable sources.

Chapitre 5 : State of the Art of BSS

Blind source separation (BSS), or independent component analysis (ICA), is a method for finding underlying factors or components in multivariate statistical data. What distinguishes BSS from other methods is that it looks for components that are both statistically independent and non-Gaussian. In the second part of this thesis, we focus on the BSS of linear instantaneous mixtures. This BSS problem has the advantages of simplicity and generality, since the statistical principles used in this context can also be applied to solve the convolutive mixing problem. In this chapter, we briefly introduce the basic concepts and estimation principles of BSS. Following the different types of source statistical information, our contributions in this part are divided into four chapters. The first is devoted to BSS methods using fractional lower-order statistics (FLOS). We pay particular attention to this chapter¹ and give a general framework for separation methods using FLOS.
In a second contribution to BSS, we give a theoretical procedure for constructing contrast functions using sub- or super-additive functionals. The third contribution is devoted to BSS methods based on certain normalized HOS, while the fourth is devoted to a semi-parametric maximum likelihood approach coupling a stochastic version of the EM algorithm with the use of log-spline functions to approximate the source PDFs.

¹ To the best of our knowledge, there exists no BSS procedure based on FLOS, whilst many are based on HOS and SOS.

5.1 Introduction

5.1.1 What is blind source separation (BSS)?

Blind source separation (BSS) is a fundamental problem in signal processing that is sometimes known under different names: blind array processing, independent component analysis, waveform-preserving estimation, etc. In all these instances, the underlying model is that of m statistically independent signals s(t) = (s₁(t), · · · , s_m(t))^T whose n mixtures y(t) = (y₁(t), · · · , y_n(t))^T are observed, possibly in a noisy environment w(t), as shown in Figure 5.1. BSS addresses the problem of separating, or ideally reconstructing, the unknown source signals from an observable mixture of them. The term blind refers to the fact that neither the source signals nor the way they are mixed is known. The mixtures of the source signals are termed the observable signals, and the model of the mixing of the source signals is referred to as the mixing system A. The separated signals are obtained from the observable signals by means of a separation system B; in Figure 5.1 the signal model is depicted as a block diagram.

Fig. 5.1: Signal model for the blind source separation problem.

BSS has many applications in areas involving the processing of multi-sensor signals. Examples include: source localization and tracking by radar and sonar devices; speaker separation (the cocktail party problem); multiuser detection in communication systems; medical signal processing, e.g. separation of EEG or ECG signals; industrial problems such as fault detection; extraction of meaningful features from data; etc. This area has been very active over the last two decades. Surprisingly, this seemingly impossible problem has elegant solutions that depend on the nature of the mixtures and on the nature of the sources' statistical information [Hyvarinen et al.(2001)].

5.1.2 Brief history of BSS

The problem of blind source separation (BSS) was first introduced by J. Hérault, C. Jutten, and B. Ans [Hérault et Ans(1984)], [Hérault et al.(1985)] for linear instantaneous mixtures. Many researchers were then attracted by the subject, and many other works appeared. More precisely, all through the 1980s, BSS was mostly known among French researchers, with limited international influence. The few BSS papers of that period are presentations at international neural network conferences in the mid-1980s. At that time, another related and attractive field was higher-order spectral analysis, on which the first international workshop was organized in 1989. At this workshop, early papers on ICA by J.-F. Cardoso [Cardoso(1989a)] and P. Comon [Comon(1989)] were given. Cardoso used algebraic methods, especially higher-order cumulant tensors, which eventually led to the JADE algorithm [Cardoso et Souloumiac(1993)]. The use of fourth-order cumulants had been proposed earlier by [Lacoume et Ruiz(1988)]. The work of the scientists of the 1980s was extended by, among others, A. Cichocki and R.
Unbehauen, who were the first to propose one of the presently most popular ICA algorithms [Cichocki et al.(1994)], [Cichocki et Unbehauen(1996)]. Some other papers on ICA and BSS from the early 1990s are surveyed in [Jutten(2000)]. However, until the mid-1990s, BSS remained a rather small and narrow research effort. Several algorithms were proposed that worked, usually on somewhat restricted problems, but it was not until later that their rigorous connections to statistical optimization criteria were exposed. BSS attracted wider attention and growing interest after the publication of the infomax-principle-based approach [Bell et Sejnowski(1995)] in the mid-1990s. This algorithm was further refined by S.-I. Amari and his co-workers using the natural gradient [Amari et al.(1996)], and its fundamental connections to maximum likelihood estimation, as well as to the Cichocki–Unbehauen algorithm, were established. A couple of years later, A. Hyvarinen and E. Oja presented the fixed-point or FastICA algorithm [Hyvarinen(1999)], which has contributed to the application of ICA to large-scale problems thanks to its computational efficiency. Since the mid-1990s, there has been a growing wave of papers, workshops, and special sessions devoted to ICA. Indeed, many approaches for performing BSS have been proposed from different viewpoints. Using second-order statistics, the popular SOBI algorithm was introduced in [Belouchrani et al.(1997a)] for spatially correlated sources. The same methodology was generalized to cyclostationary sources in [Abed-Meraim et al.(2001)]. A useful generalization of the fourth-order-cumulant-based methods was proposed in [Pesquet et Moreau(2001)], [Moreau(2001)]. The convolutive model has been addressed in [Castella et Pesquet(2004)]. Motivated by the useful incorporation of prior information about the data into the BSS framework, the Bayesian approach was introduced in [Djafari(1999)].
Other researchers, considering some prior information about the propagation system in a semi-blind model, have also contributed to the BSS field. For example, in [Davy et al.(2002)] the Bayesian approach was coupled with MCMC techniques to estimate chirp signals. Before that, a polynomial approach was proposed in [Benidir(1997)]. The nonlinear-mixture BSS model was investigated early on in [Abed-Meraim et al.(1996)] and [Krob et Benidir(1993)]. Since 1999, an international workshop on ICA and BSS has gathered, every year, more than 150 researchers working on blind signal separation, and has contributed to the transformation of BSS into an established and mature field of research.

As an extension of the instantaneous mixtures, other models of signal mixtures have been considered in the signal processing literature. More precisely, we can distinguish three classes of mixtures:

[A]- Linear instantaneous mixtures
This model is commonplace in the field of narrowband array processing, where the transfer function between sources and sensors is given by a constant matrix A (i.e., it involves no delays or frequency distortion) called the 'array matrix' or the 'mixing matrix'. Many array processing techniques rely on the modelling of A [Krim et Viberg(1996)]: each column of A is assumed to depend on a small number of parameters. This information may be provided either by physical modelling (for example, when the array geometry is known and the sources are in the far field of the array) or, more likely, by direct array calibration. In many circumstances, however, this information is not available or is not reliable.
Blind source separation techniques address the issue of identifying A and/or retrieving the source signals without resorting to any a priori information about the mixing matrix A: they exploit only the information carried by the received signals themselves, hence the term blind. The performance of such a blind technique is, by its very nature, essentially unaffected by potential errors in the propagation model or in the array calibration (which is obviously not the case for parametric array processing techniques). Of course, the lack of information on the structure of A must be compensated by some additional assumptions on the source signals, as will be shown next.

[B]- Non-linear mixtures
In the basic signal model of BSS, an unknown linear mixing process is often assumed. However, this model fails as soon as the linear approximation of the physical phenomenon is no longer valid. This is the case, for example, when the signal is received by an array of sensors with non-linear characteristics. Some particular non-linear models have been studied thoroughly in the literature, such as the post-non-linear model, where the mixing process is a cascade of a linear mixture and a componentwise non-linear transform [Taleb(1999)], and the linear-quadratic model, where the observations are quadratic functions of the sources [Krob et Benidir(1993)], [Abed-Meraim et al.(1996)] and [Taleb(1999)]. The general non-linear problem is still largely unsolved, except for some tentative solutions based on neural networks using self-organizing feature maps [Taleb(1999)] or information-preserving nonlinear maps [Taleb(1999)], [Hyvarinen et al.(2001)].

[C]- Linear convolutive mixtures
Many real-world communication systems involve source signals that are delayed and attenuated by different amounts on their way to the different sensors (receivers), as well as multipath propagation.
Moreover, the multipath can be diffuse, with a long delay spread causing intersymbol interference and resulting in a situation termed 'linear convolutive mixing'. Mathematically, the mixing is described by a matrix of linear filters operating on the sources. Although not completely solved, this problem is much better understood than the non-linear mixing one. The first research works focused on the case where the mixing is square (i.e., the number of inputs equals the number of outputs), for which a multitude of solutions have been given using neural networks, independent component analysis (ICA), or information-theoretic approaches [Hyvarinen et al.(2001)]. Interestingly, by stacking successive observations into a single vector, a convolutive mixture can be expressed as an instantaneous mixture (with a full column rank mixing matrix if there are more outputs than inputs). Thus, BSS solutions for instantaneous mixtures can be adapted to solve the convolutive mixture problem, e.g., [Mansour et al.(2000a)], [Babie-Zadeh(2002)], [Castella et al.(2004)], [Castella et Pesquet(2004)].

A good source with historical accounts and a more complete list of references is [Jutten(2000)], a good overview of the statistical principles of BSS is [Cardoso(1998)], and an elegant overview paper is [Mansour et al.(2000b)]. There is still much work left to do. For example, we still do not have an adequate explanation for why ICA converges for so many problems, almost always to the same solutions, even when the signals were not derived from independent sources! Other serious problems, such as the under-determined mixture case, non-stationary sources, heavy-tailed sources, non-linear mixtures, dependent sources, etc., remain open to researchers' efforts.
5.1.3 Statistical information for BSS

Statistical moments of signals provide rich sources of the desired information. The whole spectrum of statistical moments runs from order 0 to order ∞ (see Figure 5.2). The oldest traditional signal separation methods use only second-order moments, such as PCA-type methods [Belouchrani et al.(1997a)]. All through the 1990s, BSS methods were extended to make wide use of higher-order statistical moments [Nikias et Petropulu(1994)], [Comon(1994)], [Cardoso et Souloumiac(1993)]. More recently, fractional lower-order statistical signal processing techniques extract useful information from pth-order statistics with −1 < p < 2. In this thesis, we will show that blind source separation based on stable models can be adequately solved using fractional lower-order moments, i.e., moments of order less than 2.

Fig. 5.2: Order of statistics in blind source separation: fractional lower-order moment theory (orders below 2), second-order moment theory, and higher-order moment theory (orders above 2).

Thus, the sources' statistical information can be of three types:

[A]- Higher-order statistical information
For non-Gaussian independent sources, higher-order statistics (HOS) can be used to achieve BSS. The first HOS-BSS approach traces back to the pioneering adaptive algorithm of [Hérault et Ans(1984)], [Hérault et al.(1985)]. This method does not use HOS explicitly, but tries to equalize the channel by minimizing a cost function that implicitly contains the information of the higher-order moments of the output. Alternative batch algorithms that explicitly use higher-order cumulants were developed later; see, for instance, [Cardoso(1991)], [Comon(1994)].
Other HOS-based solutions include separation by maximum likelihood (ML), separation by neural networks, separation by contrast functions, separation by information-theoretic criteria, etc.

[B]- Second-order statistical information. Second-order statistics (SOS) were used early on in blind equalization [Delmas et al.(2000)], in DOA estimation [Delmas(2004)] and in many other classical signal processing methods related to estimation and detection. When the data show some kind of temporal dependency, alternative BSS methods can be developed based on second-order statistics [Abed-Meraim et al.(1997b)]. SOS-based methods are expected to be more robust to poor signal-to-noise ratios and short data sizes [Gazzah et Abed-Meraim(2003)]. BSS is feasible based on spatial correlation matrices [Belouchrani et al.(1997a)], [Mansour et al.(2000a)]. These matrices exhibit a simple structure which allows straightforward blind identification procedures based on eigendecomposition. For example, a new algorithm using second-order cyclostationary statistics is introduced in [Abed-Meraim et al.(2001)].

[C]- Fractional lower-order statistical information. It is known that, for a non-Gaussian stable distribution with characteristic exponent α, only moments of order less than α are finite. In particular, the second-order moment of a stable distribution with α < 2 does not exist, making the use of covariance as a measure of correlation meaningless. Similarly, many standard signal processing tools (e.g., spectral analysis and all higher-order techniques) that are based on the assumption of finite variance will be considerably weakened and may, in fact, give misleading results. Recall that the stable distribution is best used to model signals and noise of an impulsive nature. This type of signal tends to produce outliers. Although SOS- and HOS-based BSS methods usually lead to analytically tractable results, they are no longer appropriate for impulsive non-Gaussian signals.
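A small numerical illustration of this point, using Cauchy samples (an α-stable law with α = 1; synthetic data, not from the thesis): the sample variance is dominated by a handful of extreme values, while a fractional lower-order moment of order p < α remains stable:

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.standard_cauchy(100_000)     # impulsive, alpha-stable with alpha = 1

# Second-order statistics are meaningless here: the sample variance is
# driven by a few outliers and does not converge as samples accumulate.
assert np.var(x) > 50.0

# A fractional lower-order moment E|x|^p with p < alpha stays finite and
# modest (theoretical value 1/cos(p*pi/2) ~ 1.41 for p = 0.5).
p = 0.5
flom = np.mean(np.abs(x) ** p)
assert 0.5 < flom < 5.0
```

This is precisely the behavior that motivates replacing covariance-based criteria with FLOS-based ones for impulsive sources.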
It has been demonstrated many times in the literature that second- and higher-order estimates can deteriorate dramatically when only a small proportion of extreme observations is present in the data. The absence of finite SOS and HOS does not mean, however, that there are no other adequate measures of independence of stable random variables. As will be shown later in this thesis, the dispersion of a stable random variable plays a role analogous to the SOS. Despite the aforementioned difficulties, significant progress has been made in developing a linear estimation theory for stable processes over the past thirty years. In this thesis, we introduce a new class of source separation methods based on the use of fractional lower-order statistics (FLOS), i.e., statistics of order less than 2.

5.2 Linear Instantaneous Mixtures

Consider m mutually independent signals whose n ≥ m linear combinations are observed in noise:

x(t) = y(t) + w(t) = As(t) + w(t)    (5.1)

where s(t) = [s1(t), ..., sm(t)]^T is the real source vector, w(t) = [w1(t), ..., wn(t)]^T is the real noise vector, and A is the n × m full-rank mixing matrix. The purpose of blind source separation is to find a separating matrix, i.e., an m × n matrix B such that z(t) = Bx(t) is an estimate of the source signals.

5.2.1 Separability and indeterminacies

When the sources are white stationary processes, their separation can be achieved under the following conditions.

Theorem 5.1. If there is at most one Gaussian source, then the independence of the components of z = By implies BA = PΛ, where P and Λ represent a permutation and a diagonal matrix, respectively.

In other words, linear instantaneous mixtures are separable, up to permutation and scaling indeterminacies, provided that there is at most one Gaussian source.
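To fix notation, the noiseless model and the role of the separating matrix can be sketched as follows. This is not blind separation: the mixing matrix is assumed known here, and its pseudo-inverse is used only to show that a separating matrix exists (all values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, T = 2, 3, 500
S = rng.uniform(-1, 1, size=(m, T))   # m independent sources
A = rng.standard_normal((n, m))       # full-rank n x m mixing matrix (arbitrary)
X = A @ S                             # noiseless observations x(t) = A s(t)

B = np.linalg.pinv(A)                 # an m x n separating matrix (A known here!)
Z = B @ X                             # z(t) = B x(t) recovers the sources exactly
assert np.allclose(Z, S)
```

The blind problem, discussed next, is to find such a B from X alone, which is only possible up to the permutation and scaling ambiguities of Theorem 5.1.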
We will not prove the identifiability of the BSS model here, since the proof is quite complicated; see Comon's paper [Comon(1994)]. Next, we develop a constructive (non-rigorous) discussion of identifiability.

[A]- Separability of the instantaneous linear mixture model. To make sure that the basic BSS model given in (5.1) can be estimated, we have to make certain assumptions:

1. The sources s(t) are at each time instant mutually independent: This is the principle on which ICA rests. Surprisingly, not much more than this assumption is needed to ascertain that the model can be estimated. This is why BSS is such a powerful technique with applications in many different areas. Basically, two r.v.s Y1 and Y2 are said to be independent if information on the value of Y1 (resp. Y2) does not give any information on the value of Y2 (resp. Y1). Technically, independence can be defined via the PDFs. Let us denote by p(y1, y2) the joint PDF of Y1 and Y2, and by pi(yi) the marginal PDF of Yi for i = 1, 2. Then Y1 and Y2 are independent if the joint PDF factorizes in the following way:

p(y1, y2) = p1(y1) p2(y2)    (5.2)

2. At most one source has a Gaussian distribution: Whitening also helps us understand why Gaussian variables are forbidden in BSS. Assume that the joint distribution of two sources s1 and s2 is Gaussian. This means that their joint PDF is given by

p(s1, s2) = (1/2π) exp(−(s1² + s2²)/2) = (1/2π) exp(−‖s‖²/2)    (5.3)

Now, assume that the mixing matrix A is orthogonal. For example, we could assume that this is so because the data has been whitened.
Using the classic formula for transforming PDFs, and noting that for an orthogonal matrix A−1 = A^T holds, we get the joint density of the mixtures x1 and x2 as

p(x1, x2) = |det(A^T)| (1/2π) exp(−‖A^T x‖²/2)    (5.4)

Due to the orthogonality of A, we have ‖A^T x‖² = ‖x‖², |det(A)| = 1 and A^T is also orthogonal. Thus we have

p(x1, x2) = (1/2π) exp(−‖x‖²/2) = p(s1, s2)    (5.5)

and we see that the orthogonal mixing matrix does not change the PDF, since it does not appear in this PDF at all. The original and mixed distributions are identical. Therefore, there is no way we could infer the mixing matrix from the mixtures. The phenomenon that the orthogonal mixing matrix cannot be estimated for Gaussian variables is related to the property that uncorrelated jointly Gaussian variables are necessarily independent. Thus, the information on the independence of the components does not get us any further than whitening: in the case of Gaussian independent components, we can only estimate the BSS model up to an orthogonal transformation. In other words, the matrix A is not identifiable for Gaussian independent components. With Gaussian variables, all we can do is whiten the data. What happens if we try to estimate the BSS model and some of the components are Gaussian, some non-Gaussian? In this case, we can estimate all the non-Gaussian components, but the Gaussian components cannot be separated from each other. In other words, some of the estimated components will be arbitrary linear combinations of the Gaussian components. Actually, this means that in the case of just one Gaussian source, we can still estimate the model, because the single Gaussian component does not have any other Gaussian component it could be mixed with.

3. The number of sensors is greater than or equal to the number of sources (n ≥ m): This assumption is needed to make the mixing matrix A a full-rank matrix. Then, after estimating the matrix A, we can compute its inverse, say B, and in the noiseless case obtain the independent components simply by s = Bx.

[B]- Indeterminacies in the instantaneous linear mixture model. In the BSS model (5.1), it is easy to see that the following two ambiguities or indeterminacies will necessarily hold. First, there is no way of knowing the original labelling of the sources, hence any permutation of the outputs is also a satisfactory solution, i.e., if z(t) is a solution then Pz(t) is also a solution for any permutation matrix P. Choosing a labelling of the outputs can only be done with some extra knowledge of the system. The second ambiguity is that exchanging a fixed scalar factor between a source signal and the corresponding column of A does not affect the observations, as shown by the following relation:

x(t) = As(t) + w(t) = Σ_{p=1}^{m} (a_p/λ_p) λ_p s_p(t) + w(t)    (5.6)

where λp is an arbitrary non-zero real factor and ap denotes the p-th column of A. It follows that the best one can do is to determine B (or equivalently the matrix A) up to a permutation and a scaling of its columns [Mansour et al.(2000b)]. Therefore, B is said to be a separating matrix if By(t) = PΛs(t), where P is a permutation matrix and Λ a non-singular diagonal matrix. Similarly, blind identification of A is understood as the determination of a matrix equal to A up to a permutation matrix and a non-singular diagonal matrix. Many authors take advantage of the scaling indeterminacy by assuming, without any loss of generality, that the source signals have unit variance, so that the dynamic range of the sources is accounted for by the magnitude of the corresponding columns of A. Other normalization strategies exist, such as normalizing the diagonal entries of A (respectively B) to unity.
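Both indeterminacies are easy to verify numerically. The sketch below (arbitrary matrices, noiseless case) shows that rescaling the sources while compensating in the columns of A, or permuting sources and columns jointly, leaves the observations unchanged:

```python
import numpy as np

rng = np.random.default_rng(2)
m, T = 2, 100
S = rng.standard_normal((m, T))
A = rng.standard_normal((3, m))
X1 = A @ S

# Scaling ambiguity: absorb an arbitrary non-singular diagonal D into A.
D = np.diag([2.0, -0.5])
X2 = (A @ np.linalg.inv(D)) @ (D @ S)    # same observations, different (A, s)
assert np.allclose(X1, X2)

# Permutation ambiguity: relabel the sources and A's columns jointly.
P = np.array([[0.0, 1.0], [1.0, 0.0]])
X3 = (A @ P.T) @ (P @ S)
assert np.allclose(X1, X3)
```

Hence only the products AΛ⁻¹ and A P^T, never A itself, are constrained by the data, which is exactly the PΛ ambiguity stated above.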
5.2.2 How to find the independent components

It may be very surprising that the independent components can be estimated from linear mixtures with no more assumptions than their independence. In this chapter, we will briefly explain why and how this is possible.

[A]- Uncorrelatedness is not enough. The first thing to note is that independence is a much stronger property than uncorrelatedness. Considering the BSS problem, we could actually find many different uncorrelated representations of the signals that would not be independent and would not separate the sources. Uncorrelatedness in itself is not enough to separate the components. This is also the reason why principal component analysis (PCA) or factor analysis cannot separate the signals: they give components that are uncorrelated, but little more. In fact, by using the well-known decorrelation methods, we can transform any linear mixture of the independent components into uncorrelated components, in which case the mixing is orthogonal. Thus, the trick in BSS is to estimate the orthogonal transformation that is left after decorrelation. This is something that classic methods cannot estimate because they are based on essentially the same covariance information as decorrelation. In the following, we consider a couple of more sophisticated and popular procedures for estimating ICA.

[B]- Nonlinear decorrelation is the basic ICA method. One way of stating how independence is stronger than uncorrelatedness is to say that independence implies nonlinear uncorrelatedness: if s1 and s2 are independent, then any nonlinear transformations g(s1) and h(s2) are uncorrelated². In contrast, for two r.v.s that are merely uncorrelated, such nonlinear transformations do not have zero covariance in general.
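A tiny numerical illustration of this last point (synthetic data, not from the thesis): a symmetric variable x and its square are uncorrelated yet strongly dependent, and a nonlinear transform exposes the dependence:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(200_000)
y = x**2                              # uncorrelated with x, but fully dependent on it

# Plain covariance misses the dependence: Cov(x, x^2) = E[x^3] = 0 by symmetry.
assert abs(np.mean(x * y) - np.mean(x) * np.mean(y)) < 0.05

# A nonlinear transform g(x) = x^2 reveals it: Cov(x^2, y) = Var(x^2) = 2 > 0.
assert np.cov(x**2, y)[0, 1] > 1.5
```

This is why decorrelation alone (PCA) cannot separate sources, while nonlinear decorrelation can.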
Thus, we could attempt to perform BSS by a stronger form of decorrelation, by finding a representation where the yi are uncorrelated even after some nonlinear transformations. This gives a simple principle for estimating the separating matrix B:

BSS approach 1: Nonlinear decorrelation. Find the matrix B so that for any i ≠ j, the components yi and yj are uncorrelated, and the transformed components g(yi) and h(yj) are uncorrelated, where g and h are some suitable nonlinear functions.

This is a valid approach to estimating ICA: if the nonlinearities are properly chosen, the method does find the independent components. Although this principle is very intuitive, it leaves open an important question: how should the nonlinearities g and h be chosen? An answer to this question can be found by using principles from estimation theory and information theory. Estimation theory provides the most classic method of estimating any statistical model: the maximum likelihood method. Information theory provides exact measures of independence, such as mutual information. Using either one of these theories, we can determine the nonlinear functions g and h in a satisfactory way.

[C]- Independent components are the maximally non-gaussian components. Another very intuitive and important principle of ICA estimation is maximum non-gaussianity. The idea is that, according to the central limit theorem, sums of non-gaussian r.v.s are closer to gaussian than the original ones. Therefore, if we take a linear combination y = Σ_i bi xi of the observed mixture variables, it will be maximally non-gaussian if it equals one of the independent components. This is because if it were a real mixture of two or more components, it would be closer to a gaussian distribution, due to the central limit theorem. Thus, the principle can be stated as follows:

² In the sense that their correlation is zero, i.e., E[g(s1)h(s2)] = 0.
BSS approach 2: Maximum non-gaussianity. Find the local maxima of non-gaussianity of a linear combination y = Σ_i bi xi under the constraint that the variance of y is constant. Each local maximum gives one independent component.

To measure non-gaussianity in practice, we could use, for example, the kurtosis. Recall that the kurtosis is a normalized higher-order cumulant; cumulants are a kind of generalization of the variance using higher-order polynomials. Cumulants have interesting algebraic and statistical properties, which is why they play an important part in the theory of BSS. An interesting point is that this principle of maximum non-gaussianity shows the very close connection between BSS and an independently developed technique of robust statistics called projection pursuit. In projection pursuit, we are actually looking for maximally non-gaussian linear combinations, which are used for visualization and other purposes. Thus, the independent components can be interpreted as projection pursuit directions.

[D]- Important role of numerical techniques. In addition to the estimation principle, one has to find efficient algorithms for implementing the computations needed. Thus, numerical algorithms are an integral part of BSS methods. The numerical methods are typically based on the optimization of some objective function. The basic optimization method is the gradient method. For example, a well-known fixed-point algorithm called FastICA has been tailored to exploit the particular structure of the ICA problem.

5.3 Basic BSS Methods

5.3.1 BSS by minimization of mutual information

An important approach for blind source separation, inspired by information theory, is the minimization of mutual information. The motivation of this approach is that it may not be very realistic in many cases to assume that the data follow the BSS model.
Therefore, we would like to present here an approach that does not assume anything about the data. What we want is a general measure of the dependence of the components of a random vector. Using such a measure, we could define BSS as a linear decomposition that minimizes that dependence measure. We recall here very briefly the basic definitions of information theory. The differential entropy H of a random vector y = (y1, ..., yn)^T with density p(y) is defined as

H(y) := −E{log p(y)}    (5.7)

A normalized version of entropy is given by the negentropy J, defined as

J(y) := H(yGauss) − H(y)    (5.8)

where yGauss is a Gaussian random vector with the same covariance (or correlation) matrix as y. Negentropy is always non-negative, and is equal to zero only for Gaussian random vectors. The mutual information I between n r.v.s yi, i = 1, ..., n, is defined as

I(y1, ..., yn) = Σ_{i=1}^{n} H(yi) − H(y)    (5.9)

Mutual information can also be expressed as the Kullback-Leibler divergence between py(y) and Π_i pyi(yi):

I(y) := ∫ py(y) log [ py(y) / Π_i pyi(yi) ] dy    (5.10)

[A]- Mutual information as a measure of dependence. From the well-known properties of the Kullback-Leibler divergence, I(y) is always non-negative, and is zero if and only if py(y) = Π_i pyi(yi), that is, if y1, ..., yn are independent [Cover et Thomas(1991)]. Consequently, I(y) is a measure of dependence, or contrast function, and source separation algorithms can be designed based on its minimization.

[B]- Mutual information as maximum likelihood estimation. Mutual information (MI) and likelihood are intimately connected. Indeed, it was shown that the minimization of the mutual information is asymptotically a Maximum Likelihood (ML) estimation of the sources [Taleb(1999)].
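In practice, the mutual information (5.9) must be estimated from data. As a rough illustration only (a naive histogram estimator, not one of the approximation techniques cited in this chapter), the following sketch shows that MI vanishes (up to binning bias) for independent signals and is clearly positive for a mixture:

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Naive histogram estimate of I(a; b) in nats (illustrative only)."""
    pxy, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)          # marginal of a
    py = pxy.sum(axis=0, keepdims=True)          # marginal of b
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(4)
s1, s2 = rng.standard_normal(50_000), rng.standard_normal(50_000)
mixed = 0.6 * s1 + 0.8 * s2                      # a mixture is dependent on s1
assert mutual_information(s1, s2) < mutual_information(s1, mixed)
```

Minimizing such a dependence measure over separating matrices B is exactly the criterion described above.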
Further connections between the MI and ML approaches arise in practice because we do not know the distributions of the sources; for example, the approximation of MI can rely on ML estimation of the source densities. Consequently, many recent works are based on this criterion [Pham(1999)].

[C]- Mutual information as maximization of non-gaussianity. To state the idea, suppose that the whitening z = Wx has been done, and hence a unitary matrix U must be estimated to achieve independent outputs. Now, from y = Uz we have py(y) = (1/|det(U)|) pz(z). Consequently, H(y) = H(z) and I(y) = Σ_i H(yi) − H(z). Since H(z) does not depend on U, minimizing I(y) with respect to U is equivalent to minimizing the sum of the marginal entropies. Moreover, −H(yi) can be seen as the Kullback-Leibler divergence between the density of yi and a zero-mean unit-variance Gaussian density (up to a constant term). This leads to the conclusion that U must be estimated so as to produce outputs that are as non-Gaussian as possible. This fact has a nice intuitive interpretation: from the central limit theorem, we know that mixing tends to gaussianize the observations, and hence the separating system should go in the opposite direction. A well-known algorithm based on the non-gaussianity of the outputs is FastICA [Hyvarinen(1999)], [FastICA(1998)], which uses negentropy as a measure of non-gaussianity.

[D]- Algorithms for minimization of mutual information. To use MI in practice, we need some method of estimating or approximating it from real data. We recall that there are many mutual information approximation techniques. The cumulant-based approximation was proposed in [Jones et Sibson(1987)], and it is almost identical to that proposed in [Comon(1994)].
The approximation of entropy using nonpolynomial functions was introduced in [Hyvarinen(1998)], and it is closely related to the measures of non-gaussianity that have been proposed in the projection pursuit literature; see, e.g., [Cook et al.(1993)].

5.3.2 BSS by maximization of non-gaussianity

Non-gaussianity is actually of paramount importance in blind source separation. Without non-gaussianity the separation is not possible at all, as shown above. An important class of source separation algorithms is based on the non-gaussianity of the outputs [Hyvarinen(1999)], [FastICA(1998)]. As a first practical measure of non-gaussianity, the fourth-order cumulant, or kurtosis, was introduced. Practical algorithms were derived using the gradient and fixed-point methods. However, kurtosis has some drawbacks in practice, when its value has to be estimated from a measured sample. The main problem is that kurtosis can be very sensitive to outliers; in other words, kurtosis is not a robust measure of non-gaussianity. To mitigate this problem, the negentropy was proposed as an alternative measure of non-gaussianity. Its properties are in many ways opposite to those of kurtosis: it is robust but computationally complicated. Furthermore, computationally simple approximations of negentropy that more or less combine the good properties of both measures have been introduced in various papers in the BSS literature (for more detail, refer to [Hyvarinen et al.(2001), chapter 8]).

5.3.3 BSS by maximum likelihood estimation

A very popular approach for estimating the independent component analysis model is maximum likelihood (ML) estimation. Maximum likelihood estimation is a fundamental method of statistical estimation; a short introduction will be provided in chapter 8. One interpretation of ML estimation is that we take as estimates those parameter values that give the highest probability to the observations.
To perform maximum likelihood estimation in practice, we need an algorithm to carry out the numerical maximization of the likelihood. For that, we distinguish two cases:

• Source PDFs are known: If the densities of the independent components are known in advance, a very simple gradient algorithm can be derived. To speed up convergence, the natural gradient version and especially the FastICA fixed-point algorithm can be used, which maximize the likelihood faster and more reliably.

• Source PDFs are unknown: If the densities of the independent components are not known, the situation is somewhat more complicated. Fortunately, however, it is enough to use a very rough density approximation, as we will do in chapter 8 using the family of log-spline functions. The choice of the density can then be based on information about whether the independent components are sub- or super-gaussian. Such an estimate can simply be added to the gradient methods, and it is automatically done in FastICA. This is also the approach we have used throughout this thesis in the noisy case (see chapter 8), as a semi-parametric maximum likelihood approach.

5.3.4 BSS by algebraic tensorial methods

One approach for the estimation of independent component analysis consists of using higher-order cumulant tensors. Tensors can be considered as generalizations of matrices, or linear operators. Cumulant tensors are then generalizations of the covariance matrix. The covariance matrix is the second-order cumulant tensor, and the fourth-order tensor is defined by the fourth-order cumulants Cum(xi, xj, xk, xl). We can use the eigenvalue decomposition of the covariance matrix to whiten the data [Abed-Meraim et Hua(1997)]. This means that we transform the data so that second-order correlations are zero.
As a generalization of this principle, we can use the fourth-order cross-cumulant tensor to make the fourth-order cumulants zero, or at least as small as possible. This kind of higher-order decorrelation gives one of the most popular methods for blind source separation [Cardoso et Comon(1996)]. Joint approximate diagonalization based on the eigenvalue decomposition is one method in this category that has been successfully used in low-dimensional problems [Cardoso et Souloumiac(1993)]. In the special case of distinct kurtoses, a computationally very simple method (FOBI) can be devised. An accessible and fundamental paper is [Cardoso(1999)], which also introduces sophisticated modifications of the previously proposed tensorial methods. A more interesting generalization is given in [Moreau(2001)]. The tensor-based methods, however, have become less popular recently [Belouchrani et al.(2001)]. This is because methods that use the whole EVD, like JADE, are restricted, for computational reasons, to small dimensions. Moreover, they have statistical properties inferior to those of methods using non-polynomial cumulants or maximum likelihood. We shall consider this approach in more detail in chapter 7. Indeed, we propose in this thesis a normalized version of this class of methods, using some normalized second-order and fourth-order cumulant tensors to separate heavy-tailed signals [Sahmoudi et al.(2004a)].

5.3.5 BSS by non-linear decorrelation

This approach represents the earliest research effort in BSS, successfully used by Jutten, Hérault, and Ans to solve the first ICA problems. A good review of this class of techniques can be found in [Jutten(2000)]. Today, this work is mainly of historical interest, because there exist several more efficient algorithms for BSS. Nonlinear decorrelation can be seen as an extension of second-order methods. Independent sources can in some cases be found as nonlinearly uncorrelated linear combinations.
The nonlinear functions used in this approach introduce higher-order statistics into the solution method, making blind source separation possible. In [Cichocki et Unbehauen(1996)], a very popular learning algorithm was introduced as an extension of the first source separation algorithm [Hérault et Ans(1984)]. Another well-known algorithm is the equivariant adaptive separation via independence (EASI) algorithm, based on nonlinear decorrelation [Cardoso et Lahed(1996)]. In [Amari et Cardoso(1997)], a different framework based on estimating functions was introduced. Other somewhat related methods have been proposed in the blind source separation literature [Cichocki et Amari(2002)]. We will recall this approach and give more detail in chapter 7, as well as a normalized version of this category of methods, based on some normalized statistics, to separate heavy-tailed sources [Sahmoudi et Abed-Meraim(2004b)].

5.3.6 BSS using geometrical concepts

Another method for BSS is the geometric approach [Mansour et al.(2001), Mansour et al.(2002a), Babaie-Zadeh et al.(2004)]. This approach, which holds essentially for two sources and two sensors, is based on a geometrical interpretation of the independence of two random variables. To state the idea more clearly, suppose that the marginal PDFs of the sources s1 and s2 are non-zero only within the intervals M1 ≤ s1 ≤ M2 and N1 ≤ s2 ≤ N2. Then, from the independence of s1 and s2, we have ps1s2(s1, s2) = ps1(s1)ps2(s2), and hence the support of ps1s2(s1, s2) will be the rectangular region {(s1, s2) | M1 ≤ s1 ≤ M2, N1 ≤ s2 ≤ N2}. In other words, the scatter plot of the source samples forms a rectangular region in the (s1, s2) plane. The linear mapping x = As transforms this region into a parallelogram.
Without loss of generality, one can write

A = [ 1  a ; b  1 ]

and it can then be seen that the slopes of the borders of the scatter plot of the observations will be b and 1/a. Hence, estimating the mixing matrix A is equivalent to estimating the slopes of the borders of this parallelogram.

5.3.7 Source separation using a Bayesian framework

Throughout our work so far, we have assumed that there is no information available about the true parameter beyond that provided by the data. However, there are situations in which most statisticians would agree that more can be said. Technically, there is a substantial number of statisticians in the Bayesian school who feel that it is always reasonable, and indeed necessary, to think of the true value of the parameter θ as being the realization of a random variable with a known distribution. This distribution does not always correspond to an experiment that is physically realizable, but rather is thought of as a measure of the beliefs of the experimenter concerning the true value of θ before he or she takes any data. To describe the Bayesian procedure for source separation, let us write Bayes' theorem in the case of a source separation problem [Knuth(1999)]:

P(A, s(t) | x(t), I) = P(x(t) | A, s(t), I) P(A, s(t) | I) / P(x(t) | I)    (5.11)

where I represents any prior information. We can rewrite the equation as a proportionality, absorbing the inverse of the prior probability of the data P(x(t) | I) into the implicit proportionality constant:

P(A, s(t) | x(t), I) ∝ P(x(t) | A, s(t), I) P(A, s(t) | I)    (5.12)

The probability on the left-hand side of Equation (5.12) is referred to as the posterior probability. It represents the probability that the given model accurately describes the physical situation. The first term on the right-hand side is the likelihood of the data given the model.
It describes the degree of accuracy with which we believe the model can predict the data. The final term on the right is the prior probability of the model, also called the prior. The prior represents the degree to which we believe the model to be correct based only on our prior information about the problem. It is through the assignment of the likelihood and the priors that we express all of our knowledge about the particular source separation problem. If the linear mixture is relatively noise-free, the aim now becomes to estimate a separating matrix B that optimizes the posterior probability of the model, and to estimate the source signals by applying the separating matrix to the recorded data. The Bayesian methodology has several advantages. The most important is the fact that all of the prior knowledge about a specific problem is expressed in terms of prior probabilities that must be evaluated. This provides one with the means to incorporate any additional relevant information into a problem [Djafari(1999)], [Snoussi et M.-Djafari(2000)]. Finally, I want to refer the French-speaking reader to Snoussi's thesis [Snoussi(2003)] as one of the best references, to my knowledge, on this class of methods.

5.3.8 BSS using time structure

In many applications, the source signals represent temporally correlated (colored) random processes, referred to as colored time signals or time series. In that case, they may contain much more structure than white random processes. This additional information can actually make the estimation of the BSS model possible in cases where the basic BSS methods cannot estimate it. For that, we should make some assumptions on the time structure of the sources that allow for their separation. These assumptions are alternatives to the assumption of non-gaussianity.

[A]- Separation by autocovariances. The simplest form of time structure is given by autocovariances.
This means covariances between the values of the signal at different time instants: cov[xi(t), xi(t − τ)], where τ is some lag constant. If the data have time dependencies, the autocovariances are often different from zero. In addition to the autocovariances of one signal, we also need the covariances between two signals: cov[xi(t), xj(t − τ)] with i ≠ j. All these statistics for a given time lag can be grouped together in the time-lagged covariance matrix

C_x^τ := E{x(t) x(t − τ)^T}    (5.13)

The key point here is that the information in a time-lagged covariance matrix (also called a cross-correlation matrix) C_x^τ can be used instead of the higher-order information [Molgedey et Schuster(1994)]. What we do is find a matrix B so that, in addition to making the instantaneous cross-correlations of y(t) = Bx(t) go to zero, the lagged cross-covariances are made zero as well:

E{yi(t) yj(t − τ)} = 0, for all i ≠ j and all τ    (5.14)

The motivation for this is that for the sources si(t), the lagged cross-covariances are all zero due to independence. Using these lagged covariances, we get enough extra information to estimate the sources, under certain conditions, such as the assumption that any two sources have different spectral shapes [Belouchrani et al.(1997a)]. No higher-order information is then needed. Using this approach, we have a simple algorithm, called AMUSE [Tong et al.(1991)], for estimating the separating matrix B from whitened data:

1. Whiten the data x to obtain z.
2. Compute the eigenvalue decomposition of C̄_z^τ := (1/2)[C_τ + C_τ^T], where C_τ = E{z(t) z(t − τ)^T} is the time-lagged covariance matrix, for some lag τ.
3. The rows of the separating matrix B are given by the eigenvectors.

An essentially similar algorithm was proposed in [Tong et al.(1991)].
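The three AMUSE steps above can be sketched compactly in numpy. Two synthetic sinusoids of different frequencies stand in for colored sources with different spectral shapes; the mixing matrix and all parameter values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(7)
T, tau = 5000, 1
t = np.arange(T)
S = np.vstack([np.sin(2 * np.pi * 0.011 * t),    # two colored sources with
               np.sin(2 * np.pi * 0.047 * t)])   # different spectral shapes
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = A @ S

# Step 1: whiten x to obtain z.
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
W = E @ np.diag(d ** -0.5) @ E.T
Z = W @ X

# Step 2: EVD of the symmetrized time-lagged covariance matrix.
C_tau = Z[:, tau:] @ Z[:, :-tau].T / (T - tau)
C_bar = (C_tau + C_tau.T) / 2
_, V = np.linalg.eigh(C_bar)

# Step 3: the rows of the separating matrix are the eigenvectors.
B = V.T @ W
Y = B @ X

# Each output should match one source up to sign and scale.
for i in range(2):
    corrs = [abs(np.corrcoef(Y[i], S[j])[0, 1]) for j in range(2)]
    assert max(corrs) > 0.95
```

The lagged covariances of the two sinusoids differ (cos(2πfτ) depends on f), which is exactly the "different spectral shapes" condition the method requires.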
An extension of the AMUSE method that improves its performance is to consider several time lags τ instead of a single one. It is then enough that the covariances differ for just one of these lags, so the choice of τ becomes a less serious problem. The principle consists in simultaneously diagonalizing all the corresponding lagged covariance matrices. The SOBI (second-order blind identification) algorithm [Belouchrani et al.(1997a)] is based on these principles, and so is TDSEP [Ziehe et Müller(1998)].

[B]- Separation by non-stationarity of variances

If the sources are assumed to be non-stationary, we can divide the signals into short windows and consider the covariances within each one:

E_{t∈T_k}{y_i(t) y_j(t)}    (5.15)

where T_k = (kT, (k + 1)T]. Then, by jointly diagonalizing the covariance matrices of the different segments, we can separate the non-stationary sources [Pham et Cardoso(2001)].

5.4 BSS of Impulsive Heavy-Tailed Sources

5.4.1 Why heavy-tailed α-stable distributions?

The emphasis in this thesis is on a class of signals that have heavy tails. Heavy-tailed signals are likely to exhibit large observations and often have an impulsive nature, and it turns out that a broad class of real-life signals is heavy-tailed. The term heavy tail refers to the fact that the probability density function of the signal has relatively large mass in its tails. This section provides a motivation for this part of the thesis and a brief discussion of the fundamental problems in blind signal separation. This part of my work is partly motivated by the lack of a strong theoretical basis for blind signal separation of heavy-tailed signals in the signal processing literature.
Furthermore, the motivation for considering heavy-tailed distributions in this thesis is that many real-world signals turn out to follow heavy-tailed laws [Adler et al.(1998)]. The main difference between the non-Gaussian stable distributions and the Gaussian distribution is that the tails of a stable density are heavier than those of the Gaussian density. In addition, the stable distribution is very flexible as a modeling tool in that it has a parameter α (0 < α ≤ 2), called the characteristic exponent, that controls the heaviness of its tails. A small positive value of α indicates severe impulsiveness, while a value of α close to 2 indicates a more Gaussian type of behavior. Stable distributions obey the Generalized Central Limit Theorem (GCLT), which states that if the sum of i.i.d. random variables, with or without finite variance, converges to a distribution as the number of variables increases, the limit distribution must be stable [Samorodnitsky et Taqqu(1994)]. Thus, non-Gaussian stable distributions arise as sums of random variables in the same way as the Gaussian distribution. Another defining feature of the stable distribution is the so-called stability property, which says that the sum of two independent stable random variables with the same characteristic exponent is again stable with that same characteristic exponent. For these reasons, statisticians [Samorodnitsky et Taqqu(1994)], economists [Rachev(2003)], signal processing and communications engineers [Nikias et Shao(1995)], and other scientists in a variety of disciplines have embraced alpha-stable processes as the model of choice for heavy-tailed data.

5.4.2 Existing BSS methods for heavy-tailed signals

A common characteristic of many heavy-tailed distributions, such as the α-stable family, is the nonexistence of finite second- or higher-order moments.
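To make this tail behavior concrete, the following sketch draws standard SαS samples with the Chambers-Mallows-Stuck method and compares tail probabilities with a Gaussian sample (the helper `sas_samples` is my own, not from the thesis):

```python
import numpy as np

def sas_samples(alpha, n, rng):
    """Standard SaS samples (gamma = 1, mu = 0) via the
    Chambers-Mallows-Stuck construction, for 0 < alpha <= 2, alpha != 1."""
    V = rng.uniform(-np.pi / 2, np.pi / 2, n)
    W = rng.exponential(1.0, n)
    return (np.sin(alpha * V) / np.cos(V) ** (1 / alpha)
            * (np.cos((1 - alpha) * V) / W) ** ((1 - alpha) / alpha))

rng = np.random.default_rng(1)
n = 100000
gauss = rng.standard_normal(n)
sas = sas_samples(1.5, n, rng)
# Heavy tails: large excursions are orders of magnitude more frequent
# for alpha = 1.5 than for the Gaussian.
tail_sas = np.mean(np.abs(sas) > 5)      # a few percent of the samples
tail_gauss = np.mean(np.abs(gauss) > 5)  # essentially zero
```

Lowering α toward zero makes the excursions even more violent, matching the impulsiveness interpretation of the characteristic exponent.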
There are several well-known methods for source separation [Hyvarinen et al.(2001)], generally based on second- or higher-order statistics of the observations, which are therefore inadequate for handling heavy-tailed sources. In that case, fractional lower-order theory can be used for stable signal separation. Only a limited literature has been dedicated to BSS of impulsive signals. In [Shereshevski et al.(2001)], the authors proposed the RQML algorithm, based on the idea of setting the signals to zero whenever they are larger (in absolute value) than some threshold K. Recall that RQML is the restricted quasi-maximum likelihood approach, introduced as an extension of Pham's popular quasi-maximum likelihood approach to the α-stable source case. Other solutions exist in the literature, based on the spectral measure [Kidmose(2001)], order statistics [Shereshevski et al.(2001)] and the characteristic function [Eriksson et Koivanen(2003)]. Recently, a new method based on a consistent prewhitening step was proposed in [Chen et Bickel(2004)]. The authors use the characteristic-function-based contrast function proposed in [Kagan et al.(1973)] to achieve source separation, and show that this approach can be consistent even when some hidden sources do not have finite second moments. However, this approach guarantees consistent performance only in the following two cases: first, when at most one source component has an infinite second moment, and second, when there are only two alpha-stable sources. In this thesis, we introduce new methods for α-stable source separation from observed linear mixtures using the minimum dispersion criterion [Sahmoudi et al.(2003a)], contrast functions [Sahmoudi(2005)], normalized statistics [Sahmoudi et al.(2004a), Sahmoudi et Abed-Meraim(2004b)] and the maximum likelihood [M.
Sahmoudi et al.(2005)].

5.5 Conclusion & Future Research

In this chapter the fundamental methods of BSS have been presented. Although several limitations and assumptions impede the use of BSS methods, it seems appropriate to conjecture that these algorithms are useful tools with many potential applications where second-order statistical methods reach their limits. Several researchers believe that these techniques will have a huge impact on engineering methods and industrial applications. It is interesting to note that many issues remain open to further investigation.
– Underdetermined BSS : having more sources than sensors is of theoretical and practical interest.
– Noisy BSS : much more work needs to be done to determine the effect of noise on performance. Sparse representation and independent factor analysis are very promising ideas.
– Non-stationarity : time-frequency analysis and unsupervised classification are two promising approaches in this context.
– BSS for data mining and data warehousing : data mining, the extraction of hidden predictive information from large databases, is a powerful technology with great potential to help companies focus on the most important information in their data warehouses. The goal is to find a subset of a collection of documents relevant to a user's information request. We believe that the BSS model can be extended to such unsupervised classification problems.
– Blind separation of heavy-tailed signals : some standard BSS methods cannot work in this case, and others are not mathematically justified, because heavy-tailed distributions have neither finite second-order statistics (SOS) nor higher-order statistics (HOS). The goal of half of this thesis is to investigate this problem.
Chapter 6
Minimum Dispersion Approach

6.1 Introduction

This chapter introduces a new Blind Source Separation (BSS) approach for extracting impulsive source signals from their observed mixtures. The impulsive, or heavy-tailed, signals are modeled as real-valued symmetric alpha-stable (SαS) processes characterized by infinite second- and higher-order moments. A new whitening procedure based on a normalized covariance matrix is introduced. The proposed approach uses the minimum dispersion (MD) criterion as a measure of sparseness and independence of the data. We show that the proposed method is robust, in the sense of being insensitive to possible variations in the underlying form of the sampling distribution. Algorithm derivation, discussion and simulation results are provided to illustrate the good performance of the proposed approach. In particular, the new method is compared with three of the most popular BSS algorithms : JADE [Cardoso et Souloumiac(1993)], EASI [Cardoso et Lahed(1996)] and RQML [Pham et Garrat(1997)].

6.1.1 The failure of second- and higher-order methods

From a signal processing point of view, the adoption of a stable model for signal or noise has important consequences. Second-order stationary processes have historically been the main subject of study in statistical signal processing. Second-order estimation techniques are commonly recognized as the natural tools in the presence of Gaussian noise. Research efforts on higher-order statistics (HOS) have led to improved estimation algorithms for non-Gaussian environments, but this work has been based on the assumption that the second-order and higher-order statistics of the processes exist and are finite [Nikias et Petropulu(1994)].
Important non-Gaussian impulsive processes can be efficiently modeled by heavy-tailed processes with infinite variance, for which neither classical second-order theory nor the theory of HOS is useful [Nikias et Shao(1995)]. It has been shown repeatedly in the literature that infinite-variance processes appearing in practice are well modeled by probability distributions with algebraic tails, i.e., random variables for which

P(|X| > x) ∼ c x^{−α}    (6.1)

for some fixed c, α > 0. Algebraic-tailed random variables have finite absolute moments of order less than α:

E|X|^p < ∞, for p < α    (6.2)

Conversely, if p ≥ α, the absolute moments become infinite, and thus unsuitable for statistical analysis. When α < 2, the processes have infinite variance, and standard second- or higher-order statistics cannot be successfully applied.

6.1.2 Fractional lower-order statistics (FLOS) theory

Alternative attempts to characterize the behavior of impulsive signals have relied on fractional lower-order statistics (FLOS) in the context of non-Gaussian α-stable distributions (α < 2). It has been shown that FLOS give robust measures of the characteristics of impulsive processes [Ma et Nikias(1995a)], [Nikias et Shao(1995)]. For a zero-location alpha-stable random variable X with dispersion γ, the norm of X is defined as

‖X‖_α = γ if 0 < α < 1, and ‖X‖_α = γ^{1/α} if 1 ≤ α ≤ 2    (6.3)

Hence, the norm ‖X‖_α is a scaled version of the dispersion γ. If X and Y are jointly alpha-stable, the distance between X and Y is defined as

d_α(X, Y) = ‖X − Y‖_α    (6.4)

Combining these equations with the fact that E|X|^p ∝ γ^{p/α} for 0 < p < α, it is easy to see that the p-th order moment of the difference between two alpha-stable random variables is a measure of the distance d_α between them. In addition, all fractional lower-order moments of an alpha-stable random variable are equivalent, i.e., the p-th and q-th order moments differ by a constant factor independent of the random variable, as long as p, q < α.
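A quick numerical check of (6.1)-(6.2) and of this moment equivalence, assuming a standard Cauchy source (α = 1, γ = 1 scaled to γ = 2), for which E|X|^p = γ^p / cos(πp/2) when p < 1. The helper `flom_dispersion` and the FLOM normalizing constant C(p, α) below are standard fractional lower-order tools sketched from memory, not code from the thesis:

```python
import numpy as np
from math import gamma as G, pi, sqrt, cos

def flom_dispersion(x, p, alpha):
    """Estimate the dispersion of SaS data from its p-th absolute sample
    moment, using E|X|^p = C(p, alpha) * gamma^(p/alpha) for 0 < p < alpha
    (C is the usual FLOM normalizing constant)."""
    C = (2 ** (p + 1) * G((p + 1) / 2) * G(-p / alpha)
         / (alpha * sqrt(pi) * G(-p / 2)))
    return (np.mean(np.abs(x) ** p) / C) ** (alpha / p)

rng = np.random.default_rng(2)
x = 2.0 * rng.standard_cauchy(200000)  # Cauchy with dispersion gamma = 2

# Moments of order p < alpha = 1 exist and match the closed form ...
p = 0.3
flom = np.mean(np.abs(x / 2.0) ** p)   # standard Cauchy moment
theory = 1.0 / cos(pi * p / 2)

# ... and different p < alpha give equivalent dispersion estimates,
g_a = flom_dispersion(x, 0.3, 1.0)
g_b = flom_dispersion(x, 0.6, 1.0)

# while for p = 2 >= alpha the moment estimate never settles: a single
# extreme sample carries a large share of the whole sum of squares.
share = np.max(x ** 2) / np.sum(x ** 2)
```

Both `g_a` and `g_b` hover near the true dispersion 2, illustrating that the choice of p below α is largely a matter of convenience.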
Furthermore, it was shown in [Schilder(1970)] that for 1 ≤ α ≤ 2, ‖.‖_α is a norm on the linear space of alpha-stable processes. The blind source separation methodology proposed in this chapter uses the notion of fractional lower-order moments to achieve robust signal reconstruction.

6.2 Source Separation Procedure

6.2.1 Whitening by a normalized covariance matrix

The first step consists of whitening the observations (orthogonalizing the mixture matrix A). For finite-variance signals, the whitening matrix W is computed as the inverse square root of the signal covariance matrix. At first glance, this cannot be applied to α-stable sources. However, we prove in the following that a properly normalized covariance matrix converges to a finite matrix with the appropriate structure when the sample size N tends to infinity. More specifically, we propose and prove the following results.

Theorem 6.1. Let X1 and X2 be two SαS variables with dispersions γ1 and γ2 and PDFs f1(.) and f2(.), respectively. Then

lim_{N→∞} Ê|X1|² / Ê|X2|² = γ1/γ2

where Ê denotes the time-averaging operator Ê[g(X)] = (1/N) Σ_{t=1}^{N} g[X(t)].

Proof. Let T be an arbitrary positive constant and 1I_{|X|≤T} the indicator function, which equals 1 if |X| ≤ T and 0 otherwise.
Then, due to the ergodicity of X1 and X2, we have

Ê[X1² 1I_{|X1|≤T}] / Ê[X2² 1I_{|X2|≤T}] = [(1/N) Σ_{t=1}^{N} x1(t)² 1I_{|X1|≤T}] / [(1/N) Σ_{t=1}^{N} x2(t)² 1I_{|X2|≤T}] → E[X1² 1I_{|X1|≤T}] / E[X2² 1I_{|X2|≤T}] as N → ∞    (6.5)

Due to the symmetry of the α-stable PDF, the right-hand side can be expressed as

E[X1² 1I_{|X1|≤T}] / E[X2² 1I_{|X2|≤T}] = ∫_{−T}^{T} |x|² f1(x) dx / ∫_{−T}^{T} |u|² f2(u) du = ∫_0^T x² f1(x) dx / ∫_0^T u² f2(u) du    (6.6)

Using integration by parts and the fact that (1 − Φ(x)) ∼ (C_α/2) γ x^{−α} as x → ∞ for any SαS distribution function Φ, we obtain that, as T → ∞, the above ratio is equivalent to

C_α γ1 ([x^{2−α}]_0^T − 2 ∫_0^T x^{1−α} dx) / (C_α γ2 ([u^{2−α}]_0^T − 2 ∫_0^T u^{1−α} du)) → γ1/γ2    (6.7)

Thus, from equations (6.5), (6.6) and (6.7), the ratio Ê[X1²]/Ê[X2²] converges asymptotically to γ1/γ2. ¥

Theorem 6.2. Let x = As be a data vector from a mixture of α-stable processes and R̂ := (1/N) Σ_{t=1}^{N} x(t) x(t)^T its sample covariance matrix. Then the normalized covariance matrix of x, defined by

R̄ := R̂ / Trace(R̂)    (6.8)

converges asymptotically to the finite matrix A D A^T, where D is the positive diagonal matrix D = diag(d1, ..., dm) with d_i = γ_i / Σ_{j=1}^{m} γ_j ‖a_j‖², γ_i being the dispersion of the i-th source signal and ‖.‖ denoting the Euclidean norm.

Proof. Clearly,

R̂ / Trace(R̂) = Σ_{i=1}^{m} Ê[s_i(t)²] a_i a_i^T / Σ_{j=1}^{m} Ê[s_j(t)²] ‖a_j‖²    (6.9)

Using Theorem 6.1, we see that

Ê[s_i(t)²] / Σ_{j=1}^{m} Ê[s_j(t)²] ‖a_j‖² → d_i = γ_i / Σ_{j=1}^{m} γ_j ‖a_j‖² as N → ∞    (6.10)

Then, from equations (6.9) and (6.10), R̄ → Σ_{i=1}^{m} d_i a_i a_i^T = A D A^T as N → ∞. ¥

Proposition 6.1. Let R̄ be the normalized covariance matrix defined in (6.8) for the considered α-stable mixture. Then the inverse square root of R̄ is a data whitening matrix.
Proof. Theorem 6.2 shows that the normalized covariance matrix R̄ has the appropriate structure for computing a whitening matrix. Indeed, the whitening matrix can be obtained from the eigendecomposition R̄ = U Σ U^T as W = Σ_s^{−1/2} U_s^T, where Σ_s (resp. U_s) is the diagonal (resp. orthogonal) matrix of the m largest eigenvalues (resp. corresponding eigenvectors) of R̄. Then we can write I = W R̄ W^T = W A D A^T W^T = (W A D^{1/2})(W A D^{1/2})^T. Recall that, without loss of generality, A can be replaced by A D^{1/2} (D being a positive diagonal matrix) because of the scaling indeterminacy. We see that W transforms A D^{1/2} (i.e., the mixing matrix) into an orthogonal matrix. ¥

6.2.2 Minimum dispersion criterion

[A]- Minimum dispersion criterion in signal processing

The minimum dispersion (MD) criterion is a common tool in the linear theory of stable processes, as the dispersion of a stable random variable plays a role analogous to the variance. For example, the larger the dispersion of a stable distribution, the more it spreads around its median. Hence, the minimum dispersion criterion is a natural and mathematically meaningful measure of optimality in signal processing problems based on stable models. By minimizing the error dispersion, we minimize the average magnitude of the estimation errors. Furthermore, it has been shown that minimizing the dispersion is also equivalent to minimizing the probability of large estimation errors. Hence, the minimum dispersion criterion is well justified under the stable model assumption. It is a direct generalization of the minimum mean-squared error criterion and relatively simple to compute [Nikias et Shao(1995)].
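Returning briefly to the whitening step of Section 6.2.1, the following sketch checks Theorem 6.2 and Proposition 6.1 numerically (the SαS generator uses the Chambers-Mallows-Stuck method; all names here are my own, not the thesis software):

```python
import numpy as np

def sas_samples(alpha, n, rng):
    # Chambers-Mallows-Stuck generator for standard SaS samples.
    V = rng.uniform(-np.pi / 2, np.pi / 2, n)
    W = rng.exponential(1.0, n)
    return (np.sin(alpha * V) / np.cos(V) ** (1 / alpha)
            * (np.cos((1 - alpha) * V) / W) ** ((1 - alpha) / alpha))

rng = np.random.default_rng(3)
m, N, alpha = 3, 50000, 1.5
s = np.vstack([sas_samples(alpha, N, rng) for _ in range(m)])
A = rng.standard_normal((m, m))
x = A @ s

# Normalized covariance matrix of eq. (6.8).
R = x @ x.T / N
R_bar = R / np.trace(R)

# Whitening matrix of Proposition 6.1: W = Sigma^{-1/2} U^T.
d, U = np.linalg.eigh(R_bar)
Wmat = np.diag(d ** -0.5) @ U.T

# W R_bar W^T = I by construction, and W A has nearly orthogonal
# columns: W orthogonalizes the mixing matrix up to column scaling.
G = Wmat @ A
gram = G.T @ G
```

Although the raw covariance entries diverge for α < 2, the trace normalization cancels the divergence, which is exactly the content of Theorem 6.2.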
Minimizing the dispersion is also equivalent to minimizing the fractional lower-order moments of the estimation errors, which measure the Lp distance between an estimate and its true value, for 0 < p < α ≤ 2. This result is not surprising, since the Lp norms for p < 2 are well known for their robustness against outliers such as those described by stable laws. It is also known that all the lower-order moments of a stable random variable are equivalent, i.e., any two of them differ by a fixed constant independent of the random variable itself. A common choice is the L1 norm, which is sometimes very convenient. Stable signal processing based on fractional lower-order moments inevitably introduces nonlinearity, even into linear problems. The basic reason is that we have to solve linear estimation problems in Banach or metric spaces instead of Hilbert spaces. It is well known that, while the linear space generated by a Gaussian process is a Hilbert space, the linear space of a stable process is a Banach space when 1 ≤ α < 2 and only a metric space when 0 < α < 1 [Cambanis et Miller(1981)]. Banach and metric spaces do not have as nice properties and structures as Hilbert spaces for linear estimation problems.

[B]- Minimum dispersion criterion for BSS

Let z(t) := B x̄(t), where B is a unitary separating matrix to be estimated and x̄ denotes the whitened data, i.e., x̄ = Wx. Let us consider the global MD criterion given by the sum of the dispersions of all entries of z:

J(B) := Σ_{i=1}^{m} γ_{z_i}    (6.11)

where γ_{z_i} denotes the dispersion of z_i(t), the i-th entry of z(t). In this chapter we prove that the MD criterion defines a contrast function, in the sense that the global minimization of the objective function (6.11) leads to a separating solution. The p-th order moment of an α-stable random variable and its dispersion are related through a constant only (see Property 2.7).
Therefore, the MD criterion is equivalent to least lp-norm estimation with 0 < p < α. Although the most widely used contrast functions for BSS are based on second- and fourth-order cumulants [Cichocki et Amari(2002)], we believe there are good reasons to extend the class of contrast functions from cumulants to fractional moments, as we argue next. Mutual information (MI) is usually chosen to measure the degree of independence. Because the direct estimation of MI is very difficult, one can derive approximate contrast functions, often based on cumulant expansions of the densities. However, one can also approximate the Shannon entropy (which is closely related to the MI) using the lp-norm concept [Karvanen et Cichocki(2003)], and hence use it to approximate the MI. For example, in [Hyvarinen(1999)] the author uses the lp-norm concept to approximate the MI and then to find the optimal contrast function for the exponential power family of densities f_p(x) = k1 exp(k2 |x|^p). Thus we propose the MD criterion, or equivalently the FLOM-based criterion (2.7), for measuring the independence of alpha-stable distributed data. We should also note that the lp-norm is commonly used as a measure of the sparseness of signals [Karvanen et Cichocki(2003)]. This leads to the use of the MD criterion as a measure of sparseness, a concept that has been demonstrated to be powerful in BSS [Cichocki et Amari(2002)], [Karvanen et Cichocki(2003)]. Consequently, the MD criterion can be used as a cost function to achieve BSS, as shown by the following result.

Theorem 6.3. The minimum dispersion criterion

J(B) := Σ_{i=1}^{m} γ_{z_i}    (6.12)

is a contrast function under the orthogonality constraint for separating an instantaneous mixture of alpha-stable sources.
Proof. Note that z(t) is an orthogonal mixture of the sources and can be written z(t) = C s(t), with C := BWA orthogonal. Here we prove that the MD criterion J(B) reaches its minimum value over the set of orthogonal matrices if and only if BW (W being the whitening matrix) is a separating matrix, or equivalently if and only if C is a generalized permutation matrix (i.e., a permutation matrix times a non-singular diagonal matrix). Indeed, using properties 1 and 2 of SαS processes presented in Section I, one can write:

J(B) = Σ_{i=1}^{m} Σ_{j=1}^{m} |C_{ij}|^α γ_{s_j}    (6.13)
     = Σ_{j=1}^{m} (Σ_{i=1}^{m} |C_{ij}|^α) γ_{s_j}    (6.14)
     = Σ_{j=1}^{m} a_j γ_{s_j}    (6.15)

with a_j := Σ_{i=1}^{m} |C_{ij}|^α and C_{ij} the (i, j)-th entry of C. Now, since the a_j and γ_{s_j} are positive, minimizing J(B) is equivalent to minimizing all the coefficients a_j. Let us prove that the coefficients a_j satisfy a_j ≥ 1 for all j, with a_j = 1 if and only if C is a generalized permutation matrix. Since C is unitary (which implies |C_{ij}| ≤ 1) and α < 2, we have |C_{ij}|^α ≥ |C_{ij}|². Therefore a_j = Σ_{i=1}^{m} |C_{ij}|^α ≥ Σ_{i=1}^{m} |C_{ij}|² = 1. Equality holds if and only if |C_{ij}|^α = |C_{ij}|² for all i, or equivalently if C_{ij} = 0 or |C_{ij}| = 1. C being unitary, the latter is satisfied if and only if for every j there exists i_j such that |C_{i_j j}| = 1 and C_{ij} = 0 for all i ≠ i_j. ¥

The proposed method requires little or no a priori knowledge of the input signals. The dispersion as well as the characteristic exponent α are estimated according to [Tsihrintzis et Nikias(1996)], where the proposed estimator is proved to be consistent and asymptotically normal. This estimator is based on the theory of the fractional lower-order moments of SαS distributions.

6.2.3 Separation algorithm : Jacobi implementation

Theorem 6.3 shows that, under an orthogonal transform, the signal has minimum dispersion when its entries are mutually independent.
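The key inequality in the proof, a_j = Σ_i |C_ij|^α ≥ 1 with equality only for signed permutations, is easy to check numerically; this is a sketch of the inequality alone, not of the full proof:

```python
import numpy as np

alpha = 1.5
rng = np.random.default_rng(4)

# A random orthogonal matrix (via QR of a Gaussian matrix): every
# column mixes several coordinates, so a_j = sum_i |C_ij|^alpha > 1.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
a_random = np.sum(np.abs(Q) ** alpha, axis=0)

# A signed permutation matrix: each a_j equals exactly 1, the minimum.
P = np.array([[0., 1., 0., 0.],
              [0., 0., 0., -1.],
              [1., 0., 0., 0.],
              [0., 0., 1., 0.]])
a_perm = np.sum(np.abs(P) ** alpha, axis=0)
```

The gap between `a_random` and 1 is what the Jacobi sweep of the next subsection drives to zero, one rotation at a time.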
The problem is now to minimize a cost function under an orthogonality constraint. Different approaches exist to solve this constrained optimization problem. We chose to estimate B as a product of Givens rotations:

B = Π_{#sweeps} Π_{1≤p<q≤m} Ω_{pq}(θ)    (6.16)

where Ω_{pq}(θ) is the elementary Givens rotation, defined as the orthogonal matrix whose diagonal elements are all 1 except for the two elements c = cos(θ) in rows (and columns) p and q, and whose off-diagonal elements are all 0 except for the two elements s = sin(θ) and −s at positions (p, q) and (q, p), respectively. The minimization of J(Ω_{pq}(θ)) is done numerically by searching θ over a fine grid in [0, π/2). (We consider [0, π/2) instead of [0, π] because Ω_{pq}(θ + π/2) is equal to Ω_{pq}(θ) up to a generalized permutation matrix.) The resulting MD algorithm can be summarized as in Table 6.1:

Step 1. Whitening transform.
Step 2. Sweep. For all pairs 1 ≤ p < q ≤ m, do:
– Compute the Givens angle 0 ≤ θ̂_{pq} < π/2 that maximizes the pairwise independence of z_p and z_q by minimizing the global dispersion J(Ω_{pq}(θ)).
– If θ̂_{pq} > θ̂_min, rotate the pair accordingly. (The constant θ̂_min is a threshold defining the minimum rotation angle considered significant in estimating B; in our simulations we used an angle grid resolution of π/100, and the same value for the threshold.)
– If no pair has been rotated in the previous sweep, stop. Otherwise, perform another sweep.

Tab. 6.1 – The principal steps of the proposed minimum dispersion (MD) algorithm.

6.3 Performance Evaluation & Comparison

This section examines the statistical performance of our MD-based separation procedure. The numerical results presented below have been obtained in the following setting.
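Before turning to the experiments, here is a compact re-implementation sketch of the sweep in Table 6.1 (hypothetical code: the grid search minimizes empirical l_p sample moments with p < α as a stand-in for the dispersions, and the "mixing" is taken orthogonal, as it is after whitening):

```python
import numpy as np

def sas_samples(alpha, n, rng):
    # Chambers-Mallows-Stuck generator for standard SaS samples.
    V = rng.uniform(-np.pi / 2, np.pi / 2, n)
    W = rng.exponential(1.0, n)
    return (np.sin(alpha * V) / np.cos(V) ** (1 / alpha)
            * (np.cos((1 - alpha) * V) / W) ** ((1 - alpha) / alpha))

def md_sweep(z, p=0.5, grid=100):
    """One Jacobi sweep: for each pair (i, j), pick the Givens angle on
    a grid over [0, pi/2) minimizing the summed l_p sample moments."""
    m = z.shape[0]
    B = np.eye(m)
    thetas = np.linspace(0.0, np.pi / 2, grid, endpoint=False)
    for i in range(m - 1):
        for j in range(i + 1, m):
            costs = []
            for th in thetas:
                c, s = np.cos(th), np.sin(th)
                zi = c * z[i] + s * z[j]
                zj = -s * z[i] + c * z[j]
                costs.append(np.mean(np.abs(zi) ** p)
                             + np.mean(np.abs(zj) ** p))
            th = thetas[int(np.argmin(costs))]
            c, s = np.cos(th), np.sin(th)
            G = np.eye(m)
            G[i, i] = G[j, j] = c
            G[i, j], G[j, i] = s, -s
            z = G @ z
            B = G @ B
    return B, z

rng = np.random.default_rng(5)
s = np.vstack([sas_samples(1.5, 20000, rng) for _ in range(2)])
Q, _ = np.linalg.qr(rng.standard_normal((2, 2)))  # orthogonal "mixing"
B, y = md_sweep(Q @ s)
G = B @ Q  # close to a signed permutation when separation succeeds
```

In the thesis procedure the dispersions are estimated with the consistent estimator of [Tsihrintzis et Nikias(1996)] rather than raw l_p moments, and several sweeps are run until no significant rotation remains.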
The source signals are i.i.d. impulsive standard SαS (µ = 0 and γ = 1). The number of sources is m = 3 and the number of sensors is n = 3. The statistics are evaluated over 100 Monte Carlo runs, and the mixing matrix as well as the sources are generated randomly at each run. The performance of our MD method is compared with three widely used BSS algorithms : JADE [Cardoso et Souloumiac(1993)], EASI [Cardoso et Lahed(1996)] and RQML [Shereshevski et al.(2001)]. To measure the quality of the source separation, we use the generalized rejection level criterion defined below.

6.3.1 Generalized rejection level index

To evaluate the performance of the separation method, we define the rejection level Iperf as the mean value of the interference-signal dispersion over the desired-signal dispersion. This criterion generalizes the existing one [Cichocki et Amari(2002)] based on signal powers (for SαS processes the variance, or power, is replaced by the dispersion), which represents the mean value of the interference-to-signal ratio. If source k is the desired signal, the related generalized rejection level is

I_k := γ(Σ_{l≠k} C_{kl} s_l) / γ(C_{kk} s_k) = Σ_{l≠k} |C_{kl}|^α γ_l / (|C_{kk}|^α γ_k)    (6.17)

where γ(x) denotes the dispersion of an SαS random variable x. Therefore, the averaged rejection level is given by

Iperf = (1/m) Σ_{i=1}^{m} I_i = (1/m) Σ_{i=1}^{m} Σ_{j≠i} |C_{ij}|^α γ_j / (|C_{ii}|^α γ_i)    (6.18)

6.3.2 Experimental results

• First experiment. Figure 6.1 presents an example of separation of highly impulsive sources (α = 0.5) mixed by a random 3 × 3 matrix A. The proposed algorithm achieves very good separation quality.

• Second experiment. Figure 6.2 plots the mean rejection level of the MD algorithm versus the characteristic exponent. The sample size is set to N = 1000. The parameter α is of crucial importance, as it has a major influence on the separation performance. Two important features are observed.
Fig. 6.1: Extraction of 3 α-stable sources from 3 observations, where α = 0.5 and N = 10000 (the figure shows the three sources, the three mixtures and the three estimated signals).

The mean rejection level increases when the sources are very impulsive (α close to zero) or when they are close to the Gaussian case (α close to two). In the latter case (i.e., α = 2), source separation is not possible.

Fig. 6.2: Generalized mean rejection level versus α, where N = 1000 (curves for EASI, JADE, RQML and MD).

• Third experiment. In Figure 6.3, the simulation study shows that estimation errors on the characteristic exponent α of the source distribution have little influence on the performance of the algorithm.

Fig. 6.3: Generalized mean rejection level versus the estimation error ∆α.

• Fourth experiment. In Figure 6.4, for our proposed MD algorithm, two different scenarios lead to similar performance.
In the first scenario, we consider a mixture of three α-stable sources with the same characteristic exponent α = 1.5. In the second, we wrongly assume three SαS sources with α = 1.5 while, in reality, the sources are SαS with different characteristic exponents α1 = 1.5, α2 = 1 (Cauchy pdf) and α3 = 2 (Gaussian pdf). The algorithm can separate the sources from their mixtures even though we deviate from the assumptions under which it was derived. Consequently, the MD algorithm is robust to possible source modeling errors.

Fig. 6.4: Generalized mean rejection level versus sample size N (MD with sources of the same characteristic exponent, and MD with sources of different characteristic exponents).

• Fifth experiment. Figure 6.5 shows the performance obtained by each of the four BSS algorithms as a function of the sample size N for α = 1.5. Good performance is reached by the MD algorithm for relatively small to medium sample sizes. The figure also demonstrates that EASI fails to separate α-stable signals and that JADE is sub-optimal in this context; this is because EASI and JADE are not specifically designed for heavy-tailed signals. Comparing MD with RQML, we observe a certain performance gain in favor of the MD algorithm. This is because truncating the observations created by large source signal values, as the RQML procedure does, is not optimal: those observations can be very informative.
Fig. 6.5: Generalized mean rejection level versus sample size for α = 1.5 (curves for EASI, JADE, RQML and MD).

• Sixth experiment. We consider the case where the observation is corrupted by additive white Gaussian noise. The mean rejection level versus noise power is depicted in Figure 6.6 for α = 1.5 and N = 1000. In this experiment, the noise level σ² is varied between 0 dB and −30 dB. As can be seen, the performance degrades significantly when the noise power is high. This can be explained by the fact that the theory does not take additive noise into consideration. Improving robustness against noise is still an open problem under investigation. Figure 6.6 shows, however, that the proposed MD method has reliable performance and outperforms the RQML algorithm at low and moderate noise powers.

Fig. 6.6: Generalized mean rejection level versus the additive noise power for α = 1.5.

6.4 Concluding Remarks

We have introduced a two-step procedure for α-stable source separation. A first, generalized whitening step orthogonalizes the mixing matrix using a normalized covariance matrix of the observations. In the second step, the remaining orthogonal matrix is estimated by minimizing a global dispersion criterion. The proposed method is robust to modeling errors on the source pdf.
Numerical examples are presented to illustrate the effectiveness of the proposed method, which is shown to perform better than the RQML method. Moreover, they confirm that existing BSS methods that are not specifically designed to handle impulsive signals fail to provide good separation quality.

Chapter 7

Sub- and Super-Additivity based Contrast Functions

In this chapter, we introduce a generalization of our previous contribution. Indeed, we provide a systematic method to construct contrast functions through the use of sub- and super-additive functionals¹. Some practical examples of useful contrast functions are introduced and discussed.

¹ These arguments follow the same procedure as in [Sahmoudi et al.(2005)]. Furthermore, inspired by the proof that the minimum dispersion is a contrast function, this chapter is a direct generalization of the previous one.

7.1 BSS Using Contrast Functions

In this chapter, we consider the mixture model x = As, where A is an unknown n × m mixing matrix, x denotes the observation vector and s represents the source vector. The separation problem consists of finding a separating matrix B such that the components of y = Bx are independent. Note that throughout this chapter we consider BSS under an orthogonality constraint (assuming implicitly that a whitening step has already been performed). Thus, in the rest of this chapter we suppose that B is an orthogonal matrix.

7.2 On contrast functions

The concept of contrast function for source separation was first presented in [Comon(1994)]. A contrast function for source separation is a real-valued function of the distribution of a random vector which is minimized (or maximized) when the source separation is achieved. To characterize a contrast function mathematically, we use the following definition.

Definition 7.1. A functional F is a contrast function if and only if it satisfies the two requirements:
R1. F(Cs) ≥ F(s) for any independent random vector s and any invertible matrix C.
R2. The equality F(Cs) = F(s) holds if and only if C = PD, where P and D are a permutation and a diagonal matrix, respectively.

Thus, we define the contrast function as in [Comon(1994)] as a functional of the distribution of Bx, or equivalently of B, which attains its minimum (or maximum) when separation is achieved. Intuitively, any measure of dependence between the components of Bx would be a contrast, but there may be others. It should be noted that the construction of a contrast function is only a first step toward a separation procedure. The contrasts proposed here are theoretical functionals because they depend on the distribution of the reconstructed sources, which is unknown. To obtain a usable contrast, this distribution, or in fact certain functionals of it, must be estimated from the data. We will not consider this problem here, nor that of constructing a good algorithm for minimizing the resulting empirical contrast. It should however be pointed out that the ease of the estimation and of the minimization algorithm, both in terms of implementation and computational cost, should be taken into account, besides performance considerations, when assessing the final separation method. As existing examples of contrast functions, it has been shown in [Comon(1994)] and [Cardoso(1998)] that the sum of the 4-th order cross-cumulants of the components is a contrast function.
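As a toy illustration of such a cumulant-based contrast (a sketch under our own assumptions: two unit-variance uniform sources, an orthogonal mixture, and a scan over rotation angles), the sum of squared fourth-order auto-cumulants of the outputs peaks at the separating rotation:

```python
import numpy as np

def kurtosis_contrast(y):
    """Sum of squared 4th-order auto-cumulants of the rows of y
    (for zero-mean, unit-variance rows: kurt = E[y^4] - 3)."""
    m4 = np.mean(y ** 4, axis=1)
    return np.sum((m4 - 3.0) ** 2)

rng = np.random.default_rng(3)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), (2, 20_000))  # unit-variance sources
theta0 = 0.6                                           # mixing rotation angle
c, d = np.cos(theta0), np.sin(theta0)
x = np.array([[c, -d], [d, c]]) @ s                    # orthogonal (whitened) mixture

# scan candidate unmixing rotations; the contrast is maximal near theta0
angles = np.linspace(0, np.pi / 2, 181)
vals = [kurtosis_contrast(np.array([[np.cos(t), np.sin(t)],
                                    [-np.sin(t), np.cos(t)]]) @ x)
        for t in angles]
best = angles[int(np.argmax(vals))]
```

For uniform (sub-Gaussian) sources the auto-kurtoses are maximal in magnitude at separation, so the scanned contrast attains its maximum at the mixing angle (modulo π/2, reflecting the permutation and sign ambiguities).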
Other contrast functions can be found in [Moreau et Macchi(1996)], [Moreau et Pesquet(1997)], [Moreau et Stoll(1999)], [Cardoso(1999)], [Pham(2000)], [Adib et al.(2002)]. Note that the ideas of this chapter are inspired by the projection pursuit methodology described in [Huber(1985)]. In that paper, Huber used sub- and super-additive functionals (under additional assumptions) to define test statistics for normality. Similarly, we use these classes of functionals to define indices of non-gaussianity. Thus, minimizing the proposed criteria may be viewed as maximizing the non-gaussianity of the observations.

Remark 7.1. Some heuristic arguments supporting the connection between sub-additive functionals and non-gaussianity measures are as follows:
– The cumulants, which are widely used as measures of non-gaussianity, are additive (both sub- and super-additive) functionals.
– The exponential Shannon entropy defined by
$H(x) = \exp\left\{ -\int \log(f)\, f \, dx \right\}$   (7.1)
where f is the PDF of x, which is commonly used as a non-gaussianity measure, is super-additive. For a proof, see [Blachman(1965)].

Definition 7.2. A functional F of the distribution of a random variable X, denoted by F(X), is said to be scale equivariant if
$F(aX) = |a|\, F(X)$   (7.2)
for any real number a. Note that if F is scale equivariant, then |F| is also scale equivariant. Hence, we can assume without loss of generality in this work that F ≥ 0.

7.3 Orthogonality constraint

Principal Component Analysis (PCA), or whitening, consists of transforming the observation vector into decorrelated outputs. However, it is well known that PCA alone is not sufficient for separating the sources. To see this, consider a square BSS model (i.e., n = m). For estimating the n × n matrix A, taking into account the n scale ambiguities, we must determine n(n − 1) unknown coefficients.
The second-order decorrelation constraints give n(n − 1)/2 equations, which is not sufficient for determining A. This also proves that Gaussian sources cannot be separated: they are fully characterized by their first- and second-order statistics. It is interesting to note that second-order independence (whitening) solves the BSS problem up to an orthogonal transformation. To see this, consider the factorization B = UW of the separating matrix, where W is the spatial whitening matrix of the observations and U is an orthogonal transformation. In other words, for z = Wx, we suppose that IE{zz^T} = I without loss of generality. Now, since the outputs must be independent, from
$IE\{yy^T\} = U\, IE\{zz^T\}\, U^T = I$,   (7.3)
we deduce only that $UU^T = I$: determining the orthogonal factor U is the second half of the BSS problem. Thus one can say that whitening solves half of the BSS problem. Because whitening is a very simple and standard procedure, much simpler than any BSS algorithm, it is a good idea to reduce the complexity of the problem this way. The remaining half of the parameters has to be estimated by some other method. This shows that, to obtain the other required equations, other information must be used, such as HOS or FLOS. Even in cases where whitening is not explicitly required, it is recommended, since it reduces the number of free parameters and considerably increases the performance of the methods, especially with high-dimensional data.

7.4 Sub-Additivity based Contrast Functions

Definition 7.3. A functional F of the distribution of a random variable X, denoted by F(X), is said to be sub-additive if
$F(X + Y) \le F(X) + F(Y)$   (7.4)
for any two independent random variables X and Y.

Definition 7.4.
A functional F of the distribution of a random variable X, denoted by F(X), is said to be σ-sub-additive if
$F^\sigma(X + Y) \le F^\sigma(X) + F^\sigma(Y)$   (7.5)
for any two independent random variables X and Y.

Theorem 7.1. Suppose that F is a σ-sub-additive and scale equivariant functional, that the mixing matrix A is orthogonal, and that σ is a real number such that σ ≥ 2. Then, the objective function
$C(B) = -\sum_{i=1}^{n} F^\sigma(y_i)$ where $y = Bx$   (7.6)
is a contrast function for the blind separation of linear instantaneous mixtures under the orthogonality constraint on the matrix B.

Proof. Let us write $C \stackrel{def}{=} BA$. Then, letting $C_{ij}$ be the general element of C and $s_j$ the components of s, one has
$y_i = \sum_{j=1}^{m} C_{ij} s_j$.   (7.7)
Hence, using the scale equivariance and the sub-additivity of F, we have
$F^\sigma\Big(\sum_{j=1}^{m} C_{ij} s_j\Big) \le \Big(\sum_{j=1}^{m} |C_{ij}|\, F(s_j)\Big)^{\sigma}$.   (7.8)
Write $\sum_{j=1}^{m} |C_{ij}|\, F(s_j) = \Big(\sum_{j=1}^{m} F(s_j)\Big) \sum_{j=1}^{m} |C_{ij}|\, \frac{F(s_j)}{\sum_{k=1}^{m} F(s_k)}$ and use the convexity of the function $x \mapsto x^\sigma$ for a real number σ ≥ 2; then we have:
$F^\sigma(y_i) = F^\sigma\Big(\sum_{j} C_{ij} s_j\Big)$   (7.9)
$\le \Big(\sum_{j=1}^{m} F(s_j)\Big)^{\sigma} \sum_{j=1}^{m} \frac{F(s_j)}{\sum_{k=1}^{m} F(s_k)}\, |C_{ij}|^{\sigma}$   (7.10)
$= \Big(\sum_{j=1}^{m} F(s_j)\Big)^{\sigma-1} \sum_{j=1}^{m} |C_{ij}|^{\sigma} F(s_j)$   (7.11)
$\le \Upsilon \sum_{j=1}^{m} |C_{ij}|^{2} F(s_j)$,   (7.12)
where $\Upsilon$ is the constant quantity $\big(\sum_{j=1}^{m} F(s_j)\big)^{\sigma-1}$; the last inequality uses $|C_{ij}|^{\sigma} \le |C_{ij}|^{2}$, valid since σ ≥ 2 and $|C_{ij}| \le 1$ for an orthogonal C. Summing the above inequalities and using the orthogonality constraint $|C_{ij}|^{2} \le \sum_{i=1}^{n} |C_{ij}|^{2} = 1$, one gets
$\sum_{i=1}^{n} F^\sigma(y_i) \le \Upsilon \sum_{j=1}^{m} \Big(\sum_{i=1}^{n} |C_{ij}|^{2}\Big) F(s_j) = \Upsilon \sum_{j=1}^{m} F(s_j)$.   (7.13)
Clearly the equality is attained if C is a generalized² permutation matrix. This proves that $C(B) = -\sum_{i=1}^{n} F^\sigma(y_i)$ is a contrast function. □

Thus one only needs to find a sub-additive, scale equivariant functional F.
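The two properties required by theorem 7.1 can be checked numerically for a candidate functional. The sketch below is our own illustration (Student-t samples stand in for heavy-tailed sources): it verifies sub-additivity and scale equivariance of the empirical Lp-norm, the candidate studied in section 7.4.1. On the empirical measure, sub-additivity is exactly the Minkowski inequality, so it holds pathwise, which is stronger than the distributional statement needed by the theorem:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 1.2                                   # any p >= 1 works
x = rng.standard_t(df=2, size=100_000)    # heavy-tailed independent samples
y = rng.standard_t(df=2, size=100_000)

def lp(v, p):
    """Empirical Lp norm ||v||_p = (E|v|^p)^(1/p) on the sample."""
    return np.mean(np.abs(v) ** p) ** (1.0 / p)

# sub-additivity F(X + Y) <= F(X) + F(Y): Minkowski on the empirical measure
sub_additive = lp(x + y, p) <= lp(x, p) + lp(y, p)

# scale equivariance F(aX) = |a| F(X)
a = -3.7
equivariant = np.isclose(lp(a * x, p), abs(a) * lp(x, p))
```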
7.4.1 Lp-norm contrast functions; p ≥ 1

Let $F(Y) \stackrel{def}{=} \|Y\|_p = (IE|Y|^p)^{1/p} = \big(\int |y|^p f_y(y)\,dy\big)^{1/p}$, the Lp-norm of the random variable Y, where $f_y(\cdot)$ denotes the density function of Y. Note that, by the second and third axioms of a norm, $F(Y) = \|Y\|_p$ is sub-additive and scale equivariant. Thus, the Lp-norm criterion
$C_p(B) = \sum_{i=1}^{n} \|y_i\|_p^{\sigma}$ with σ ≥ 2   (7.14)
is a contrast function that can separate sub- and super-Gaussian sources. Indeed, it is worth emphasizing that the existence of fractional lower-order moments implies that the Lp-norm contrast function can separate heavy-tailed α-stable signals by choosing 1 ≤ p < α. For example, one can choose σ = 2p.

7.4.2 Alpha-stable scale contrast function

Let us consider a mixture of α-stable sources with the same characteristic exponent α and dispersion γ. The scale parameter of an alpha-stable distribution is defined by $S = \gamma^{1/\alpha}$. We recall that $S(aY) = \gamma(aY)^{1/\alpha} = |a|\,\gamma(Y)^{1/\alpha} = |a|\, S(Y)$. Then, the α-stable scale functional $F(Y) \stackrel{def}{=} \gamma^{1/\alpha}$ is scale equivariant. The scale functional is also sub-additive. To prove this, let us consider two independent r.v.'s X and Y; then we have
$S(X + Y) = (\gamma_{X+Y})^{1/\alpha}$   (7.15)
$= (\gamma_X + \gamma_Y)^{1/\alpha}$   (7.16)
$\le (\gamma_X)^{1/\alpha} + (\gamma_Y)^{1/\alpha}$.   (7.17)
The last inequality, valid for α ≥ 1, follows from the fact that for non-negative numbers u, v and r ≤ 1 one has $(u+v)^r \le u^r + v^r$, because
$(u+v)^r - u^r = \int_u^{u+v} r\, t^{r-1}\, dt = \int_0^v r\,(t+u)^{r-1}\, dt \le \int_0^v r\, t^{r-1}\, dt = v^r$.

From theorem 7.1, the sum of the scales of all the outputs of the BSS model,
$C(B) = \sum_{i=1}^{n} S_{y_i}^{\sigma}$ with σ ≥ 2,   (7.18)
defines a contrast function. Thus, this is another contrast function that can separate linear alpha-stable mixtures.

² By generalized permutation matrix we mean any matrix DP, where P is a permutation matrix and D is a diagonal matrix.
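The role of the condition 1 ≤ p < α can be illustrated numerically: the empirical Lp-norm is well defined and finite for p < α, while the empirical second moment (p = 2 > α) is dominated by a handful of extreme samples. This is an illustrative sketch with our own helper names, not the thesis code:

```python
import numpy as np

rng = np.random.default_rng(7)

def sas(alpha, n):
    """Standard SaS draws (Chambers-Mallows-Stuck construction)."""
    V = rng.uniform(-np.pi / 2, np.pi / 2, n)
    W = rng.exponential(1.0, n)
    return (np.sin(alpha * V) / np.cos(V) ** (1 / alpha)
            * (np.cos((1 - alpha) * V) / W) ** ((1 - alpha) / alpha))

alpha, p = 1.5, 1.0          # pick 1 <= p < alpha so that E|Y|^p is finite
y = sas(alpha, 200_000)

# empirical Lp norm ||y||_p = (E|y|^p)^(1/p): usable since p < alpha
lp_norm = np.mean(np.abs(y) ** p) ** (1.0 / p)

# the empirical second moment (p = 2 > alpha) does not stabilize: it is
# dominated by the few largest samples
m2 = np.mean(y ** 2)
top_share = np.sort(y ** 2)[-10:].sum() / (y ** 2).sum()
```

Here `top_share` measures the fraction of the empirical second moment carried by only the ten largest squared samples; for α < 2 it stays substantial no matter how large the sample, which is the practical reason second-order criteria fail on such data.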
In the algorithm derivation step, one can estimate $S_{y_i}$ using one of the existing methods for estimating the dispersion $\gamma_{y_i}$.

Remark 7.2. The Lp-norm contrast function can separate a mixture of heavy-tailed and non-heavy-tailed sources. This robustness property follows from the fact that the fractional lower-order statistics needed for its empirical computation are always defined for any r.v., whose distribution need not be alpha-stable; the alpha-stable scale contrast function, in contrast, is restricted to sources with alpha-stable distributions.

7.5 Super-Additivity based Contrast Functions

Definition 7.5. A functional G of the distribution of a random variable X, denoted by G(X), is said to be super-additive if
$G(X + Y) \ge G(X) + G(Y)$   (7.19)
for any two independent random variables X and Y.

Definition 7.6. A functional G of the distribution of a random variable X, denoted by G(X), is said to be σ-super-additive if
$G^\sigma(X + Y) \ge G^\sigma(X) + G^\sigma(Y)$   (7.20)
for any two independent random variables X and Y.

Theorem 7.2. Suppose that G is a σ-super-additive and scale equivariant functional, that the mixing matrix A is orthogonal, and that σ is a real number such that σ < 2. Then, the objective function
$C(B) = -\sum_{i=1}^{n} G^\sigma(y_i)$ where $y = Bx$   (7.21)
is a contrast function for the blind separation of linear instantaneous mixtures under the orthogonality constraint on the matrix B.

Proof. – To prove the first contrast function requirement, recall that, with the same notations as in the proof of theorem 7.1, one has $y_i = \sum_{j=1}^{m} C_{ij} s_j$.
Using the σ-super-additivity and the scale equivariance of the functional G, we have
$-G^\sigma(y_i) \le -\sum_{j=1}^{m} G^\sigma(C_{ij} s_j)$   (7.22)
$= -\sum_{j=1}^{m} |C_{ij}|^{\sigma} G^\sigma(s_j)$.   (7.23)
Summing this quantity over all output components, and using the fact that $|C_{ij}|^{\sigma} \ge |C_{ij}|^{2}$, since σ < 2 and $|C_{ij}|^{2} \le \sum_{i=1}^{n} |C_{ij}|^{2} = 1$ due to the orthogonality constraint, one gets
$-\sum_{i=1}^{n} G^\sigma(y_i) \le -\sum_{i=1}^{n} \sum_{j=1}^{m} |C_{ij}|^{\sigma} G^\sigma(s_j)$   (7.25)
$\le -\sum_{j=1}^{m} \Big(\sum_{i=1}^{n} |C_{ij}|^{2}\Big) G^\sigma(s_j)$   (7.26)
$= -\sum_{j=1}^{m} G^\sigma(s_j)$.   (7.27)
So
$C(y) = C(Cs) \le C(s)$.   (7.28)
Thus, the requirement R1 is fulfilled.
– Finally, C(Cs) = C(s), or equivalently,
$C(y) = -\sum_{i=1}^{n} G^\sigma\Big(\sum_{j=1}^{m} C_{ij} s_j\Big) = -\sum_{j=1}^{m} G^\sigma(s_j)$,   (7.29)
requires that, for each i, the sum $\sum_{j=1}^{m} C_{ij} s_j$ reduces to a single term; that is, each row of C has exactly one nonzero component $C_{i\,j(i)} = \pm 1$. Since C is orthogonal, it means that C = DP, where D denotes a diagonal matrix with entries ±1 and P the permutation matrix associated to the permutation $i(1), \cdots, i(n)$. Clearly the equality in (7.28) is attained if C is such a generalized permutation matrix. This proves that $C(B) = -\sum_{i=1}^{n} G^\sigma(y_i)$ is a contrast function. □

7.5.1 Dispersion contrast function

Here, we consider a linear mixture of α-stable signals with the same characteristic exponent α, dispersion γ and scale functional $S = \gamma^{1/\alpha}$ considered above. We verified above that S is scale equivariant. It is also easy to see that S is α-additive, i.e., both α-sub- and α-super-additive:
$S^\alpha(X + Y) = \big(\gamma(X+Y)^{1/\alpha}\big)^{\alpha} = \gamma(X) + \gamma(Y) = S^\alpha(X) + S^\alpha(Y)$.   (7.30)
Then, from theorem 7.2, the objective function
$C(B) = \sum_{i=1}^{n} S^\alpha(y_i) = \sum_{i=1}^{n} \gamma_{y_i}$   (7.31)
is a contrast function for alpha-stable source separation. Thus we obtain another proof that the global minimum dispersion criterion, which we introduced in [Sahmoudi et al.(2003a)], is a contrast function.
7.6 Jacobi-Gradient Algorithm for Prewhitened BSS

As presented in the previous chapter, every orthogonal matrix can be parameterized in terms of Givens rotation angles, each of which defines a rotation in a single plane of the high-dimensional vector space. These individual rotations can then be cascaded to span the whole set of rotation matrices. Every rotation matrix has a unique set of Givens rotation angles that characterize it. In n dimensions, the Givens rotation matrix in the plane formed by the i-th and j-th axes is denoted by $\Omega_{ij}$ and is given as presented in [Sahmoudi et al.(2005)]. A rotation matrix is then formed from these sparse matrices according to
$B = \prod_{p=1}^{m-1} \prod_{q=p+1}^{m} \Omega_{pq}$.   (7.32)
The multiplication order can be either always from the left or always from the right; it is not crucial to the generality of this formula, as long as we maintain the same order when taking the derivative of the matrix with respect to a rotation angle.

[A]- Optimization & algorithm

Our aim is to solve the previously mentioned constrained optimization problem, which becomes unconstrained if Givens angles are used. Let $\theta_{kl}$, $k = 1, \cdots, m-1$, $l = k+1, \cdots, m$, be the Givens rotation angles that form our parameter vector Θ. To derive a simple and fast algorithm, we propose to combine the Jacobi-like decomposition into Givens rotations with a gradient algorithm using a numerical search for θ. The so-called Jacobi-Gradient algorithm can be summarized as follows:

Jacobi-Gradient Algorithm
Step 1. Initialize the Givens angles randomly.
Step 2. Estimate robustly from the data, especially if the sources are in a noisy environment, the source statistics used in the contrast function.
Step 3. Calculate the gradient of the cost function with respect to the Givens angles: $\partial C(B)/\partial \theta_{kl}$.
Step 4.
Update the Givens angles by gradient ascent: $\theta(k+1) = \theta(k) + \eta\, \partial C(B)/\partial \theta$.
Step 5. Go back to step 3 and continue until convergence.

[B]- Complexity

A key concern in many adaptive algorithms is the computational complexity. If the multiplications in (7.32) are performed from the left, the first output is only affected by the Givens angles with indices $\theta_{1q}$, $q = 2, \cdots, m$; the second is affected by the angles $\theta_{1q}$, $q = 2, \cdots, m$, and $\theta_{2q}$, $q = 3, \cdots, m$; and so on. Thus, if we wish to extract all m source components, we only need to adapt the angles $\theta_{ij}$, $i = 1, \cdots, m$, $j = i+1, \cdots, m$, which makes a total of $m^2 - m(m+1)/2 = m(m-1)/2$ parameters, that is, fewer than the $m^2$ parameters required in many Jacobi-like algorithms. We then have to evaluate the sin and cos of all these parameters once, and the necessary matrix-vector multiplications of the algorithm are performed at each iteration, which amounts to $O(m^2)$ operations.

7.7 Concluding Remarks

In this chapter some robust contrast functions have been introduced. A practical contrast function was derived for application to heavy-tailed sources. Coupling the Jacobi and gradient optimization techniques, a convenient implementation was proposed for prewhitened BSS methods. This work was developed recently and, due to time limitations, we cannot present experimental results yet. We plan to extend this work using some results from the robust statistics of sub- and super-additive functionals of heavy-tailed random variables [Huber(1981)]. A performance analysis will also be investigated.
Chapter 8

Normalized HOS-based Approaches

This chapter introduces a new approach for the blind separation (BS) of heavy-tailed signals that can be modeled by real-valued symmetric α-stable (SαS) processes. As the second and higher order moments of the latter are infinite, we propose to use normalized statistics of the observation to achieve the BS of the sources. More precisely, we show that the considered normalized statistics are convergent (i.e., take finite values) and have the appropriate structure that allows for the use of standard tensorial BS as well as non-linear decorrelation techniques based on second and higher order cumulants.

8.1 Introduction

By the generalized central limit theorem, the α-stable laws are the only class of distributions that can be the limiting distribution of sums of i.i.d. random variables [Samorodnitsky et Taqqu(1994)]. Many signals are impulsive in nature, or become so after certain pre-processing (e.g., a wavelet transform), and can therefore be modeled as stable processes [Nikias et Shao(1995)], [Cappé et al.(2002)]. Unlike most statistical models, the α-stable distributions, except the Gaussian one, have infinite second and higher order moments. Consequently, standard blind source separation (BSS) methods would be inadequate in this case, as most of them are based on second or higher order statistics [Cichocki et Amari(2002)]. In this chapter, we propose a new approach for the BS of heavy-tailed sources using normalized statistics (NS). It is first shown that suitably normalized second- and fourth-order cumulants exist and have the appropriate structure for BSS. This result is similar to those of [Swami et Sadler(1998)] in the stable ARMA context.
Then, for extracting α-stable source signals from their observed mixtures, one can use any standard procedure based on second- or fourth-order cumulants. This BSS method has several advantages over the existing ones, which are discussed in the sequel. Simulation-based comparisons with the minimum dispersion (MD) criterion based method of [Sahmoudi et al.(2005)] are also provided.

8.2 Normalized Statistics of Heavy-Tailed Mixtures

8.2.1 Normalized moments

Thanks to the algebraic tail behavior, we demonstrate here that the ratio of the k-th moments of two SαS random variables with α ≠ 2 converges to a finite value (even though the moments themselves are infinite). More precisely, we have the following theorem:

Theorem 8.1. Let $X_1$ and $X_2$ be two SαS variables of dispersions $\gamma_1$ and $\gamma_2$ and PDFs $f_1(\cdot)$ and $f_2(\cdot)$, respectively. Then, for k ≥ α, we have
$\frac{IE(|X_1|^k)}{IE(|X_2|^k)} \stackrel{def}{=} \lim_{T\to\infty} \frac{\int_{-T}^{T} |x|^k f_1(x)\,dx}{\int_{-T}^{T} |u|^k f_2(u)\,du} = \frac{\gamma_1}{\gamma_2}$.   (8.1)

Proof. Let $R_k$ represent the above ratio. Due to the symmetric PDFs of $X_1$ and $X_2$, we have
$R_k \stackrel{def}{=} \frac{\int_{-T}^{T} |x|^k f_1(x)\,dx}{\int_{-T}^{T} |u|^k f_2(u)\,du} = \frac{\int_{0}^{T} x^k f_1(x)\,dx}{\int_{0}^{T} u^k f_2(u)\,du}$.   (8.2)
Using integration by parts, we get
$R_k = \frac{\big[-x^k\,(1-\Phi_1(x))\big]_0^T + k\int_0^T x^{k-1}\,(1-\Phi_1(x))\,dx}{\big[-u^k\,(1-\Phi_2(u))\big]_0^T + k\int_0^T u^{k-1}\,(1-\Phi_2(u))\,du}$,   (8.3)
where $\Phi(\cdot)$ denotes the cumulative function of the considered PDF. From the heavy-tail property (see chapter 2), for any SαS cumulative function Φ we have $1 - \Phi(x) \sim \frac{C_\alpha}{2}\,\gamma\, x^{-\alpha}$ as $x \to \infty$. Then, as $T \to \infty$, $R_k$ is equivalent to:
$R_k \sim \frac{\frac{C_\alpha}{2}\gamma_1 \big( [-x^{k-\alpha}]_0^T + k\int_0^T x^{k-1-\alpha}\,dx \big)}{\frac{C_\alpha}{2}\gamma_2 \big( [-u^{k-\alpha}]_0^T + k\int_0^T u^{k-1-\alpha}\,du \big)} \to \frac{\gamma_1}{\gamma_2}$. □

Using a similar proof, one can demonstrate that the ratio of the square of the k-th moment to the 2k-th moment of an SαS random variable (α ≠ 2) converges to zero for k > α. More precisely, we have the following theorem:
Theorem 8.2. Let X be an SαS variable of dispersion γ and PDF $f(\cdot)$. Then, for k > α, we have:
$\frac{(IE|X|^k)^2}{IE|X|^{2k}} \stackrel{def}{=} \lim_{T\to\infty} \frac{\big(\int_{-T}^{T} |x|^k f(x)\,dx\big)^2}{\int_{-T}^{T} |x|^{2k} f(x)\,dx} = 0$.   (8.4)

8.2.2 Normalized second and fourth order cumulants

Using the above results, we can now establish that the normalized covariance matrix of the mixture signal converges to a finite-valued matrix with the desired algebraic structure. We have the following result:

Theorem 8.3. Let x be an SαS vector given by x = As (s being a vector of independent SαS random variables). Then the normalized covariance matrix of x satisfies:
$R(i,j) \stackrel{def}{=} \frac{Cum[x(i), x(j)]}{\sum_{k=1}^{n} Cum[x(k), x(k)]} = \sum_{k=1}^{m} d_k\, a_k(i)\, a_k(j)$,
or equivalently, $R = ADA^T$, where $D = \mathrm{diag}(d_1, \cdots, d_m)$ and
$d_i = \frac{\gamma_i}{\sum_{j=1}^{m} \gamma_j \|a_j\|^2}$,
$a_j$ being the j-th column vector of A.

Similarly, the normalized quadri-covariance tensor [Cardoso(1991)] of the mixture signal converges to a finite-valued tensor with the desired algebraic structure. We have the following result:

Theorem 8.4. Let x be an SαS vector given by x = As (s being a vector of independent SαS random variables). Then the normalized quadri-covariance tensor of x satisfies:
$Q(i,j,k,l) \stackrel{def}{=} \frac{Cum[x(i), x(j), x(k), x(l)]}{\sum_{r=1}^{n} Cum[x(r), x(r), x(r), x(r)]} = \sum_{r=1}^{m} \kappa_r\, a_r(i)\, a_r(j)\, a_r(k)\, a_r(l)$,
where
$\kappa_i = \frac{\gamma_i}{\sum_{j=1}^{m} \gamma_j \|a_j\|^4}$.

8.3 Normalized Tensorial BSS Methods

8.3.1 Separation algorithms

Thanks to theorems 8.3 and 8.4, we can now use existing BSS methods based on 2nd and 4th order cumulants, e.g. [Comon(1994)] for the ICA algorithm and [Cardoso et Souloumiac(1993)] for the JADE algorithm.
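The algebraic structure claimed by theorem 8.3 can be checked numerically: after trace normalization, undoing the mixing with the pseudo-inverse of A should leave an (approximately) diagonal matrix. This is our own illustrative sketch, with an illustrative Chambers-Mallows-Stuck sampler and an arbitrary choice of dimensions and seed:

```python
import numpy as np

rng = np.random.default_rng(5)

def sas(alpha, n):
    """Standard SaS draws (Chambers-Mallows-Stuck construction, sketch)."""
    V = rng.uniform(-np.pi / 2, np.pi / 2, n)
    W = rng.exponential(1.0, n)
    return (np.sin(alpha * V) / np.cos(V) ** (1 / alpha)
            * (np.cos((1 - alpha) * V) / W) ** ((1 - alpha) / alpha))

m, n_obs, T, alpha = 3, 4, 100_000, 1.5
S = np.vstack([sas(alpha, T) for _ in range(m)])   # independent SaS sources
A = rng.standard_normal((n_obs, m))                # mixing matrix (n = 4, m = 3)
X = A @ S

# normalized sample covariance: divide by its trace (cf. theorem 8.3)
R = X @ X.T / T
Rn = R / np.trace(R)

# undo the mixing: pinv(A) Rn pinv(A)^T should be close to a diagonal D
M = np.linalg.pinv(A) @ Rn @ np.linalg.pinv(A).T
off = M - np.diag(np.diag(M))
ratio = np.abs(off).max() / np.abs(np.diag(M)).max()
```

A small `ratio` indicates that the normalized covariance is close to the claimed $ADA^T$ form, even though the raw covariance of the SαS mixture diverges; the convergence is in probability, so the residual shrinks slowly as T grows.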
In this work, we have applied JADE to the normalized 2nd and 4th order cumulants of the observations. The so-called Robust-JADE¹ algorithm is summarized in Table 8.1.

Robust-JADE Algorithm
Step 1. Compute a whitening matrix Ŵ from the normalized sample covariance R̂x (estimated as the standard sample covariance matrix divided by its trace).
Step 2. Compute the most significant eigenpairs {λ̂r, M̂r; 1 ≤ r ≤ m} from the normalized sample 4th-order cumulants of the whitened process z(t) = Ŵx(t) (see [Cardoso et Souloumiac(1993)] for more details about the JADE algorithm).
Step 3. Jointly diagonalize the set {λ̂r M̂r; 1 ≤ r ≤ m} by a unitary matrix Û.

Tab. 8.1 – The principal steps of the proposed Robust-JADE algorithm.

We provide here some remarks about the above separation method and discuss certain advantages of the use of normalized statistics.
– Based on theorem 8.2, the normalized 4-th order cumulants are equal to the normalized 4-th order moments of the SαS source mixture (recall that for a real-valued zero-mean random variable x, we have $cum(x,x,x,x) = IE(x^4) - 3(IE(x^2))^2$). In other words, for SαS sources, one can replace the 4-th order cumulants by the 4-th order moments of the mixture signal.
– One major advantage of the proposed method compared to the FLOM-based methods is that no a priori knowledge or pre-estimation of the source PDF parameters (in particular, the characteristic exponent α) is required. Consequently, the normalized-statistics based method is robust to modelization errors with respect to the source PDF.
– In the case where the sources are non-impulsive, the proposed method coincides with the standard one (in our case, with the JADE method). Indeed, because of the scaling indeterminacy, the normalization has no effect in this case.
– Another advantage of the NS-based method is that it can easily be extended to the case where the sources are of different types, i.e., sources with different characteristic exponents, or non-impulsive sources in the presence of impulsive ones. That can be done, for example, by using the above NS-based method in conjunction with a deflation technique [Adib et al.(2002)]. Indeed, in that case, one can prove that the normalized statistics coincide with those of the mixture of the 'most impulsive' sources only (i.e., the ones with the smallest characteristic exponent), which can be estimated first and then removed (by deflation) to allow the estimation and separation of the other sources. This point is still under investigation and will be presented in detail in future work.
– In this chapter, we have established only the convergence of the 'exact' normalized statistics (expressed via the mathematical expectation). In fact, one can prove, along the same lines as [Swami et Sadler(1998)], that the sample estimates of the second and fourth order cumulants converge in probability to the exact normalized statistics given by theorems 8.3 and 8.4.

¹ In fact, Robust-JADE is the same algorithm as JADE up to some multiplicative constants, which have no effect on the BSS. We refer to the resulting algorithm as Robust-JADE to express its validity for heavy-tailed sources.

8.3.2 Performance evaluation & comparison

This section examines the statistical performance of the separation procedure. The numerical results presented below have been obtained in the following setting. The source signals are i.i.d. impulsive symmetric standard α-stable (β = 0, µ = 0 and γ = 1). The number of sources is m = 3 and the number of observations is n = 4.
The statistics are evaluated over 100 Monte-Carlo runs, and the mixing matrix is generated randomly at each run.

• Performance index

To measure the quality of the source separation, we use the generalized rejection level criterion defined as follows. If source k is the desired signal, the related generalized rejection level is
$I_k \stackrel{def}{=} \frac{\gamma\big(\sum_{l\ne k} C_{kl}\, s_l\big)}{\gamma(C_{kk}\, s_k)} = \frac{\sum_{l\ne k} |C_{kl}|^{\alpha}\, \gamma_l}{|C_{kk}|^{\alpha}\, \gamma_k}$,   (8.5)
where $\gamma(x)$ (resp. $\gamma_l$) denotes the dispersion of an SαS random variable x (resp. of source $s_l$) and $C \stackrel{def}{=} \hat{A}^{\#} A$. Therefore, the averaged rejection level is given by
$I_{perf} = \frac{1}{m}\sum_{i=1}^{m} I_i = \frac{1}{m}\sum_{i=1}^{m}\sum_{j\ne i} \frac{|C_{ij}|^{\alpha}\, \gamma_j}{|C_{ii}|^{\alpha}\, \gamma_i}$.
The performance of the NS-based method (referred to as Robust-JADE) is compared with that of the MD method introduced in [Sahmoudi et al.(2005)].

• First experiment

Figure 8.1 presents the generalized mean rejection level versus the additive Gaussian noise power (N = 1000 and α = 1.5). We can observe a certain performance gain in favor of the minimum dispersion (MD) algorithm.

Fig. 8.1: Generalized mean rejection level versus the noise power (curves: Robust-JADE and MD).

• Second experiment

Figure 8.2 presents the generalized mean rejection level versus the sample size (α = 1.5 and the mixture is noise-free). We can observe a certain performance gain in favor of the Robust-JADE algorithm.

Fig. 8.2: Generalized mean rejection level versus the sample size (curves: MD and Robust-JADE).
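For reference, the performance index above can be sketched directly from its definition (8.5); the function name, the matrices and the perturbation below are illustrative choices of ours, not the thesis code:

```python
import numpy as np

def mean_rejection_level_db(A, A_hat, gamma, alpha):
    """Generalized mean rejection level I_perf of eq. (8.5), in dB.
    A: true mixing matrix, A_hat: its estimate, gamma: source dispersions."""
    C = np.linalg.pinv(A_hat) @ A            # global system C = A_hat^# A
    m = C.shape[0]
    I = 0.0
    for i in range(m):
        leak = sum(abs(C[i, j]) ** alpha * gamma[j]
                   for j in range(m) if j != i)
        I += leak / (abs(C[i, i]) ** alpha * gamma[i])
    return 10.0 * np.log10(I / m)

A = np.array([[1.0, 0.1], [-0.1, 1.0]])          # true mixing matrix
A_hat = np.array([[1.0, 0.12], [-0.1, 1.0]])     # slightly mis-estimated
gamma = np.array([1.0, 1.0])                     # unit source dispersions
level_db = mean_rejection_level_db(A, A_hat, gamma, alpha=1.5)
```

Perfect estimation (C equal to a scaled permutation) drives the index to minus infinity in dB, so in practice the curves of Figures 8.1 and 8.2 report the residual leakage of the off-diagonal terms of C.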
8.4 Normalized Non-linear Decorrelation BSS Methods

In this section, we focus on the use of normalized statistics (NS) of heavy-tailed sources for the BSS problem. In [Sahmoudi et al.(2004a)], the NS were introduced for alpha-stable sources to justify the use of algebraic separation algorithms (JADE, SOBI, etc.) for achieving BSS in the heavy-tailed case. Here, we propose to use the NS to robustify the class of non-linear decorrelation algorithms, such as the EASI algorithm [Cardoso et Lahed(1996)]. The algorithm derivation, a discussion and simulation results are provided to illustrate the usefulness of NS in this context. The new method has been compared with two of the most popular BSS algorithms: EASI and the quasi maximum-likelihood algorithm [Pham et Garrat(1997)].

To deal with the particular BSS problem of heavy-tailed data, we proposed in [Sahmoudi et al.(2004a)] to use normalized second and higher order statistics, which are shown to take finite values and to have the appropriate structure based on which BSS can be achieved. In this section, we use the normalized statistics to robustify, and adapt to the impulsive source case, the class of BSS algorithms based on a composite (second and higher order) criterion. In particular, the EASI (equivariant adaptive separation by independence) algorithm proposed by Cardoso and Lahed in [Cardoso et Lahed(1996)], and its batch version IBSS [Belouchrani et al.(1997b)], has attracted a lot of interest in the Independent Component Analysis community. However, we show by both analytical studies and computer experiments that this algorithm fails to perform the separation of heavy-tailed sources such as alpha-stable signals, and that divergence behaviors may be observed.
We then introduce a robust-EASI criterion based on the normalized statistics, which is shown to be effective for the BSS problem in the considered context. Algorithmic details and simulation results related to the iterative implementation, referred to as the Robust-EASI algorithm, are provided and discussed in this section.

A broad and increasingly important class of non-Gaussian phenomena encountered in practice can be characterized as impulsive [Adler et al.(1998)]. It is for this type of signals and noise that heavy-tailed distributions provide a useful theoretical tool. In this section we use the heavy-tailed behavior characterization of \alpha-stable distributions to achieve the normalization of the high-order statistics of the considered linear mixture. Let us recall this property:

Property 8.1: Heavy-tailed asymptotic behavior
Let X \sim S\alpha S be an \alpha-stable r.v. with \alpha < 2. Then:

    P(X > x) \sim \gamma C_{\alpha} x^{-\alpha} as x \to \infty

where C_{\alpha} is a positive constant depending only on \alpha.

Thus, \alpha-stable distributions have inverse-power, i.e. algebraic, tails. In contrast, the Gaussian distribution has exponential tails. This shows that the tails of stable laws are much thicker than those of the Gaussian distribution; and the smaller the value of \alpha, the thicker the tails. An important consequence of Property 8.1 is the non-existence of the second and higher order moments of stable distributions, except in the special case \alpha = 2. However, thanks to the heavy-tailed behavior of the sources s, the normalized covariance and fourth-order cumulants exist for x = As. More precisely, we have established in [Sahmoudi et al.(2004a)] the following result:

Theorem 8.5: Normalized statistics of heavy-tailed mixtures
1. Let X be a heavy-tail distributed r.v. with index \alpha.
Then, for k > \alpha, we have:

    \lim_{T \to \infty} \frac{\left(\hat{IE}|X|^{k}\right)^{2}}{\hat{IE}|X|^{2k}} = 0 (convergence in probability)

where \hat{IE} denotes the time-averaging operator \hat{IE}[g(X)] = \frac{1}{T} \sum_{t=1}^{T} g[X(t)].

2. Let \hat{R} be the sample covariance matrix of the mixture signal in (9.6):

    \hat{R} \overset{def}{=} \frac{1}{T} \sum_{t=1}^{T} x(t) x(t)^{*}

Then the normalized sample covariance matrix \frac{\hat{R}}{Trace(\hat{R})} converges, when T \to \infty, to a finite-valued matrix of the form A D A^{*}, where D is a positive diagonal matrix.

3. Let \hat{Cum}[x(i), x(j), x(k), x(l)] be the sample quadricovariance tensor. Then the normalized sample quadricovariance tensor

    \frac{\hat{Cum}[x(i), x(j), x(k), x(l)]}{\sum_{r=1}^{n} \hat{Cum}[x(r), x(r), x(r), x(r)]}

of the mixture signal converges to a finite-valued tensor.

The consequence of this result is that many source separation algorithms can be modified to be applicable to heavy-tailed sources.

8.4.1 Robust composite criterion for source separation

[A]- EASI family criterion

It was shown in [Cardoso et Lahed(1996), Belouchrani et al.(1997b)] that a general composite criterion for blind source separation can be defined as

    C_g(B) = IE\{zz^{*} - I + [g(z)z^{*} - z g(z)^{*}]\}, with z = Bx    (8.6)

where IE denotes the mathematical expectation, I is the identity matrix and z^{*} denotes the (conjugate) transpose of the (complex) vector z. The non-linear function g is chosen such that, if z is a random vector with i.i.d. components, then C_g(B) is the null matrix in the noiseless case:

    B is a separating matrix \Rightarrow C_g(B) = 0    (8.7)

where g is usually chosen as an odd non-linear function that preserves the phase of its argument, i.e. the i-th coordinate of g(z) is of the form g_i(z) = g_i(z_i) = f_i(|z_i|^{2}) z_i, where f_i is a real function.
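The normalized sample covariance of point 2 of Theorem 8.5 is straightforward to form; a minimal sketch with illustrative names:

```python
import numpy as np

def normalized_covariance(X):
    """Normalized sample covariance R_hat / trace(R_hat) of Theorem 8.5.
    X is an n x T array whose columns are the mixture snapshots x(t);
    the trace normalization keeps the statistic finite even when the
    sources have infinite variance."""
    R = (X @ X.conj().T) / X.shape[1]
    return R / np.trace(R)
```

Whitening can then proceed on this matrix exactly as in the finite-variance case.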
[B]- Robust-EASI family criterion

In the case of heavy-tailed sources such as alpha-stable signals, which have infinite moments of order equal to or greater than two, the criterion C_g(B) is inadequate and divergence behaviors may be observed, especially if the non-linearities in g are strongly increasing functions (like a cubic distortion, for instance). For simplicity, let us choose g(z_i) = |z_i|^{2} z_i.

• Note, for example, that even when B equals A^{-1} and we have z = s, the right-hand side of (8.7), C_g(B), does not converge to 0 since IE\{|s_i|^{2}\} and IE\{|s_i|^{4}\} are infinite, which undermines the validity of this separation procedure.

• In practice, one can always argue that, for a finite sample size, the sample estimate of C_g(B) is of finite value. However, for impulsive signals and large sample sizes, the second order term IE\{zz^{*}\} - I in C_g(B) will be negligible compared to the higher order term IE\{g(z)z^{*} - z g(z)^{*}\} (see point 1 of Theorem 8.5). In that case, the whitening will not be performed correctly and thus the algorithm fails to converge to the optimal solution.

To mitigate this difficulty, we propose to modify criterion (8.6) to ensure the convergence of the two terms IE\{zz^{*}\} - I and IE\{g(z)z^{*} - z g(z)^{*}\}. For that, we use the concept of normalized statistics. Hence, in this section we propose a robustified version of the EASI approach obtained by modifying C_g(B) into:

    C_g(B) = IE\left\{ \frac{zz^{*}}{Trace(zz^{*})} - I + \frac{g(z)z^{*} - z g(z)^{*}}{\sum_{j=1}^{m} |z_j|^{4}} \right\}, with z = Bx    (8.8)

resulting in the so-called Robust-EASI family of source separation algorithms. This modification preserves the structure of the standard EASI and IBSS algorithms, such that the term

    IE\left\{ \frac{zz^{*}}{Trace(zz^{*})} - I \right\}

in (8.8) has the effect of driving the diagonal elements of C = BA to all ones.
Meanwhile, the other term

    IE\left\{ \frac{g(z)z^{*} - z g(z)^{*}}{\sum_{j=1}^{m} |z_j|^{4}} \right\}

in (8.8) drives the off-diagonal elements of C to zero.

8.4.2 Iterative quasi-Newton implementation

To solve (8.8), we propose to use a block technique based on the processing of T received samples, which consists of searching for the zeros of \hat{C}_g(B), the sample version of C_g(B):

    \hat{C}_g(B) = \frac{1}{T} \sum_{t=1}^{T} \left\{ \left[ \frac{z(t)z(t)^{*}}{\sum_{j=1}^{m} |z_j(t)|^{2}} - I \right] + \frac{g(z(t))z(t)^{*} - z(t)g(z(t))^{*}}{\sum_{j=1}^{m} |z_j(t)|^{4}} \right\}, with z = Bx    (8.9)

An approximate solution of \hat{C}_g(B) = 0 may be obtained by the Newton technique: \hat{C}_g(B) is replaced by its first order approximation around some B so that the resulting linear equation can be solved exactly; solutions are then obtained iteratively in the form B_{p+1} = (I + E_p) B_p. At step p, a matrix E_p is determined from a local linearization of \hat{C}_g(B_{p+1}). The benefit is that this leads to an explicit expression of E_p under the additional assumption that B_p is close to a separating matrix. This iterative implementation is summarized by the so-called Robust-EASI algorithm in Table 8.2.

Robust-EASI Algorithm

Step 1. Initialization: choose B_0 randomly and set z(t) = B_0 x(t), t = 1, ..., T.

Step 2. Computation of the matrix E:

    E_{ij} = \frac{\hat{\rho}_{ii} \hat{\kappa}_{ij} + \hat{\zeta}_{ji}^{*} (\hat{\delta}_{ij} - \hat{\rho}_{ij})}{\hat{\rho}_{ii} \hat{\zeta}_{ij} + \hat{\zeta}_{ji}^{*} \hat{\rho}_{jj}}, i, j = 1, ..., m    (8.10)

with

    \hat{\rho}_{ij} = \hat{IE}\left\{ \frac{z_i z_j^{*}}{\sum_{j=1}^{m} |z_j|^{2}} \right\}

    \hat{\kappa}_{ij} = \hat{IE}\left\{ \frac{z_i z_j^{*} [f_j(|z_j|^{2}) - f_i(|z_i|^{2})]}{\sum_{j=1}^{m} |z_j|^{4}} \right\}

    \hat{\zeta}_{ij} = \hat{IE}\left\{ \frac{|z_j|^{2} [f_i'(|z_i|^{2})|z_i|^{2} + f_i(|z_i|^{2}) - f_j(|z_j|^{2})]}{\sum_{j=1}^{m} |z_j|^{4}} \right\}

Step 3. Update the estimated source signals: z(t) \leftarrow (I + E) z(t), for t = 1, ..., T.

Step 4. Check for convergence: if ||E|| < \epsilon stop (\epsilon is a small threshold), otherwise go back to Step 2.

Tab. 8.2 – The principal steps of the proposed Robust-EASI algorithm.
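The steps of Table 8.2 can be sketched as follows for real data, with g(z_i) = |z_i|^{2} z_i. As a simplification that is ours (not the thesis'), the quasi-Newton step of Eq. (8.10) is replaced by the relative-gradient update E = -\mu \hat{C}_g(B) built from the normalized criterion (8.9); the step size \mu is a hypothetical tuning parameter:

```python
import numpy as np

def robust_easi(X, mu=0.05, eps=1e-6, max_iter=50, seed=0):
    """Sketch of the batch Robust-EASI iteration (Table 8.2) for real
    data.  The exact quasi-Newton matrix E of Eq. (8.10) is replaced by
    the simpler relative-gradient step E = -mu * C_hat(B), where C_hat
    is the normalized sample criterion (8.9)."""
    n, T = X.shape
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((n, n))            # Step 1: random initialization
    Z = B @ X
    I = np.eye(n)
    for _ in range(max_iter):
        C = np.zeros((n, n))                   # sample criterion (8.9)
        for t in range(T):
            z = Z[:, t]
            g = z ** 3                         # g(z_i) = |z_i|^2 z_i (real case)
            C += np.outer(z, z) / (z @ z) - I              # normalized 2nd-order term
            C += (np.outer(g, z) - np.outer(z, g)) / np.sum(z ** 4)  # normalized HOS term
        C /= T
        E = -mu * C                            # Step 2 (simplified update)
        Z = (I + E) @ Z                        # Step 3: update estimated sources
        B = (I + E) @ B
        if np.linalg.norm(E) < eps:            # Step 4: convergence test
            break
    return B, Z
```

Because both terms of the criterion are normalized, E stays bounded even for impulsive inputs, which is the point of the robustification.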
8.4.3 Performance evaluation & comparison

In this section we compare our NS-based Robust-EASI method to two widely used BSS algorithms, EASI and RQML [Shereshevski et al.(2001)]. Recall that RQML is the restricted quasi-maximum likelihood approach introduced as an extension of the popular Pham's quasi-maximum likelihood (QML) approach [Pham et Garrat(1997)] to the \alpha-stable sources case. Two simulation examples with different types of distributions and a variety of sample sizes and noise powers are presented. All simulation results are averaged over 200 Monte-Carlo runs and the mixing matrix A is generated randomly at each run. To measure the quality of separation, we use the generalized rejection level criterion defined as follows [Sahmoudi et al.(2004a)]:

    I_{perf} = \frac{1}{m} \sum_{i=1}^{m} I_i = \frac{1}{m} \sum_{i=1}^{m} \sum_{j \neq i} \frac{|C_{ij}|^{\alpha} \gamma_j}{|C_{ii}|^{\alpha} \gamma_i}    (8.11)

where \gamma_l denotes the dispersion of source s_l and C \overset{def}{=} BA.

[A]- Experiment 1: Alpha-Stable Mixture

In this experiment, mixtures of three heavy-tailed symmetric standard \alpha-stable (\mu = 0 and \gamma = 1) signals with characteristic exponent \alpha = 1.5 are considered. The number of observations is n = 3 and the mixture is noise-free. Figure 8.3 presents the generalized mean rejection level (8.11) versus the sample size for each algorithm.

Fig. 8.3: Generalized mean rejection level versus the sample size.

From these results, we can observe that EASI fails to separate \alpha-stable mixtures.
It can be seen that the newly proposed Robust-EASI can correctly separate the \alpha-stable mixtures even for short sample sizes, and outperforms the RQML method.

[B]- Experiment 2: Generalized Gaussian Mixture

The generalized Gaussian distribution has a density proportional to exp(-|x|^{p}), p > 0. A value of p less than 2 gives a distribution suitable as an impulsive signal model. By varying p, a wide class of probability distributions can be characterized, including uniform, Gaussian, Laplacian and other sub- and super-Gaussian densities. In this experiment the three sources are impulsive with a generalized Gaussian distribution with p = 1.5. Three mixtures corrupted by additive white Gaussian noise are considered. We characterize the performance of each algorithm in terms of the signal rejection level. With C \overset{def}{=} BA, the i-th estimated source is:

    \hat{s}_i(t) = z_i(t) = \sum_{j=1}^{m} C_{ij} s_j(t)

which contains the j-th source signal at level |C_{ij}|^{2} / |C_{ii}|^{2}. Then, in this case, the averaged rejection level is given by

    I_{perf} = \frac{1}{m} \sum_{i=1}^{m} I_i = \frac{1}{m} \sum_{i=1}^{m} \sum_{j \neq i} \frac{|C_{ij}|^{2}}{|C_{ii}|^{2}}    (8.12)

Figure 8.4 presents the mean rejection level (8.12) versus the noise power for each algorithm. Even though the impulsive sources are not heavy-tailed, the Robust-EASI algorithm still largely outperforms the two other algorithms. This illustrates the fact that the proposed approach is quite general and can be applied to a larger class of source signal distributions, including of course the heavy-tailed ones. In an overall comparison, the Robust-EASI method has reliable performance in all considered situations, whereas the other methods may fail, in particular if the underlying assumptions on the sources are not completely valid.
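Generalized Gaussian test sources like these can be simulated with the usual gamma transformation; a generic sketch under our own naming, not code from the thesis:

```python
import numpy as np

def generalized_gaussian(p, size, rng):
    """Samples from the density proportional to exp(-|x|^p), p > 0
    (p = 2: Gaussian shape, p = 1: Laplacian, p < 2: impulsive,
    super-Gaussian).  Uses |X| = G^(1/p) with G ~ Gamma(1/p, 1) and a
    random sign, which yields exactly the target density shape."""
    g = rng.gamma(1.0 / p, 1.0, size)
    sign = rng.choice([-1.0, 1.0], size)
    return sign * g ** (1.0 / p)
```

For p = 1.5 the resulting signal is super-Gaussian (positive excess kurtosis) but, unlike the \alpha-stable case, all its moments are finite.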
Fig. 8.4: Mean rejection level versus the noise power with T = 1000.

8.5 Concluding Remarks

In this chapter, two new NS-based blind separation methods for impulsive source signals with heavy-tailed distributions are introduced:

• Robust-JADE algorithm: The normalized 2nd and 4th order cumulants of the mixture signal are shown to converge to finite-valued matrices with the appropriate algebraic structure that is traditionally used in many 2nd and higher order statistics based BSS methods. The advantages of the proposed Robust-JADE method are discussed and a simulation-based comparison with the MD method is provided to illustrate and assess its performance.

• Robust-EASI algorithm: We have proposed another approach using normalized statistics for heavy-tailed mixtures to improve the robustness of the EASI family of algorithms. A normalized criterion was derived and used for heavy-tailed source separation. The latter is solved using an efficient quasi-Newton iterative algorithm. Comparative simulations have been provided to illustrate the effectiveness of the Robust-EASI algorithm.

More studies need to be done on the choice of the optimal non-linear function g for heavy-tailed sources. Note that the same methodology used here can be applied to derive normalized versions of other existing non-linear decorrelation criteria that can correctly separate heavy-tailed independent components.
Chapter 9: A Semi-Parametric ML Approach

In this chapter, we propose a method for estimating, in a semi-parametric way, the density of the missing data in the blind source separation problem. We consider a log-spline model of fixed size and the maximum likelihood estimator of this density in the linear BSS problem. We thus obtain a log-spline density estimator, which can be approached using a stochastic version of the expectation-maximization (EM) algorithm coupled with an MCMC method.

9.1 The Likelihood of the BSS Model

A very popular approach for estimating independent sources is the maximum likelihood (ML) method. A short introduction was provided in Chapter 5. In this section, we show how to apply ML estimation to BSS.

9.1.1 Derivation of the likelihood

It is not difficult to derive the likelihood of the observation vector x in the noise-free BSS model. This likelihood is based on the well-known result on the density of a linear transform [Papoulis(1991)]. According to this formula, the density function p_x of the mixture vector x = As can be formulated as

    p_x(x) = |det(B)| p_s(s) = |det(B)| \prod_i p_i(s_i)    (9.1)

where B = A^{-1} and the p_i denote the densities of the independent components. This can be expressed as a function of B = (b_1, ..., b_n)^{T} and x, as follows:

    p_x(x) = |det(B)| \prod_i p_i(b_i^{T} x)    (9.2)

Assume that we have T observations of x, denoted x(1), ..., x(T). Then the likelihood can be obtained as the product of this density evaluated at the T points:

    L(B) = \prod_{t=1}^{T} |det(B)| \prod_{i=1}^{n} p_i(b_i^{T} x(t))    (9.3)

Very often it is more practical to use the logarithm of the likelihood, since it is algebraically simpler. This makes no difference here, since the maximum of the logarithm is attained at the same point as the maximum of the likelihood.
The log-likelihood is given by

    \log L(B) = \sum_{t=1}^{T} \sum_{i=1}^{n} \log p_i(b_i^{T} x(t)) + T \log |det(B)|    (9.4)

9.1.2 Source density estimation

In this work, we propose a new procedure based on the maximum likelihood approach to estimate the mixing matrix. However, there is another quantity to estimate in the BSS model: the density of the independent components. This makes the problem much more complicated, because the estimation of densities is, in general, a non-parametric problem. Non-parametric means that it cannot be reduced to the estimation of a finite parameter set; in fact, the number of parameters to be estimated is infinite, or very large. Thus the estimation of the BSS model also has a non-parametric part, which explains why the proposed method is called semi-parametric.

Non-parametric pdf estimation is known to be a difficult problem. This is why we would like to avoid non-parametric density estimation in BSS. There are two ways to avoid it:

• Parametric: First, in some cases we might know the densities of the independent components in advance, using some prior knowledge on the data at hand. Then the likelihood would really be a function of the mixing matrix only. If reasonably small errors in the specification of these prior densities have little influence on the estimator, this procedure will give reasonable results. In fact, it will be shown below, by computer simulation, that this is the case in impulsive environments using alpha-stable distributions.

• Semi-parametric: A second way to solve the problem of density estimation is to approximate the densities of the independent components by a family of densities that are specified by a limited number of parameters. If it is possible to use a very simple family of densities to estimate the BSS model, we will get a simple solution. Fortunately, this turns out to be the case using log-spline functions.
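In the parametric case, the log-likelihood (9.4) can be evaluated directly once the source log-densities are fixed; a minimal sketch (the names are ours):

```python
import numpy as np

def bss_log_likelihood(B, X, log_pdfs):
    """Log-likelihood (9.4) of the noise-free BSS model:
    sum_t sum_i log p_i(b_i^T x(t)) + T log|det B|.
    X is n x T; log_pdfs holds one vectorized log-density per source."""
    n, T = X.shape
    Z = B @ X                                   # z_i(t) = b_i^T x(t)
    ll = sum(log_pdfs[i](Z[i]).sum() for i in range(n))
    return ll + T * np.log(abs(np.linalg.det(B)))
```

Maximizing this quantity over B (for known p_i) is exactly the parametric ML approach described above.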
9.1.3 Optimization via the EM algorithm

If the likelihood of the observations cannot be maximized directly, it is possible to perform iterative maximization steps in order to approach the maximum. For example, the expectation-maximization (EM) algorithm, proposed in [Dempster(1977)], is a broadly applicable approach for the iterative computation of maximum likelihood estimates, useful in a variety of incomplete-data (or partially observed data) statistical problems. We briefly recall the principle of the EM algorithm, which is a two-step iterative procedure; one iteration is composed of an E-step and an M-step. The E-step computes

    Q(\theta | \theta_k) = IE\{\log f(x, s; \theta) | x; \theta_k\}

and the M-step determines \theta_{k+1} as the maximizer of Q(\theta | \theta_k).

Stochastic versions of EM have been introduced from different perspectives to deal with situations where the E-step is infeasible in closed form. This is often the case, because the expectation has no analytical form. In this case, Markov chain Monte Carlo (MCMC) methods replace the E-step by a Monte Carlo approximation of the expectation based on a large number of independent simulations of the missing data [Meng et Rubin(1993)]. Another way to get around the difficulty of computing the expectation is also to use simulation, not for approximating some integral deriving from an expectation, but for obtaining numerically plausible values standing for the missing data. One proposal was to simulate the missing data from the a posteriori distribution with the current values of the parameters [Diebolt et Celeux(1993)]. To be more efficient, the SAEM algorithm, proposed in [Delyon et al.(1999)], replaces the E-step not only by the simulation of the missing data, but by a stochastic approximation involving these simulated data.
At iteration k, SAEM generates m(k) realizations s_k(j) (1 \le j \le m(k)) from the a posteriori distribution, denoted p(s | x; \theta_k), and updates Q_{k-1}(\theta) according to

    Q_k(\theta) = Q_{k-1}(\theta) + \gamma_k \left[ \frac{1}{m(k)} \sum_{j=1}^{m(k)} \log f(x, s_k(j); \theta) - Q_{k-1}(\theta) \right]    (9.5)

where \gamma_k is a sequence of positive step sizes decreasing to 0. The use of simulated data for estimating parameters in missing-data statistical problems is a powerful approach that has become popular since the 1990s. In this work we use this procedure for the blind source separation problem.

9.2 Semi-Parametric Source Separation

9.2.1 Noisy linear instantaneous mixtures

In this chapter, we consider the classical noisy linear BSS model with instantaneous mixtures given by:

    x(t) = As(t) + \epsilon(t), t = 1, ..., T    (9.6)

where A is an n \times m unknown full column rank mixing matrix. The sources s_1(t), ..., s_m(t) are collected in an m \times 1 vector denoted s(t) and are assumed to be i.i.d. signals: the joint density \pi factorizes as \pi = \prod_{j=1}^{m} \pi_{s_j}. The noise vector \epsilon(t) (independent of s(t)) has independent components \epsilon_1(t), ..., \epsilon_n(t) with zero mean and unknown variance \sigma^{2}. The goal of a BSS method is to find a separating matrix, i.e. an m \times n matrix B such that the recovered sources Bx(t) are as independent as possible. In the noiseless case, (9.6) admits a unique solution y(t) = Bx(t), up to scaling and permutation indeterminacies, such that C \overset{def}{=} BA = P\Lambda, where \Lambda is a diagonal scaling matrix and P is a permutation matrix (see [Hyvarinen et al.(2001)]). At most one source is allowed to be Gaussian to ensure identifiability. Another problem is that if one or more sources do not have finite second or higher moments (e.g.
heavy-tailed distributions), then prewhitening or criteria optimization would cause a breakdown [Chen et Bickel(2004), Sahmoudi et al.(2004a), Sahmoudi et al.(2005)].

9.2.2 The proposed approach

Our purpose is to estimate by maximum likelihood the density \pi, the mixing matrix A and the noise variance \sigma^{2}. In this section, we present a semi-parametric BSS method using maximum likelihood estimation in a log-spline model, in order to avoid any assumption on the source distribution. Nevertheless, we suppose that all sources are independent and have the same common distribution.

Any BSS problem can be seen as a usual missing-data problem. Indeed, the observed data are the observations {x(t)}_{1 \le t \le T}, whereas the random sources {s(t)}_{1 \le t \le T} are the unobserved data. The complete data of the model are then {x(t), s(t)}_{1 \le t \le T}. We suppose that the unobserved sources are related to the observations through the density function h of x conditionally on s¹. Our purpose is to estimate the source density \pi = \prod_{j=1}^{m} \pi_{s_j}, the mixing matrix A and the noise variance \sigma^{2}. For that we propose a semi-parametric approach which consists of combining the log-spline model for source density approximation with a stochastic version of the EM algorithm.
We use log-spline models for two reasons: on one hand, they have good functional approximation properties; on the other hand, they are well adapted to the implementation of the SAEM (stochastic approximation version of the expectation-maximization) algorithm [Kuhn et Lavielle(2004)], allowing our estimator to be computed easily. Indeed, the first assumption of the SAEM algorithm used here is equivalent to supposing that the complete-data likelihood f(x, s, \eta) belongs to the curved exponential family and can be written:

    f(x, s, \eta) = \exp\left\{-\Psi(\eta) + \langle \tilde{S}(x, s), \Phi(\eta) \rangle\right\}    (9.7)

¹ The distribution of x conditionally on s, denoted by h, corresponds in fact to the distribution of the additive noise in the BSS model (9.6), with the same variance and a non-zero mean value equal to As.

where \langle ., . \rangle denotes the scalar product, \eta denotes the unknown global parameter vector to be estimated and \tilde{S}(x, s) is known as the minimal sufficient statistic (MSS) of the complete model. In this case of unknown density functions following model (9.7), a good approximation which satisfies this latter condition is given by the log-spline model. Moreover, it was shown that this estimation technique is inherently robust against outliers and impulsiveness effects [Takada(2001)]. For this reason, we apply this method to impulsive random variables with possibly heavy-tailed distributions characterized by infinite second and higher order moments. We now define precisely the log-spline model which will be used.

9.2.3 Density estimation by B-spline approximations

In order to get a non-parametric estimate of the source density function \pi, we propose to use the log-spline model. Let I be equal to [a, b], where -\infty < a < b < +\infty, and consider a given knot sequence \tau = (t_l)_{1 \le l \le K+1} with a = t_1 and b = t_{K+1}.
Consider now the space S^{q,\tau} of spline functions of positive order q on I, namely piecewise polynomial functions of degree q - 1 associated to this knot sequence. The dimension of S^{q,\tau} is then equal to J = q + K - 1 and there exists a B-spline basis, denoted B_1, ..., B_J, for S^{q,\tau} [de Boor(1978)]. The log-spline density estimation method models a log-density function as a spline function:

    \forall s \in I, \pi_\theta(s) = \exp\left( \sum_{j=1}^{J} \theta_j B_j(s) - c(\theta) \right)    (9.8)

where

    c(\theta) = \log \int_I \exp\left( \sum_{j=1}^{J} \theta_j B_j(s) \right) ds

is a normalization factor and \theta = (\theta_1, ..., \theta_J) \in R^{J}. We choose the dimension J of the log-spline model as a function of the sample size T, such that J = o(\sqrt{T}) (see [Kuhn et Lavielle(2004)] for more details). We now define the observed log-likelihood corresponding to the log-spline model of the observations, as follows:

    L_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} \log \int_I h(x(t)|s) \pi_\theta(s) ds    (9.9)

Then we consider the maximum likelihood estimator \pi_{\hat{\theta}_{T,J}} of the density \pi in the log-spline model, given by:

    \hat{\theta}_{T,J} = \arg\max_{\theta \in \Theta_J} L_T(\theta)    (9.10)

This family is not identifiable since, for every real a, we have c(\theta + a) = c(\theta) + a, implying that \pi_{\theta + a} = \pi_\theta. We systematically set \theta_J = 0 in order to get an identifiable family of log-density functions, and we denote by \Theta_J the subspace of R^{J} composed of vectors having zero as last coordinate, and by M^{q,\tau} the set of associated densities, i.e. \{\pi_\theta, \theta \in \Theta_J\}.

We briefly describe some properties of B-splines detailed in de Boor's book [de Boor(1978)]:

– B-spline: For all 1 \le j \le J, the function B_j takes values in the interval [0, 1]. Moreover, we have \sum_{j=1}^{J} B_j(s) = 1 for all s \in I.

– Approximation property of the log-spline model: We define \delta_J = \inf_{\theta \in \Theta_J} \|\log f - \log \pi_\theta\|_\infty. For any positive continuous density function f on I, \delta_J tends to zero when J goes to infinity.
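A toy version of model (9.8), using order q = 2 (piecewise-linear "hat" B-splines, which already satisfy the two properties above) and a trapezoidal quadrature for c(\theta); the basis choice and names are our illustration, not the thesis implementation:

```python
import numpy as np

def hat_basis(knots, s):
    """Order-2 B-splines ('hat' functions) on an equally spaced knot grid:
    each B_j takes values in [0, 1] and sum_j B_j(s) = 1 on
    [knots[0], knots[-1]] (partition of unity)."""
    h = knots[1] - knots[0]
    return np.maximum(0.0, 1.0 - np.abs(s[:, None] - knots[None, :]) / h)

def logspline_pdf(theta, knots, grid):
    """Log-spline density (9.8): exp(sum_j theta_j B_j(s) - c(theta)),
    with the normalization exp(c(theta)) computed by trapezoidal
    integration of exp(sum_j theta_j B_j) over I."""
    u = np.exp(hat_basis(knots, grid) @ theta)
    integral = np.sum((u[:-1] + u[1:]) / 2.0 * np.diff(grid))
    return u / integral
```

With \theta = 0 the model reduces to the uniform density on I, which is also the initialization used in the experiments of Section 9.3.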
See [de Boor(1978)] for more details on the links between the convergence rate and the regularity of f. The particular properties of the log-spline model suggest that \pi_{\hat{\theta}_{T,J}} will have remarkable properties when T tends to infinity. First, we explain how we compute this estimator in practice, simultaneously with the mixing matrix and the noise variance.

9.2.4 The SAEM algorithm

To compute the unknown parameters \eta = (\theta^{T}, vec(A)^{T}, \sigma^{2})^{T}, we use the SAEM algorithm coupled with an MCMC (Markov chain Monte Carlo) procedure presented in [Kuhn et Lavielle(2004)]. Here we apply this algorithm to estimate the mixing matrix A and the variance \sigma^{2}, using the log-spline model to approach the estimate \pi_{\hat{\theta}_{T,J}}. The complete log-likelihood corresponding to the log-spline model has the following expression:

    L_T^{com}(\eta) = \frac{1}{T} \sum_{t=1}^{T} \log h(x(t)|s(t)) + \frac{1}{T} \sum_{t=1}^{T} \log \pi_\theta(s(t))    (9.11)

We then apply the SAEM algorithm to this parametric model in order to approach the estimator \hat{\eta}_{T,J} of \eta that maximizes the observed log-likelihood. To bring out the minimal sufficient statistics of the model, we write the developed expression of the complete log-likelihood:

    L_T^{com}(\eta) = \frac{1}{T} \sum_{t=1}^{T} \log h(x(t)|s(t)) + \frac{1}{T} \sum_{t=1}^{T} \left[ \sum_{j=1}^{J} \theta_j B_j(s(t)) - c(\theta) \right]

We choose as MSS \tilde{S}(x, s) = \left( \frac{1}{T} \sum_{t=1}^{T} B_j(s(t)), 1 \le j \le J \right) and we implement the k-th iteration of the SAEM algorithm as:

• S-step: Generate a realization s' using the prior distribution \pi_{\theta_k} as proposal distribution, and take s_k equal to s' or to s_{k-1} according to the value of the acceptance probability.

• A-step: Update the minimal sufficient statistics \tilde{S}_k according to the stochastic approximation:

    \tilde{S}_k = \tilde{S}_{k-1} + \beta_{k-1} \left( \tilde{S}(x, s_k) - \tilde{S}_{k-1} \right)    (9.12)

where \beta_k is a sequence of positive step sizes decreasing to 0.

• M-step: Update \eta_k by maximizing the complete log-likelihood of the model evaluated at the observations and at the current value of the minimal sufficient statistics.
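The A-step recursion (9.12) is a plain stochastic-approximation average; with \beta_k = 1/k it reduces to the running mean of the simulated statistics. A minimal sketch (the names are ours):

```python
import numpy as np

def a_step(S_prev, S_sim, beta):
    """A-step of SAEM (Eq. 9.12): S_k = S_{k-1} + beta_k (S(x, s_k) - S_{k-1})."""
    return S_prev + beta * (S_sim - S_prev)

def run_a_steps(simulated_stats):
    """Chain the A-step over a sequence of simulated MSS values,
    with the step sizes beta_k = 1/k (so the result is their mean)."""
    S = np.zeros_like(simulated_stats[0], dtype=float)
    for k, S_sim in enumerate(simulated_stats, start=1):
        S = a_step(S, S_sim, 1.0 / k)
    return S
```

Faster-decreasing step sizes average over a longer memory and damp the Monte-Carlo noise of the S-step, which is what gives SAEM its stability compared to plain stochastic EM.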
This algorithm converges a.s. toward a local maximum of the log-likelihood of the observations under very general regularity conditions (see [Kuhn et Lavielle(2004)] for convergence results). In practice, the algorithm is easy to implement and has a relatively low computational cost.

9.3 Performance evaluation & comparison

9.3.1 Some existing BSS methods

We briefly describe here three BSS approaches for comparison with the new semi-parametric approach introduced above.

1) FastICA algorithm [Hyvarinen et al.(2001)]. Under the whitened zero-mean demixing model y = Wz, the FastICA algorithm finds the extrema of a generic cost function IE\{G(w^{T} z)\}, where w^{T} is one of the rows of the demixing matrix W. The cost function can be, e.g., a normalized cumulant or an approximation of the marginal entropy, which is minimized in order to find maximally non-Gaussian projections w^{T} z. This algorithm faces three problems. First, some sources may not have zero means, in which case the mean values must be explicitly included in the analysis. Second, in FastICA, the derivative of the even function G is assumed to be an odd function; if this condition fails to be satisfied, FastICA as such may not work. Third, FastICA is not robust to heavy-tailed effects.

2) JADE algorithm [Cardoso et Souloumiac(1993)]. This algorithm operates on cumulants as a measure of independence. It seeks to approach independence through the maximization of higher order cumulants. However, one major weakness of this algorithm is that higher order cumulants are extremely vulnerable to outlier effects. Besides being sensitive to outliers, JADE also fails to separate certain source distributions, e.g. skewed zero-kurtotic signals generated by the power distribution.
This is because, by using only the 4th-order cumulants, third-order effects like the skewness are ignored.

3) Minimum Dispersion (MD) algorithm [Sahmoudi et al.(2005)]. This approach is a two-step parametric algorithm for heavy-tailed source separation.

Step 1: Robust whitening. In the case of \alpha-stable signals, it is proven in [Sahmoudi et al.(2004a)] that the normalized covariance matrix of x, defined by \hat{R}_n^x = \frac{\hat{R}_x}{Trace(\hat{R}_x)} with \hat{R}_x = \frac{1}{T} \sum_t x(t)x(t)^{T}, converges asymptotically (i.e. when T tends to infinity) to the finite matrix ADA^{T}, where D is a positive diagonal matrix. Hence, the normalized covariance matrix has the appropriate structure and the whitening problem becomes standard.

Step 2: MD criterion. Let z(t) = Bx(t), where B is an orthogonal separating matrix to be estimated and x denotes the whitened data. It is shown in [Sahmoudi et al.(2004a)] that, under the orthogonality constraint, the MD criterion given by J(B) = \sum_{i=1}^{m} \gamma_{z_i}, where \gamma_{z_i} denotes the dispersion of z_i(t), the i-th entry of z(t), is a contrast function.

The essential limitation of this method is that it can be used only for heavy-tailed sources with \alpha-stable distributions.

9.3.2 Parametric versus semi-parametric approaches

The MD method is said to be parametric in the sense that it relies on a priori knowledge of the exact source pdf. In this case, we have a finite set of parameters to estimate. On the other hand, the SAEM method is said to be semi-parametric in the sense that the source pdf is unknown and needs to be estimated jointly with the desired parameters (i.e. the mixing matrix) [P. Bickel(1998)]. Clearly, estimating a pdf is a difficult problem, as the number of parameters to be estimated is infinite.
In the semi-parametric approach, we estimate a limited number of parameters by replacing the estimation problem by an approximation one. The parametric approach is preferred whenever reliable a priori knowledge of the source pdf is available. In situations where the pdf is only partially or inaccurately known, semi-parametric methods should be used because of their robustness against modeling errors, as shown by the simulation results below.

9.3.3 Computer simulation experiments

Here, we compare our proposed semi-parametric method SAEM to JADE, FastICA and the parametric MD algorithm. In all simulation experiments the results are averaged over 100 runs, and the mixing matrix A is generated randomly at each run. The stepsize sequence (β_k) used for SAEM was β_k = 1/k. For the choice of the size J of the logspline model in SAEM, we tested some values of J lower than 10, since we have at least 100 observations. The best estimation seems to be given by q = 4 and J = 5, so we keep these values for the following experiments. We choose the initial value θ_0 such that the logspline density estimate is initialized with the uniform distribution on I = [−50, 50]. To measure the quality of separation, we use Amari’s error criterion as a performance index (PI), defined as

PI = Σ_{i=1}^m ( Σ_{j=1}^m |C_{i,j}| / max_k |C_{i,k}| − 1 ) + Σ_{j=1}^m ( Σ_{i=1}^m |C_{i,j}| / max_k |C_{k,j}| − 1 )

where C = (C_{i,j})_{1≤i,j≤m} = BA is the global system.

• Experiment 1 : Robustness against outliers. First, we test the robustness against outliers. We mix two sources, one with a Gaussian distribution and the second with a uniform distribution, using randomly chosen mixing matrices. The data set contains 1000 points. Without outliers, the performances of SAEM, JADE and FastICA are all excellent (PI ≈ 0.05). To test for outlier robustness, we replace 50 data points with outliers, i.e.
uniformly distributed data points within a disc of radius 500 around the origin (the norm of the original data points is roughly within the range from 0 to 100). As expected, SAEM still works fine. In fact, typically it does not even change its solution, because it simply ignores the outliers in the B-spline adjustment stage. JADE and FastICA, however, produce arbitrary results because they employ higher-order statistics, which are highly sensitive to outliers.

• Experiment 2 : Asymptotic consistency. Figure 9.1 shows some simulation results in the case of three noiseless mixtures (n = 3 observations) of three sources (m = 3) with, respectively, a uniform distribution on [0, 1], a Gaussian distribution with zero mean and unit variance, and a standard SαS with α = 1.5. To detect whether BSS algorithms can obtain consistent estimates in such a situation, the sample size was increased from (1) : T = 1000 to (2) : T = 5000. We compare SAEM with two other well-known BSS algorithms, JADE and FastICA. Similarly to [Chen et Bickel(2004)], we present boxplots based on quartiles to assess the consistency of our method.

Fig. 9.1: Consistency of different BSS algorithms. The sample sizes were 1000 for case (1) and 5000 for case (2).

From the boxplots (Figure 9.1), we can see that as the sample size increases, the estimation error (PI) for SAEM decreases more significantly toward zero than for JADE and FastICA.

• Experiment 3 : Robustness against impulsive noise. In this experiment we add impulsive noise to the above mixtures (considered in Experiment 2) according to x(t) = As(t) + σε(t), with ε(t) being an n-dimensional Gaussian noise of unit variance.
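The Amari error criterion used as the performance index in these experiments can be computed directly from the global matrix C = BA; here is a minimal numpy sketch (function and variable names are ours, not from the thesis):

```python
import numpy as np

def amari_pi(B, A):
    """Amari performance index of the global system C = B A.
    Zero iff C is a scaled permutation, i.e. perfect separation."""
    C = np.abs(B @ A)
    row_term = (C / C.max(axis=1, keepdims=True)).sum(axis=1) - 1  # rows vs row maxima
    col_term = (C / C.max(axis=0, keepdims=True)).sum(axis=0) - 1  # columns vs column maxima
    return row_term.sum() + col_term.sum()

# a scaled permutation gives PI = 0; a fully mixed system does not
P = np.array([[0.0, 2.0], [-3.0, 0.0]])   # scaled permutation matrix
pi_perfect = amari_pi(P, np.eye(2))       # -> 0.0
pi_mixed = amari_pi(np.ones((2, 2)), np.eye(2))
```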
We track the evolution of the performance index as a function of the noise level σ for kurtotic (super-Gaussian) noise : we used multidimensional Gaussian noise, where we raised the absolute value to the power 5.

Fig. 9.2: The performance index versus noise level.

Figure 9.2 shows that JADE and FastICA start to fail at a certain noise level, whereas SAEM continues to produce good BSS solutions. Note that we have chosen the median over 100 runs because the PI depends strongly on the actual realization of the noise.

• Experiment 4 : Robustness against modeling errors. Here, we consider m = 3 impulsive sources with a generalized Gaussian distribution of parameter p = 1.5 (i.e. the source pdf is proportional to exp(−|x|^p)). In this case, the signals have finite variances and n = 4 noise-free mixtures are considered.

Fig. 9.3: The performance index versus sample size.

As can be observed from Figure 9.3, the MD method fails to separate the sources correctly, as it relies on the SαS source pdf assumption, which is not verified in this example. This illustrates the robustness of SAEM compared to the MD method with respect to pdf modeling errors.

9.4 Concluding Remarks

In this work, we developed a new semi-parametric BSS method using the SAEM algorithm. The proposed method is applied to the blind separation of noisy linear instantaneous mixtures of possibly heavy-tailed sources.
The SAEM-based method is compared with the JADE, FastICA and minimum dispersion (MD) methods and shown to be more general (as it can be applied to a larger class of source signals and in different scenarios). The proposed SAEM algorithm outperforms JADE and FastICA in terms of consistency and robustness against outliers and impulsive noise, and outperforms the MD method in terms of robustness against modeling errors.

Part Three
Separation and Estimation of Multicomponent FM Signals in an Impulsive Environment

In the first chapter of this part (Chapter 10), we recall the main principles and existing methods for the estimation of non-stationary FM signals. We then present our novel approaches in the presence of additive impulsive noise modeled by an α-stable distribution.

Chapter 10
State of the Art

The last two decades in particular have witnessed a surge of interest in the analysis of time-varying or non-stationary processes. The beginning of the 80’s saw efforts in various parts of the world at developing spectral analysis techniques which would overcome the drawbacks of classical spectral analysis [Grenier(1984)], [Boashash(1991)]. These drawbacks arise largely from the fact that the Fourier transform signal characterization (upon which classical analysis is essentially based) assumes the spectral characteristics of both signal and noise to be time-invariant. When the important spectral features of the signal and/or noise are time-varying, the effect of Fourier analysis is to produce an averaged (smeared) spectral representation.
One consequence of this smearing is a loss of frequency resolution. One can try to reduce this smearing by obtaining the spectral estimates over short time intervals, so that the spectral components do not vary too greatly within the window. However, the shortened observation windows produce smearing of a different kind, this time due to the uncertainty relationships of time- and band-limited signals. Research early in the 1980s focused on two directions : modern parametric spectral analysis and time-frequency analysis. In this chapter, we present a brief state of the art of the nonstationary FM signal analysis problem, consisting of spotlights on a few important existing methods.

10.1 Modern Spectral Analysis Approaches

Parametric modeling of nonstationary signals received a great deal of attention in the eighties [Grenier(1984)]. The approaches usually developed represent these signals by AR or ARMA models with time-varying coefficients. The coefficients are then approximated on a basis of known time-varying functions, giving rise to a set of invariant parameters which are the coordinates of the coefficients. This approach offers the advantage of leading to the same type of identification procedures as for AR or ARMA models with constant parameters. Another potential advantage of this kind of modeling is an improved accuracy of parameter estimation methods applied to time-varying signals, in comparison to other estimation methods based upon the assumption that the signal is stationary over a time interval. Several algorithms derived for stationary signals plus observation noise have been extended to the nonstationary case [Grenier(1984)]. However, it appeared that in some cases the performance of the estimators was reduced in the nonstationary case. For more details, a good review of parametric or modern spectral analysis methods can be found in [Grenier(1984)] and [Boashash(1992a)].
10.2 Time-Frequency Analysis Approaches

Long research on the Wigner-Ville Distribution (WVD) established it as a means to attain good frequency localisation for rapidly time-varying signals [Boashash(1992c)]. This interest was fuelled by the discovery that it had a number of very attractive properties [Classen et Mecklenbrauker(1980)], as well as the evidence that the technique could be put to good practical use. The advance of digital computers also aided its popularity, as the hitherto prohibitive task of computing a two-dimensional distribution came within practical reach. As research in the area continued, the importance of the WVD for random signal analysis became apparent. In [Martin(1982)], the author showed that the WVD’s expected value is simply the Fourier transform of the time-varying autocorrelation function. This gave the WVD an important interpretation as a time-varying Power Spectral Density (PSD), and sparked significant research efforts along this direction. The value of the WVD as a time-varying filtering tool was also realised early. In [Boudreaux-Bartels et Marks(1986)], a simple algorithm is derived which consists of masking (filtering) the input signal and then performing a least-squares inversion of the WVD to recover the filtered signal. Many refinements, extensions and simplifications were developed to further this pioneering work on WVD-based time-varying filtering. Detection and estimation were other research areas which saw theoretical developments based on the WVD [Kay et Boudreaux-Bartels(1985)], [Boashash et Rodriguez(1984)]. One of the crucial factors motivating such interest was the fact that, since the WVD is a unitary (energy-preserving) transform, many of the classical detection and estimation problem solutions had alternate implementations based on the WVD. The time-frequency nature of the implementation, however, allowed greater flexibility than did the classical ones.
Despite all the advances made in the theory and application of the WVD to so many areas of signal processing, it was generally accepted that the WVD had a number of limitations. One of the main limitations was considered to be the nonlinear nature of the WVD. The WVD performs a bi-linear transformation of the frequency components of a signal, a fact which is significant for both deterministic and random signals. For deterministic multicomponent signals, the bi-linearity causes ”cross-terms” or ”artefacts” to occur between the true frequency components. This can often render the WVD almost impossible to interpret visually. For random signals, the bi-linear transformation exaggerates the effects of noise by creating cross-terms between all noise and signal components. At low signal-to-noise ratio (SNR), where the noise term of the bi-linear kernel dominates, this effect can contribute to a very rapid degradation of performance. A second drawback attributed to the WVD is its inherent bias towards infinite-duration signals. Since it is essentially the Fourier transform of a bilinear kernel, it is ”tuned” to the presence of infinite-duration complex sinusoids in the kernel, and hence to linear FM components in the signal itself. Practical signals are often highly localised in time, so that a simple Fourier transformation of the kernel does not provide a very effective analysis of the data. Much came of the efforts to overcome these drawbacks. Cohen had already paved the way for reducing the non-linear effects of the WVD by his work in quantum mechanics, in which he proposed a generalized class of ”smoothed” Wigner distributions [Cohen(1966)].
He showed that an infinite number of joint distributions with useful properties could be produced by applying a 2D smoothing function to the Wigner distribution, the particular distribution depending on the smoothing function used. Researchers then turned to 2D smoothing functions to reduce the artefacts, the most popular smoothing function initially being the 2D Gaussian function. Further impetus to the attempted reduction of artefacts came with the understanding that, in the ambiguity function domain, the cross-terms tend to be distant from the origin, while the auto-terms pass through the origin [Flandrin(1998)]. This was especially helpful since the WVD was known to be related to the ambiguity function by 2D Fourier transformation [Boashash(1991)]. 2D Fourier inversion of isolated regions of the ambiguity function was then used to effect the cross-term reduction. Subsequently, greater refinement and purpose entered the design procedure for these TFDs. Choi and Williams used the ambiguity domain to design their variable-level smoothing function, so that artefacts could be reduced to a greater or lesser extent, depending on the application [Choi et Williams(1989)]. Zhao, Atlas and Marks designed kernels in which the artefacts fold back onto the auto-terms [Zhao et al.(1990)]. The latter effect was desirable, so as to be able to obtain visually satisfying representations. In [Kootsookos et al.(1992)], the authors showed how one could vary the shape of the cross-terms by appropriate kernel design. Parallel to the developments in smoothing of the WVD, another approach was used to nullify the troublesome non-linear effects of the WVD. This approach was based on the fact that the cross WVD (XWVD), although closely related to the WVD and having many of its desirable properties, is a linear distribution in the observed signal. Efforts were made, then, to use the XWVD instead of the WVD wherever possible.
The problems relating to the WVD’s poor performance with short-duration signals were addressed in a number of different ways. Perhaps the first method proposed was to modify the WVD by performing the spectral estimation of the kernel function with a Mellin transform [Marinovic(1984)]. Another method put forward for better dealing with short-duration signals was to use autoregressive spectral estimators of the kernel, which could reliably be applied to short data sequences [Boashash(1991)]. The emphasis on time-varying spectral analysis which occurred during the 1980s also led very naturally to a heightened awareness of instantaneous frequency. For analysts who were used to dealing with time-invariant systems, the simultaneous use of the words instantaneous and frequency contained an element of contradiction: frequency is usually assigned to the eigenvalues of the system’s eigenfunctions, and is only defined for persistent processes. It became clear that a better understanding of what was meant by ”instantaneous frequency”, and of how to estimate this important quantity, was needed.

10.2.1 IF estimation using time-frequency methods

Not surprisingly, then, much work focused on the concepts underlying the IF and its relationship to TFDs [Boashash(1991)]. A summary of the developments may be found in [Boashash(1992a), Boashash(1992b)]. Further work concentrated on techniques for estimating the IF, with a number of useful new algorithms being developed. Various techniques had been devised over the years for the estimation of IF, but many of them were developed in the communications area and, as such, were suited more to communications signals than to those encountered generally in signal processing environments.
Several IF estimation techniques have been developed recently to allow for a broader signal model, or for greater robustness to noise. This is because the instantaneous frequency is one of the most important features of any signal. There are two major approaches for IF estimation of FM signals : parametric and non-parametric. The non-parametric approach is based on time-frequency distributions. In summary, there are two major existing approaches for IF estimation using TFDs. The first is built on the first-order moment of the TFD [Boashash(1991)]. The first-order moment of the WVD yields the IF [White et Boashash(1988), Boashash(1991)], while other distributions yield approximations of the IF [Boashash(1992c)]. However, it fails for multicomponent signals due to the presence of cross-terms. The second approach exploits the fact that all TFDs have peaks around the IF laws of signals. The peaks of the WVD were used for IF estimation and applied to many problems [Boashash(1992c)]. For better performance at lower SNR, the XWVD was proposed [Boashash et O’Shea(1993)]. Other TFD-based peak estimation algorithms can be found in, for example, [Boashash(1992c)], [Stankovic et Katkovnik(1998)], [Katkovnik et Stankovic(1998)], [Luigi et Moreau(2002a)], [Luigi et Moreau(2002b)]. Like the first approach, this approach also suffers from the presence of cross-terms in multicomponent signals, which results in poor estimation. Motivated by the desire to design high-resolution RIDs, the BD was then proposed in [Barkat(2000)] and the MBD was developed in [Hussain(2002)], both with adaptive algorithms for IF estimation of multicomponent signals.

10.2.2 Analysis of noisy multicomponent signals

There is a wide range of applications where we encounter signals comprised of I components with different IF laws fi(t) and different envelopes ai(t), in additive noise. It is often desired, from such an observed signal, to determine the IF law of each component. This can be achieved by representing the observed signal z(t) in the time-frequency (t-f) domain and using time-frequency filtering methods to recover the individual components [Cohen(1995)], [Boashash(1991)]. Another approach involves extending parametric and non-parametric algorithms for IF estimation of monocomponent FM signals to the case of multicomponent signals, and designing an algorithm that simultaneously tracks the various IF components of the observed signal [Peleg et Friedlander(1996)], [Hussain et Boashash(2002)]. Both approaches require the use of time-frequency distributions (TFDs) with very specific properties, such as high time-frequency localization of the instantaneous frequency components and high reduction of cross-term interferences. In practice, the signal under consideration may be subjected to additive noise. In general, and for various reasons, the additive noise is assumed to be Gaussian. The analysis of non-stationary signals affected by additive Gaussian noise has been addressed in several places [Friedlander et Francos(1995)], [Barkat(2000)], [Barbarossa et Scaglione(2000)], [Barkat et Abed-Meraim(2004)], [Hussain(2002)]. However, in some situations the assumption of Gaussianity of the noise is not valid and, therefore, alternative techniques are needed.

10.3 Robust time-frequency analysis

In the presence of impulsive heavy-tailed noise, which is well modeled by the family of alpha-stable distributions, time-frequency representations are severely corrupted by impulse-related artifacts, which tend to obscure the essential details of the desired signal.
Recently, two novel techniques were proposed for the analysis of a monocomponent FM signal contaminated by additive noise having an unknown heavy-tailed distribution :

– First, robust time-frequency distributions were developed as a generalization of the robust minimax M-estimates. In [Katkovnik(1998)], a robust periodogram was proposed for the analysis of a single tone affected by additive heavy-tailed noise. In [Katkovnik et al.(2002)], the authors used the so-called robust spectrogram and robust Wigner-Ville distribution (WVD), respectively, to address the problem of non-stationary signals embedded in heavy-tailed noise. In [Barkat et Stankovic(2004)], the authors extend the work proposed in [Katkovnik et al.(2002)] to design a robust polynomial WVD (PWVD). However, it is known that the spectrogram suffers from low resolution in the time-frequency domain, that the WVD suffers from the presence of artifacts for non-linearly frequency-modulated signals, and that the PWVD suffers from the presence of cross-terms for multicomponent signals.

– Second, in [Griffith(1997)] the author used the fractional lower-order covariance, a correlation measure that is well-behaved in alpha-stable noise, to develop a set of robust time-frequency representations that offer significant improvements in performance over conventional quadratic time-frequency representations. However, the use of fractional lower-order statistics in a time-frequency distribution entails a much higher computational complexity. In addition, no consistent estimator of the covariation or the lower-order covariance is available in the literature for practical use.

In this third part of the thesis, we will propose two classes of robust time-frequency procedures to analyze multicomponent non-stationary signals in heavy-tailed noise under the α-stable model.
The first one is based on a generalization of the work presented in [Barkat et Stankovic(2004)] to design robust time-frequency distributions. The second one uses a preprocessing stage as a first step to mitigate the effect of the impulsive noise before the time-frequency IF estimation step.

10.4 Concluding Remarks

In this chapter, we described various approaches to IF estimation of nonstationary signals. Because of the discussed limitations of the existing methods, there was great interest in developing alternatives, especially for multicomponent signals in impulsive noise environments.

Chapter 11
Robust Parametric Approaches

In this chapter we address the problem of instantaneous frequency (IF) estimation of multicomponent nonstationary FM signals in an impulsive α-stable noise environment. Three parametric techniques are introduced using a two-step procedure. The first step consists of transforming the polynomial phase estimation problem into a frequency estimation one using a phase-polynomial transform (PPT). In the second step, we perform the frequency estimation by three robust versions of the MUSIC (MUltiple SIgnal Classification) algorithm, using truncated data (TRUNC-MUSIC), a robust covariance estimate (ROCOV-MUSIC) and a generalized covariation coefficient matrix whose entries are fractional lower-order statistics of the signal (FLOS-MUSIC), respectively. We illustrate and compare the proposed methods by simulation examples.

11.1 Introduction - Problem Statement

Many signals used in communications, radar, sonar, and other man-made signals, as well as various natural signals, involve frequency modulation (FM) of a carrier. This model of FM signals was used in many references to define the notion of a multicomponent signal [Peleg et Friedlander(1996)], [Barbarossa(1995)].
Such complex signals can be affected by impulsive noise, which can be correctly modeled by α-stable processes. In this work, we parameterize the model of an FM signal by assuming that the phase of each component is a polynomial function of time.

Remark 11.1. We note that the estimation approach proposed in this work can be applied to multicomponent signals where the phase of some of the components is a continuous, but not necessarily polynomial, function of time. Indeed, by the Weierstrass theorem we can approximate any continuous function by a polynomial one. More about this can be found in [Peleg et Friedlander(1995)].

Without loss of generality, and for simplicity of presentation, we focus in this chapter on the case of a quadratic phase. If the considered signal has a phase order higher than two, we can reduce the order of the signal by demodulation. If the estimate of the highest-order polynomial phase coefficient is accurate, the highest-order term is effectively removed, and we can proceed to use the same PPT/demodulation procedure to estimate the next phase parameter. This procedure is repeated until all the coefficients have been estimated. The signal model in the quadratic phase case is then given by

x(t) = Σ_{i=1}^I s_i(t) + z_0(t) = Σ_{i=1}^I a_i(t) cos{φ_i(t)} + z_0(t)    (11.1)

where t = 0, . . . , N − 1 and φ_i(t) = 2π(f_i t + δ_i t²) + θ_i is the phase of the i-th so-called chirp component. The parameters f_i, δ_i, i = 1, . . . , I are unknown real coefficients. The values {θ_i, i = 1, · · · , I} are realizations of random variables distributed uniformly and independently over [0, 2π). N is the sample size and I is the number of components of the observed signal. The amplitudes a_i(t) are assumed α-stable, independent from the noise term z_0(t), with location parameters a_i ≠ 0 and dispersions γ_i. The random noise z_0(t) is modeled as a symmetric α-stable (SαS) process with zero location parameter.
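To make the model (11.1) concrete, here is a minimal numpy sketch (our own code, not from the thesis) that synthesizes a multicomponent quadratic-phase signal with constant amplitudes in SαS noise, drawing the noise with the Chambers-Mallows-Stuck method:

```python
import numpy as np

def sas_noise(alpha, size, rng):
    """Standard symmetric alpha-stable samples (Chambers-Mallows-Stuck)."""
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    return (np.sin(alpha * V) / np.cos(V) ** (1 / alpha)
            * (np.cos(V - alpha * V) / W) ** ((1 - alpha) / alpha))

def chirp_mixture(N, params, alpha, scale, rng):
    """x(t) = sum_i a_i cos(2*pi*(f_i t + delta_i t^2) + theta_i) + z0(t),
    i.e. the quadratic-phase model (11.1) with constant amplitudes a_i."""
    t = np.arange(N)
    x = np.zeros(N)
    for a, f, delta in params:                 # (amplitude, f_i, delta_i)
        theta = rng.uniform(0, 2 * np.pi)      # random uniform initial phase
        x += a * np.cos(2 * np.pi * (f * t + delta * t ** 2) + theta)
    return x + scale * sas_noise(alpha, N, rng)

rng = np.random.default_rng(1)
x = chirp_mixture(N=256, params=[(1.0, 0.1, 1e-4), (0.8, 0.3, -2e-4)],
                  alpha=1.5, scale=0.1, rng=rng)
```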
Our primary interest is to estimate the instantaneous frequency IF_i of each signal component s_i, defined as

IF_i(t) ≜ (1/2π) dφ_i(t)/dt = f_i + 2δ_i t    (11.2)

By decomposing a_i(t) = γ_i^{1/α} a_{i,0}(t) + a_i, where a_{i,0}(t) is a standard (zero location parameter and unit dispersion) α-stable process, we can re-write the signal expression as

x(t) = Σ_{i=1}^I a_i cos{φ_i(t)} + Σ_{i=1}^I γ_i^{1/α} a_{i,0}(t) cos{φ_i(t)} + z_0(t)    (11.3)
     = Σ_{i=1}^I a_i cos{φ_i(t)} + z(t)    (11.4)

where z(t) collects the last two terms of (11.3). According to the stability property of α-stable laws [Nikias et Shao(1995)], z(t) is an α-stable process. Thus, the problem of estimating (IF_i)_{1≤i≤I} of the multicomponent chirp signal affected by multiplicative and additive α-stable noise is reduced to that of estimating (IF_i)_{1≤i≤I} of constant-amplitude chirp signals, i.e. signals having the same IF laws as the original ones, but affected by additive noise only.

11.2 Polynomial-Phase Transform of FM Signals

Consider the polynomial phase estimation of the signal x(t) in Eq.(11.1). One possible solution to this problem is the maximum likelihood estimation algorithm. However, this estimation algorithm requires a large amount of computation. Indeed, the lack of an explicit expression for the α-stable noise pdf forces us to use some existing approximation, which turns out to be very expensive in numerical computation. Therefore, we propose to use a much simpler procedure, based on the polynomial-phase transform (PPT) [Sahmoudi et al.(2003b)]. The PPT is a tool for analyzing constant-amplitude polynomial-phase signals [Peleg et Friedlander(1995)].
In the quadratic phase case, the PPT can simply be performed as :

y(t) = x(t + τ)x(t) = Σ_{i=1}^{I1} (|a_i|²/2) cos{2π(2τδ_i t) + ϕ_i} + z_1(t)    (11.5)

where τ is the delay parameter (to be chosen preferably in [N/2, 2N/3]) and ϕ_i = 2π(τf_i + τ²δ_i). The term z_1(t) is the noise-plus-interference term; note that z_1(t) is an impulsive noise but not necessarily SαS. We might have I1 < I in the case where certain chirp components of the signal have the same phase coefficient δ_i but different coefficients f_i.

11.3 IF Estimation Procedure of FM Signals

Now we apply one of the algorithms proposed in Section 11.4 to y(t) to estimate the parameters δ_i, i = 1, . . . , I1. In order to estimate the parameters f_i, i = 1, . . . , I, we consider the demodulation of the signal as follows : for i = 1, . . . , I1, we compute

x^(i)(t) = x_a(t) exp(−j2πδ̂_i t²) ≈ Σ_{k∈J_i} exp{j(2πf_k t + θ_k)} + w(t)

where J_i is the set of component indices with the same coefficient δ_i, δ̂_i is the estimate of δ_i, x_a(t) is the analytic signal of x(t), and w(t) represents noise plus interference. For each demodulated signal, we estimate the frequencies {f_k, k ∈ J_i} using one of the proposed algorithms (Section 11.4) applied to the real part ℜ{x^(i)(t)} of the demodulated signal. Note that it is not necessary to use a high-resolution method in the case where J_i contains a single signal index.

11.4 Robust Subspace Estimation

In this section, we address the frequency estimation problem of multicomponent sinusoidal signals observed in an impulsive noise environment, given by equation (11.1) with φ_i(t) = 2πf_i t + θ_i. We propose to apply the high-resolution subspace algorithm MUSIC (MUltiple SIgnal Classification) [Benidir(2002)] for the frequency estimation.
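The quadratic-phase PPT step can be sketched in numpy as follows (a sketch with hypothetical naming, not the thesis code): the product x(t+τ)x(t) turns a chirp of chirp-rate δ into a sinusoid at frequency 2τδ, which a simple spectral peak search can then locate.

```python
import numpy as np

def ppt(x, tau):
    """Quadratic-phase PPT of Eq. (11.5): y(t) = x(t + tau) * x(t)."""
    return x[tau:] * x[:-tau]

# a single noiseless chirp: the PPT spectrum peaks near frequency 2*tau*delta
N, f0, delta = 1024, 0.05, 1e-4
t = np.arange(N)
x = np.cos(2 * np.pi * (f0 * t + delta * t ** 2))
tau = N // 2                                      # delay in [N/2, 2N/3]
y = ppt(x, tau)
spec = np.abs(np.fft.rfft(y * np.hanning(len(y))))
freqs = np.fft.rfftfreq(len(y))
delta_hat = freqs[np.argmax(spec[1:]) + 1] / (2 * tau)  # skip the DC bin
```

Here a windowed FFT peak stands in for the robust MUSIC variants of Section 11.4, just to illustrate the transform itself.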
As the performance of the standard MUSIC algorithm based on the sample covariance matrix degrades if the underlying noise is impulsive, we propose to apply MUSIC in the following three ways :
1. In the first one, we apply MUSIC to the truncated harmonic signal.
2. In the second one, we apply MUSIC to the generalized covariation function of the signal.
3. In the third one, we apply MUSIC to the minimax robust covariance estimate of the harmonic signal.

11.4.1 TRUNC-MUSIC algorithm

In an α-stable environment, the use of the sample covariance is no longer appropriate for frequency estimation, due to the infinite variance of the noise. To avoid this difficulty, we propose to truncate in amplitude the ‘large-valued’ observations that represent “large” impulsive noise realizations, and to apply MUSIC to the finite covariance matrix of the truncated process. TRUNC-MUSIC (TRUNC stands for truncation) is summarized in Table 11.1.

TRUNC-MUSIC Algorithm
Step 1. Truncation constant choice : compute the histogram and choose K such that [−K, K] contains 90 % of the data.
Step 2. Pre-processing : truncate the signal according to x̃(t) = x(t) if |x(t)| ≤ K, and x̃(t) = sign[x(t)] K if |x(t)| > K.
Step 3. Frequency estimation : apply the MUSIC algorithm to the covariance matrix of the truncated signal x̃(t).

Tab. 11.1 – The proposed frequency estimation TRUNC-MUSIC algorithm.

11.4.2 FLOS-MUSIC algorithm

In this section we propose to use fractional lower-order statistics (FLOS) of the signal for the frequency estimation. We consider an L × L generalized covariation coefficient (GCC) matrix Γ, whose (n, l)-th entry is given by :

Γ_{n,l} = [x(n), x(l)]_α / [x(l), x(l)]_α = E[x(n) x(l)^{<p−1>}] / E[|x(l)|^p],   1 ≤ p < α    (11.7)

where x^{<p−1>} = |x|^{p−1} sign(x).
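The truncation pre-processing of Table 11.1 can be sketched as follows (our own naming; as an assumption, the 90 % empirical quantile of |x| replaces the histogram inspection of Step 1):

```python
import numpy as np

def truncate_signal(x, coverage=0.90):
    """Steps 1-2 of TRUNC-MUSIC: choose K so that [-K, K] covers `coverage`
    of the samples, then clip the signal to [-K, K]."""
    K = np.quantile(np.abs(x), coverage)   # truncation constant from the data
    return np.clip(x, -K, K), K            # clip == sign(x)*K whenever |x| > K

# heavy-tailed outliers get clipped; the bulk of the data is untouched
x = np.array([0.5, -0.3, 120.0, 0.8, -0.1, -95.0, 0.2, 0.4, -0.6, 0.7])
xt, K = truncate_signal(x)
```

MUSIC would then be run on the sample covariance matrix of `xt`, which is finite even when x itself is α-stable.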
It has been shown in [Altinkaya et al.(2002)] that for a sinusoidal signal in α-stable noise, we have:

Γ_{n,l} = Σ_{i=1}^{I} η_i cos{2πf_i(n − l)} + P_z δ_{n−l}   (11.8)

where {η_i, i = 1, ..., I} are positive real constants depending on α and a_i, P_z is a real constant depending on the noise pdf, and δ_{n−l} is the Kronecker delta. Equation (11.8) shows that the frequency estimates can be obtained by applying the MUSIC algorithm to the GCC matrix Γ. In practice, we follow the procedure summarized in Table 11.2.

FLOS-MUSIC Algorithm
Step 1. Compute an estimate of Γ_{n,l} (for p = 1) using [Altinkaya et al.(2002)]:

Γ̂_{n,l} = Σ_{i=1}^{N−M+1} x(n+i−1) sign(x(l+i−1)) / Σ_{i=1}^{N−M+1} |x(l+i−1)|   (11.9)

Step 2. Apply MUSIC to the GCC matrix estimate [Γ̂_{n,l}]_{1≤n,l≤L} for the frequency estimation.

Tab. 11.2 – The proposed robust frequency estimation FLOS-MUSIC algorithm.

11.4.3 ROCOV-MUSIC algorithm

[A]- Robust estimation of the covariance. Huber considered the parameter estimation problem in the presence of outliers or impulsive noise and introduced the theory of M-estimation [Huber(1981)]. Here, we consider M-estimates for the signal auto-covariance function γ(k) ≜ E[x(t+k)x(t)]. Note that robust auto-covariance estimation reduces to robust variance estimation through

E(XY) = (1/4)[Var(X + Y) − Var(X − Y)]   (11.10)

where Var denotes the variance. Since an α-stable distribution has infinite variance, we propose to first truncate the observations using a large constant K ≫ 1. The M-estimator of the variance σ² is a solution of the following equation [Huber(1981)]:

(1/N) Σ_{i=0}^{N−1} [u(d_i²) x²(i)/σ² − u(d_i²)] = 0   (11.11)

where d_i² = x²(i)/σ² is the Mahalanobis quadratic distance and u is a weighting function defined on ℝ⁺.
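The fixed-point sweep behind the ROCOV variance M-estimate can be sketched as follows (the Huber tuning constant k = 1.5 is an assumption for illustration, not a value fixed in the text): iterate σ² ← Σ ω_i² x²(i) / Σ ω_i² with ω_i = min(d_i, k)/d_i and d_i = |x(i)|/σ.

```python
import numpy as np

def huber_variance(x, k=1.5, n_iter=100, tol=1e-10):
    sigma2 = np.mean(x ** 2)                         # standard initializer
    for _ in range(n_iter):
        d = np.abs(x) / np.sqrt(sigma2)              # Mahalanobis distances
        w = np.minimum(d, k) / np.maximum(d, 1e-12)  # u(d) = min(d, k) / d
        new = np.sum((w * x) ** 2) / np.sum(w ** 2)  # reweighted variance
        if abs(new - sigma2) <= tol * sigma2:
            return new
        sigma2 = new
    return sigma2
```

Samples with |x| ≤ kσ get weight 1 (so clean data reproduces the classical variance), while an outlier's contribution is capped at k²σ², which is what makes the estimate resistant to impulsive samples.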
The existence and uniqueness of the solution of Eq. (11.11) were shown in [Huber(1981)] under mild assumptions on the weighting function, such as boundedness and continuity. This function is typically chosen so that observations coming from the tails of the assumed contaminated distribution are down-weighted. Here, we use the robust non-descending weighting function based on Huber’s minimax function, given by u(d) = ω(d)/d with:

ω(d) = min(d, k)   (11.12)

where k is a suitable constant [Huber(1981)]. The M-estimate of the variance is computed as a solution of the latter equation; the required covariance matrix is then estimated through the auto-covariance coefficients, as shown in Table 11.3.

[B]- Robust frequency estimation. Now we apply the subspace approach to estimate the parameters of the sinusoidal signals. The proposed ROCOV-MUSIC algorithm is outlined in Table 11.4.

M. Sahmoudi © Processus Alpha-Stables pour la Séparation et l’Estimation Robustes des Signaux non-Gaussiens et/ou non-Stationnaires

ROCOV Algorithm
Step 1. Initialize ROCOV with the standard variance estimator: σ₀² = (1/N) Σ_{i=0}^{N−1} x²(i).
Step 2. Sweep: at the (j+1)-th iteration, compute
σ²_{j+1} = Σ_{i=0}^{N−1} ω²_{i,j} x²(i) / Σ_{i=0}^{N−1} ω²_{i,j},
where ω_{i,j} = u(d_{i,j}) = ω(d_{i,j})/d_{i,j}, d²_{i,j} = x²(i)/σ_j², and ω is the Huber non-descending function given above in (11.12).
Step 3. According to Equation (11.10), compute the M-estimates γ̂(k), k = 0, ..., L−1, using the M-estimators of the variances of [x(t+k) + x(t)] and [x(t+k) − x(t)] computed in Step 2.
Step 4. Stop the sweeps when the error is smaller than a given threshold ε.

Tab. 11.3 – The proposed robust covariance estimation ROCOV algorithm.

ROCOV-MUSIC Algorithm
Step 1. Compute the M-estimates γ̂(k), k = 0, ..., L−1, using the ROCOV algorithm summarized in Table 11.3.
Step 2.
Apply MUSIC to the robust covariance matrix estimate Γ̂_x = Toeplitz[γ̂(k), 0 ≤ k ≤ L−1] for the frequency estimation.

Tab. 11.4 – The proposed frequency estimation ROCOV-MUSIC algorithm.

11.5 Performance Evaluation & Comparison

11.5.1 Mixture of sinusoidal components

Here, we perform a simulation-based comparison of the proposed robust frequency estimation methods TRUNC-MUSIC, ROCOV-MUSIC and FLOS-MUSIC. We consider three (I = 3) sinusoidal components with the same amplitude a_1 = a_2 = a_3 = 1 and frequencies f_1 = 0.1, f_2 = 0.3 and f_3 = 0.4. The signal is affected by impulsive noise with an α-stable distribution of characteristic exponent α = 1.5. We run 200 Monte Carlo realizations to compute all considered statistics. Figure 11.1 presents the mean square error (MSE) versus the noise dispersion in dB (the sample size is N = 1000). Figure 11.2 presents the MSE versus the sample size, for a noise dispersion γ = 0.1.

Fig. 11.1: The MSE versus the noise dispersion in dB, N = 1000.

Fig. 11.2: The MSE versus the sample size, γ = 0.1.

These figures show the effectiveness of the proposed methods.
11.5.2 Mixture of two chirps

In this subsection, we conduct three experiments to illustrate the proposed IF estimation procedure (or, equivalently, phase parameter estimation). In the first experiment, we use the TRUNC-MUSIC algorithm in the second step of the approach introduced in Section 11.3; in the second, the ROCOV-MUSIC algorithm; and in the third, the FLOS-MUSIC algorithm. We consider a mixture of two linear FM³ components (I = 2) with the same amplitudes a_1 = a_2 = 1, frequencies f_1 = 0.05, f_2 = 0.3, and second-order parameters δ_1 = 0.0001 and δ_2 = 0.0003. The signal is affected by impulsive noise following an α-stable model (α = 1.5). We run 500 Monte Carlo realizations to compute all evaluated statistics.

Fig. 11.3: The MSE versus the sample size, γ = 0.1.

Figures 11.3 and 11.4 show the MSE of the estimated phase parameters versus the sample size and the noise dispersion, respectively. The three proposed techniques are compared using the same legend as in the previous Figures 11.1 and 11.2. These simulation examples show the effectiveness of the proposed methods in mitigating the impulsive noise. The comparative study clearly shows a certain advantage for the ROCOV-MUSIC based procedure.

³ Linear FM signals are also commonly called chirp signals.
Fig. 11.4: The MSE versus the noise dispersion in dB, N = 1000.

11.6 Concluding Remarks

In this chapter, three two-step methods for IF estimation in heavy-tailed noise are introduced. The first step transforms the polynomial phase estimation problem into a frequency estimation one. The frequency estimation in the second step of the proposed parametric methods is based on the subspace MUSIC algorithm, applied respectively to the amplitude-truncated signal, to the robust covariance matrix, and to the generalized covariation coefficient matrix. Simulation results are presented to validate the proposed IF estimation methods. In the considered simulation context, the comparative study shows the superiority of the parametric method using the robust covariance estimation technique (ROCOV-MUSIC).

Chapitre 12

Robust Time-Frequency Approaches

As shown in this chapter, conventional TFDs are quite sensitive to non-Gaussian noise, in particular to impulsive noise, in which case they produce poor estimation results. In order to obtain good estimation performance in this context, we propose in a first approach a pre-processing stage that attenuates the impulsive noise effect before computing the signal TFD. In the second approach, we use robust statistics theory to define a new robust TFD, named the robust MB-distribution (MB-distribution: Modified B-distribution).
We show that the TFDs resulting from the two proposed approaches are able to reveal the instantaneous frequency of the noisy multicomponent signal in an accurate way.

12.1 Introduction-Problem Statement

This chapter is concerned with the analysis of multi-component FM signals corrupted by additive heavy-tailed noise. A multi-component signal is a signal whose time-frequency representation presents multiple ridges in the time-frequency plane.

• Signal model: Analytically, the noisy signal considered in this chapter is defined as

x(t) = s(t) + z(t) = Σ_{i=1}^{M} s_i(t) + z(t)   (12.1)

where each component s_i(t), of the form s_i(t) = a_i(t) e^{jφ_i(t)}, is assumed to have only one ridge, i.e. one continuous curve, in the time-frequency plane; a_i(t) is the amplitude and φ_i(t) the phase of the i-th component of the signal. The probability density function (PDF) of the random impulsive noise z(t) is modeled as a heavy-tailed distribution¹. Examples of such distributions include α-stable laws with α < 2 and generalized Gaussian laws.

• Symmetric α-stable process (SαS): As presented in the first part of this thesis, the PDF of SαS processes has no closed form except for the cases α = 1 (Cauchy distribution), α = 2 (Gaussian distribution) and α = 1/2 (Lévy distribution). Due to their heavy tails, stable distributions do not have finite second- or higher-order moments, except in the limiting case α = 2.

• Generalized Gaussian (GG) PDF: Another way to model impulsive noise processes is through the generalized Gaussian PDF, given by f_α(x) = A exp(−b|x|^α) where 0 < α ≤ 2. For α = 2 we recover the Gaussian distribution, and for α = 1 the Laplacian distribution, which is known to be a good model for impulsive noise.
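For concreteness, the GG density can be written with its normalizing constant A = α b^{1/α} / (2 Γ(1/α)); the sketch below (constant b = 1 chosen for illustration) shows the heavier tail of the Laplacian case α = 1 relative to the Gaussian case α = 2.

```python
from math import exp, gamma

def gg_pdf(x, alpha, b=1.0):
    """Generalized Gaussian density f_alpha(x) = A * exp(-b * |x|**alpha)."""
    A = alpha * b ** (1.0 / alpha) / (2.0 * gamma(1.0 / alpha))
    return A * exp(-b * abs(x) ** alpha)
```

With this A the density integrates to one for any 0 < α ≤ 2, and comparing tail values (e.g. at x = 4) makes the increasing impulsiveness for decreasing α visible directly.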
• Time-frequency analysis: Our primary interest in this work is to estimate the instantaneous frequency of each FM signal s_i(t) in (12.1), defined as

IF_i(t) ≜ (1/2π) dφ_i(t)/dt   (12.2)

Time-frequency analysis techniques are used here as they reveal the multicomponent nature of such signals. Ideally, for a given FM signal, the TFD is represented as a row of delta functions around the signal’s instantaneous frequency. This property makes the peak of the TFD a very powerful IF estimator. However, quadratic TFDs of multi-component signals suffer from the presence of cross-terms, which can obscure the real features of interest in the signal. The properties of a quadratic TFD are completely determined by its kernel. This kernel should have the shape of a two-dimensional (2-D) low-pass filter, so as to attenuate the cross-terms that lie away from the origin of the ambiguity domain while preserving the auto-terms concentrated around the origin [Hussain et Boashash(2002)]. Considerable efforts have been made to define TFDs that reduce the effect of cross-terms while improving the time-frequency resolution (e.g., [Hussain et Boashash(2002), Barkat et Abed-Meraim(2004)]). This led to the so-called reduced interference distributions, which include the modified B-distribution (MBD) and the signal-dependent optimal time-frequency representation. In this work, we have used the MBD [Hussain et Boashash(2002)], given by:

T(t, f) = ∫∫_{−∞}^{+∞} G_{MB}^σ(t′) x(t − t′ + τ/2) x*(t − t′ − τ/2) e^{−j2πfτ} dt′ dτ   (12.3)

where G_{MB}^σ(t′) = k_σ / cosh(t′)^{2σ}, 0 ≤ σ ≤ 1 is a real parameter that controls the trade-off between component resolution and cross-term suppression, and k_σ = Γ(2σ)/(2^{2σ−1} Γ²(σ)) is the normalizing factor. The choice of the MBD stems from its good performance in terms of resolution and cross-term suppression [Hussain et Boashash(2002)].
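As a quick numerical check (this verification is ours, not from the text), the normalizing factor k_σ above makes the time kernel G_MB integrate to one, as a smoothing window should; this follows from the Legendre duplication formula for Γ, and can be confirmed directly:

```python
from math import cosh, gamma

def mb_kernel(t, sigma):
    """MBD time kernel G_MB(t) = k_sigma / cosh(t)**(2*sigma)."""
    k = gamma(2 * sigma) / (2 ** (2 * sigma - 1) * gamma(sigma) ** 2)
    return k / cosh(t) ** (2 * sigma)

def kernel_integral(sigma, half_width=400.0, dt=0.01):
    """Riemann sum of the kernel over [-half_width, half_width]."""
    n = int(2 * half_width / dt)
    return sum(mb_kernel(-half_width + i * dt, sigma) for i in range(n + 1)) * dt
```

For σ = 0.5 the kernel reduces to (1/π) sech(t), whose integral is exactly 1; smaller σ gives a wider (stronger-smoothing) window with the same unit area.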
The effect of additive Gaussian noise on the time-frequency representation is another consideration that directly influences instantaneous frequency estimation and is an important issue [Peleg et Friedlander(1996)], [Hussain et Boashash(2002)]. However, in many practical applications, especially in communications, signals are disturbed by impulsive noise due to the propagation environment or to large errors in collecting and recording the data. These noise processes are commonly modeled by heavy-tailed distributions [Nikias et Shao(1995)]. Since outliers or impulsive noise have an unusually large influence on standard IF estimators, robust procedures attempt to modify those schemes. Only a limited literature has been dedicated to the analysis of multi-component FM signals in impulsive noise. In [Sahmoudi et al.(2004b)], the authors propose a class of robust parametric methods to handle linear FM signals; in the same paper, a TFD-based technique using a pre-processing stage to mitigate the impulsive noise effect has also been proposed. The other alternative, which is the focus of this chapter, is to apply the M-estimation principle in order to design TFDs that are robust with respect to impulsive noise. In [Katkovnik et al.(2003)] and [Barkat et Stankovic(2004)], the authors proposed the robust spectrogram and the robust polynomial Wigner-Ville distribution (PWVD), respectively. However, the spectrogram is known to suffer from low resolution in the time-frequency domain, while the PWVD suffers from cross-terms for multi-component signals.

¹ For a complex-valued noise signal, we simply consider that z(t) = z_r(t) + j z_i(t), where z_r(t) and z_i(t) represent two independent heavy-tailed processes with the same pdf.
In this chapter, we use the modified B-distribution [Hussain et Boashash(2002)] and M-estimation theory to design a new robust TFD, referred to as the robust modified B-distribution (R-MBD), for the analysis of multi-component FM signals in heavy-tailed noise. We show that the proposed approach can solve problems that existing time-frequency distributions cannot.

12.2 Failure of Standard TFD in Impulsive Noise

12.2.1 Effect of impulsive spike noise on TFD

To examine the effect of additive impulsive noise on the time-frequency representation of a signal, it is useful to use a model that is simple and provides considerable insight into the nature of the artifacts that appear. We carry out our analysis in discrete time. For a clear and simple illustration, consider the spike model for impulsive noise. The signal to be examined is

x(n) = s(n) + A δ_K(n − n_0)   (12.4)

where δ_K(n) is the Kronecker delta function, A δ_K(n − n_0) represents the spike noise model, and A ≫ E_s is its amplitude, E_s = Σ_n |s(n)|² being the energy of the signal s(n). If we compute, for example, the WVD of x(n), we get

W_x(n, f) = 2 Σ_m {s(n+m) + A δ_K(n+m−n_0)} {s*(n−m) + A* δ_K(n−m−n_0)} e^{−j4πmf}
= W_s(n, f) + 2A* Σ_m s(n+m) δ_K(n−m−n_0) e^{−j4πmf} + 2A Σ_m s*(n−m) δ_K(n+m−n_0) e^{−j4πmf} + 2|A|² Σ_m δ_K(n+m−n_0) δ_K(n−m−n_0) e^{−j4πmf}
= W_s(n, f) + 4 Real{A* s(2n − n_0) e^{−j4π(n−n_0)f}} + 2|A|² δ_K(n − n_0)

where Real(z) denotes the real part of the complex number z (the two middle sums are complex conjugates of one another, hence the single real term). The effect of this single impulse at n = n_0 is to place a very strong impulsive ridge of magnitude 2|A|² in the time-frequency plane, located at time n_0 and extending over all frequencies. In addition, there is a secondary artifact resulting from the cross-product of the signal s(n) with the impulse.
This artifact is a decimated copy of s(n), extending over all frequencies and modulated in the normalized frequency domain by the complex exponential term exp(−j4π(n − n_0)f). This additive cross-term oscillates more rapidly in the normalized frequency domain for values of n further removed from n_0.

12.2.2 Effect of impulsive α-stable noise on TFD

Because α-stable noise is impulsive, its effect on a quadratic time-frequency representation (QTFR) differs from that observed in the Gaussian case (α = 2), where the energy of the noise is uniformly spread over the time-frequency plane. This can be seen by examining the autocorrelation function of the observed signal x(n), given by

R_x(n, m) = E{x(n + m) x*(n − m)}   (12.5)

One can show that the time-frequency representation of a signal s(n) in additive α-stable noise is severely degraded. Indeed, let z(n) denote the α-stable additive noise with α < 2; then, for m ≠ 0,

R_x(n, m) = s(n+m)s*(n−m) + s(n+m) E{z*(n−m)} + s*(n−m) E{z(n+m)} + E{z(n+m) z*(n−m)},   (12.6)–(12.7)

and for m = 0,

R_x(n, 0) = |s(n)|² + s(n) E{z*(n)} + s*(n) E{z(n)} + E{|z(n)|²}   (12.8)

Since E{z(n)} is infinite when α ≤ 1, all elements of the autocorrelation matrix are then infinite. Also, since E{|z(n)|²} is infinite when α < 2, we have R_x(n, 0) = ∞ for all n. Thus the autocorrelation blows up for α < 2, making the standard time-frequency representation useless for characterizing signals in impulsive environments.

12.2.3 The need for robust TFD in a Gaussian environment

Here, we suppose that the noise z(n) is a complex Gaussian random process and we examine the instantaneous autocorrelation function of the observed signal x(n).
One can write the noise PDF as

p_z(z) = (1/(πσ²)) exp(−|z|²/σ²)   (12.9)

We can express the instantaneous autocorrelation of the observed signal as

R_x(n, m) = s_x(n, m) + z_1(n, m) + z_2(n, m) + R_z(n, m)   (12.10)

Clearly, the term s_x(n, m) is deterministic, while the two terms z_1(n, m) and z_2(n, m) are complex random variables. To analyze the instantaneous autocorrelation behavior, one must analyze the probability distribution of the final term R_z(n, m). Using the PDF formula for functions of random variables [Benidir(2002)], we have

p_{R_z}(y) ∝ p_z(h^{−1}(y)) ∝ exp(−c |h^{−1}(y)|²) ∝ exp(−c |y|)   (12.11)–(12.13)

where h is the transform z ↦ h(z) = z z*. Thus the instantaneous autocorrelation has a Laplace PDF, which has heavy tails. Hence, computing a QTFD of a signal in Gaussian noise generates an impulsive noise. Consequently, robust time-frequency analysis is necessary even in a Gaussian environment.

12.3 Pre-processing Techniques based Approach

The first step consists in reducing the impulsive noise amplitudes in order to improve the quality of the TFD of the noisy signal. To do so, two solutions may be suggested.

12.3.1 Exponential compressor filter

We propose here to pass the noisy signal through a nonlinear device that compresses the large amplitudes (i.e., reduces the dynamic range of the noisy signal) before further analysis [Barkat et Abed-Meraim(2003b)]. The output of the nonlinear device is expressed as

x̃(t) = ψ_β[x(t)] = |x(t)|^β sign[x(t)]

where 0 < β ≤ 1 is a real coefficient that controls the amount of compression applied to the input noisy signal x(t). This technique is similar to that used in nonuniform quantization, where a totally different nonlinear law is used [Jayant et Noll(1984)]. A plot of this compressor law is displayed in Figure 12.1 for different values of β. Observe that the compressor law is linear around the origin (i.e., for very small input values).
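The compressor nonlinearity ψ_β is straightforward to sketch; the chirp and the values of β below are illustrative only.

```python
import numpy as np

def compress(x, beta):
    """Compressor psi_beta(x) = |x|**beta * sign(x), 0 < beta <= 1."""
    return np.abs(x) ** beta * np.sign(x)

rng = np.random.default_rng(1)
n = np.arange(512)
# Linear FM signal in heavy-tailed (Cauchy, alpha = 1) noise.
x = np.cos(2 * np.pi * (0.05 * n + 1e-4 * n ** 2)) + rng.standard_cauchy(512)
x_c = compress(x, 0.5)
```

The map is odd and monotone, so it preserves the sign (and zero crossings) of every sample while shrinking the large noise spikes much more strongly than the near-unit-amplitude signal samples.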
The linearity and its corresponding interval range obviously depend on the value of β: the smaller β is, the smaller the linearity range.

Fig. 12.1: The nonlinear law of the compressor used in the pre-processing stage.

This means that for weak signals (i.e., when the noiseless signal amplitude is small compared to the noise spikes), and using an appropriate value of β, the compressor output may be approximated by a scaled version of the input noiseless signal embedded in a new additive noise whose variance is much smaller than that of the input noise. Figure 12.2 displays the time representation of a linear FM signal in impulsive noise, compressed using β = 1 (i.e., no compression), β = 0.9, β = 0.5 and β = 0.2, respectively. If we assume the effect of the compressor on the desired noiseless signal characteristics (i.e., its IF) to be negligible, then the achieved reduction of the noisy signal variance yields better results in its analysis.

Fig. 12.2: Compression of a linear FM signal in impulsive noise using different values of β.

12.3.2 Huber filter

We use here the Huber criterion to define the Huber filter, which truncates in amplitude the ‘large-valued’ observations that represent “large” impulsive noise realizations.
For the choice of the truncation constant K, we propose to compute the histogram of the observations and choose K such that [−K, K] contains 90% of the data. The output of the Huber filter is then expressed as:

x̃(t) = ψ_H[x(t)] = x(t) if |x(t)| ≤ K, sign[x(t)] K if |x(t)| > K

The second step consists in applying the time-frequency analysis presented in Section 12.5 to the processed signal x̃(t) for the IF estimation problem.

12.4 Robust Time-Frequency Approach

In order to obtain good TFD-based IF estimation performance in an impulsive environment, we use the robust statistics theory of M-estimation to define a new robust quadratic time-frequency distribution.

12.4.1 Optimal TFD kernel in α-stable noise

Recall that the fractional lower-order moments (FLOMs) of an α-stable random variable with zero location parameter and dispersion γ are given by E|X|^p = C(p, α) γ^{p/α} for 0 < p < α, where C(p, α) is a constant depending only on p and α. This tells us that the p-th order moment of an α-stable random variable and its dispersion are related through a constant only. Therefore, the MD criterion is equivalent to least L_p-norm estimation, where 0 < p < α, and the estimates of a parameter θ can be obtained from equation (3.18) using the L_p-norm loss function

ρ_p(x) = |x|^p ;  ψ_p(x) = p |x|^{p−1} sign(x)   (12.14)

with sign(x) ≜ x/|x|, as a robust estimation tool that appeared originally as a heuristic idea and was supported later by theoretical and experimental studies. In particular, for p = 1, the L_1-norm criterion, referred to as the “modulus function”, was used in [Katkovnik et al.(2003), Barkat et Stankovic(2004)] to define the robust periodogram and robust PWV distributions. It should be emphasized that the least L_p-norm estimates are not only optimal in the MD sense for α-stable data, but also optimal in the maximum likelihood sense for the family of generalized Gaussian distributions.
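A small numerical illustration of this point (ours, with illustrative values): for p = 1 the least L_p location estimate of a sample is its median, which stays close to the true location under Cauchy (α = 1) noise, whereas the least-squares (L_2) estimate, the sample mean, is itself Cauchy-distributed and hence unreliable.

```python
import numpy as np

rng = np.random.default_rng(2)
data = 3.0 + rng.standard_cauchy(10_001)  # true location is 3.0

l2_est = float(np.mean(data))    # minimizes sum |x - c|^2 (non-robust)
l1_est = float(np.median(data))  # minimizes sum |x - c|   (robust, p = 1)
```

With about 10⁴ samples, the median's deviation from 3.0 is on the order of 10⁻², independent of the heavy tails.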
Indeed, the ML estimator coincides with the L_p-norm criterion when p is chosen equal to the exponent of the generalized Gaussian PDF. In addition, applying equations (3.25) and (3.26) over the class of generalized Gaussian pdfs, one can easily show that the least L_p-norm estimate is also optimal in the robust minimax sense if p is chosen as the smallest value in the considered set of p values. It is recognized that “outliers”, which arise from heavy-tailed noise distributions or are simply bad points due to measurement errors, have an unusually large influence on standard estimators based on least squares. Accordingly, as mentioned previously, robust methods have been developed to modify least-squares schemes so that outliers have much less influence on the final estimates. One of the most satisfying robust procedures is given by a modification of the maximum likelihood principle; hence we proceed with that approach, called M-estimation [Huber(1981)].

12.4.2 A new robust quadratic time-frequency distribution

Let us consider the noisy signal (12.1) in discrete time, x(kT) = s(kT) + z(kT), where T is the sampling period. A standard time-frequency distribution, at a point (kT, f), can be shown to be the solution of the optimization problem [Katkovnik et al.(2003)]

B̂ = arg min_B J(kT, f, B)   (12.15)

where

J(kT, f, B) = Σ_{n=−N/2}^{N/2} w(nT) ρ[e(k, f, n)],  e(k, f, n) = G_x(kT, nT) e^{−j2πfnT} − B   (12.16)

w(nT) is a window function, G_x(kT, nT) is the kernel of the considered quadratic time-frequency distribution of the FM signal x(kT), and B is an estimate of the expectation of the sample average of the quantity G_x(kT, nT) e^{−j2πfnT}.
If we choose the loss function ρ(e) = |e|², then solving dJ(kT, f, B)/dB* = 0 for B shows that the optimal solution corresponds to the standard TFD

B_x^s(kT, f) = Σ_{n=−N/2}^{N/2} [w(nT) / Σ_{n=−N/2}^{N/2} w(nT)] G_x(kT, nT) e^{−j2πfnT}   (12.17)

Thus, for a weighted window, the standard TFD can be treated as an estimate of the mean, calculated over the set of complex-valued observations

G = {G_x(kT, nT) e^{−j2πfnT} ; n ∈ [−N/2, N/2]}

It has been shown that the optimal loss function ρ derived in Huber’s minimax estimation theory (see Section 2) can be applied to design a new class of robust time-frequency distributions, inheriting strong resistance to impulsive noise. In particular, some robust TFDs have been derived using the absolute-error loss function ρ(e) = |e| in (12.16) [Katkovnik et al.(2003)]. In this work, we propose to choose the loss function ρ in the criterion (12.16) as the L_p-norm criterion ρ(e) = |e|^p, where p < 2 is a parameter controlling the degree of the loss function. The choice of this criterion is motivated in Section 12.4.1. We use the MBD to handle multi-component nonstationary FM signals given by model (12.1). However, similarly to the standard spectrogram, WVD and PWVD, the standard MB-distribution is not an adequate analysis tool in the presence of heavy-tailed noise. To mitigate this problem, we use the MB-distribution kernel given in Equation (12.3) and the L_p-norm loss function in the design of the proposed robust MBD, to analyze FM signals affected by impulsive noise.
In this case, we find the optimal solution, called the robust modified B-distribution (R-MBD), by solving

(∂/∂B*) Σ_{n=−N/2}^{N/2} w(nT) |G_x(kT, nT) e^{−j2πfnT} − B|^p = 0   (12.18)

⇔ Σ_{n=−N/2}^{N/2} w(nT) (G_x(kT, nT) e^{−j2πfnT} − B) |G_x(kT, nT) e^{−j2πfnT} − B|^{p−2} = 0   (12.19)

⇔ B_x^r(kT, f) = Σ_{n=−N/2}^{N/2} [d(k, f, n) / D_0(kT, f)] G_x(kT, nT) e^{−j2πfnT}   (12.20)

with

d(k, f, n) = w(nT) |G_x(kT, nT) e^{−j2πfnT} − B_x^r(kT, f)|^{p−2}   (12.21)

D_0(kT, f) = Σ_{n=−N/2}^{N/2} d(k, f, n)   (12.22)

Since the quantity B_x^r(kT, f) appears on the right- as well as on the left-hand side of Equation (12.20), an iterative procedure is necessary in order to obtain the R-MBD. The robust-MBD algorithm is summarized in Table 12.1.

Robust-MBD Computation
Step 1. Evaluate the standard MBD using Equation (12.17).
Step 2. For initialization, set the iteration index i = 0 and B_x^{r,0}(kT, f) = B_x^s(kT, f).
Step 3. Sweep: set i = i + 1 and do:
– Compute d(k, f, n) and D_0(kT, f) using Equations (12.21) and (12.22), respectively.
– Compute the robust MBD at iteration i, B_x^{r,i}(kT, f), using Equation (12.20).
Step 4. If the relative absolute difference between two iterations is smaller than a fixed threshold ε, i.e. |B_x^{r,i}(kT, f) − B_x^{r,i−1}(kT, f)| / |B_x^{r,i}(kT, f)| ≤ ε, stop the algorithm; otherwise go to Step 3.

Tab. 12.1 – Computation procedure of the Robust-MBD.

It was shown in [Kaluri et Arce(2000)] that the above iterative algorithm converges to a single (global) minimum for a good choice of the initial value. In our case, the choice B_x^{r,0}(kT, f) = B_x^s(kT, f) satisfies the necessary condition for convergence.
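Each time-frequency point of the R-MBD is thus an L_p M-estimate of the mean of the complex observations G_n = G_x(kT, nT) e^{−j2πfnT}. The fixed-point sweep (12.20)–(12.22) can be sketched for a single point as follows (the small guard on |G − B| is our addition to avoid division by zero when p < 2):

```python
import numpy as np

def lp_mean(G, w, p=1.0, n_iter=200, tol=1e-12):
    """Iteratively reweighted L_p mean of observations G with window weights w."""
    B = np.sum(w * G) / np.sum(w)                    # standard (L2) TFD value
    for _ in range(n_iter):
        r = np.maximum(np.abs(G - B), 1e-12)         # guard against |G - B| = 0
        d = w * r ** (p - 2.0)                       # weights d(k, f, n)
        B_new = np.sum(d * G) / np.sum(d)            # Eq. (12.20) update
        if abs(B_new - B) <= tol * (1.0 + abs(B_new)):
            return B_new
        B = B_new
    return B

# An impulsive observation among the G_n barely moves the L1 estimate.
G = np.array([0.0, 1.0, 2.0, 3.0, 100.0])
B1 = lp_mean(G, np.ones_like(G), p=1.0)
```

With p = 1 and uniform weights this converges to the (one-dimensional) median of the observations, here 2.0, while the standard L2 value (the plain mean, 21.2) is pulled far away by the outlier.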
12.5 IF Estimation & Component Separation

The proposed component separation procedure consists in separating the signal components and estimating their respective IF laws from the signal TFD. In an impulsive environment, we propose to apply this algorithm (i) to the TFD of the pre-processed signal in the first procedure, and (ii) to the robust MB-distribution of the noisy signal in the second procedure. The component separation algorithm is illustrated in Table 12.2. The first step of the algorithm consists in noise thresholding, to remove the undesired ‘low’-energy peaks in the time-frequency domain. This operation can be written as:

T_th(t, f) = T(t, f) if T(t, f) > ε, and 0 otherwise

where ε is a properly chosen threshold. In our simulations we used ε = 0.01 max_{(t,f)} T(t, f). Assuming a ‘clean’ TFD, the M component IFs are estimated, at each time instant t, from the M peak positions of the TFD slice T_th(t, f). Observe that if, at a time instant t_0, two components cross, then the number of peaks in this particular slice T(t_0, f) is smaller than the total number of components M. For practical implementation reasons, we decide that a crossing occurs when the number of peaks is smaller than M over a fixed number of consecutive slices. In this case, we implement the following procedure:

1. Choose a particular maximum point location in the slice where the crossing occurs.
2. Measure all distances from this point to the peak locations of the previous slice (with no crossing).
3. Select the 2 smallest distances and add them.
4. Repeat Steps 1 to 3 for all other maximum point locations in the slice where the crossing occurred.
5. From the set of the smallest sums found above, select the smallest value and the points associated with it. This yields the location where the crossing occurred and the 2 components involved in the crossing.
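The thresholding and per-slice peak picking that open the procedure can be sketched as follows (helper names hypothetical): values below ε = 0.01 · max T are zeroed, then the M strongest local maxima of each time slice are returned as component frequency bins.

```python
import numpy as np

def slice_peaks(T, M, eps_frac=0.01):
    """For each time slice (row) of TFD matrix T, return up to M peak bins."""
    T = np.where(T > eps_frac * T.max(), T, 0.0)     # noise thresholding
    peaks = []
    for row in T:                                    # one slice per instant t
        idx = [k for k in range(1, len(row) - 1)
               if row[k] > 0 and row[k] > row[k - 1] and row[k] >= row[k + 1]]
        idx.sort(key=lambda k: -row[k])              # strongest peaks first
        peaks.append(sorted(idx[:M]))
    return peaks

# Toy TFD: two constant-frequency ridges at bins 10 and 30.
f = np.arange(50)
row = np.exp(-0.5 * ((f - 10) / 2.0) ** 2) + np.exp(-0.5 * ((f - 30) / 2.0) ** 2)
T = np.tile(row, (4, 1))
```

On this toy TFD, every slice yields the two expected peak bins; the distance-based matching and crossing detection of Table 12.2 would then operate on these per-slice peak lists.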
Then, we use a simple numerical permutation of the indices of the 2 components involved in the crossing. The details of the proposed separation technique are outlined in Table 12.2.

Time-Frequency based Component Separation Algorithm
1. Assign an index to each of the M components in an orderly manner.
2. For each time instant t (starting from t = 1), find the component frequencies as the peak positions of the TFD slice T(t, f).
3. Assign a peak to a particular component based on the smallest distance to the peaks of the previous slice T(t − 1, f) (the IFs are continuous functions of time). For the special case of a crossing point (see Step 4 for how to detect it and its corresponding components), assign the peak to both crossing components.
4. If at a time instant t a crossing point exists (i.e., the number of peaks is smaller than the number of components), identify the crossing components using the smallest-distance criterion by comparing the distances of the actual peaks to those of the previous slice.
5. Permute the indices of the corresponding crossing components.

Tab. 12.2 – Component separation procedure for the proposed algorithm.

12.6 Performance Evaluation & Comparison

The estimation performance is measured by the normalized MSE defined by
\[
NMSE = \frac{1}{N_r} \sum_{r=1}^{N_r} \frac{\|\hat{\theta}_r - \theta\|^2}{\|\theta\|^2}
\]
where θ is the considered parameter, \(\hat{\theta}_r\) is the estimate of θ in the r-th experiment, and N_r is the number of Monte-Carlo runs, chosen here equal to 500.

[A]- First experiment To check the validity and superiority of the proposed algorithm, we consider the time-frequency representation of a three-component FM signal corrupted by an impulsive noise modeled as a generalized Gaussian distribution with α = 1.5. The standard MBD, displayed in Fig. 12.3,
yields a poor representation, while the R-MBD, displayed in Fig. 12.4, clearly reveals the features of the noisy signal. The superiority of the R-MBD over the standard MBD is obvious in this example.

Fig. 12.3: The standard MBD of the multi-component test signal (Fs = 1 Hz, N = 512, time resolution = 1; time in seconds vs. frequency in Hz).
Fig. 12.4: The Robust-MBD of the multi-component test signal (same axes and parameters).

[B]- Second experiment In this experiment, we consider a discrete-time multicomponent FM signal consisting of two linear FM components embedded in additive impulsive noise,
\[
x(n) = s_1(n) + s_2(n) + z(n), \qquad n = 0, 1, \dots, N-1,
\]
where \(s_1(n) = \exp\{j2\pi(a_1 n + b_1 n^2)\}\) and \(s_2(n) = \exp\{j2\pi(a_2 n + b_2 n^2)\}\). The noise z(n) is chosen to be α-stable with zero location parameter, characteristic exponent α = 1 and dispersion γ = 1. The signals' IF coefficients are given by a1 = 0.2, b1 = 0.1×10⁻³, a2 = 0.45 and b2 = −1.5×10⁻³. In the first step, we pre-process the noisy signal to mitigate the impulsive noise, using the exponential compressor filter (exp-TFD algorithm, with parameter β = 0.1) and the Huber-TFD algorithm. In the second step, we put the pre-processed signal x̃(n) through the proposed algorithm (we chose σ = 0.01 for the MB-distribution kernel) in order to extract the two respective components. The peaks of the extracted components (in the time-frequency domain) are then used to estimate the IFs of the chirps. We use a simple polynomial fit to obtain estimates of (a1, b1) from IF1(n) and estimates of (a2, b2) from IF2(n).
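This second experiment can be sketched end-to-end for a single chirp as follows (a hedged illustration, not the thesis code: the α = 1 symmetric stable noise is drawn as Cauchy variates, the TFD is replaced by a plain sliding-window periodogram, and the window/hop sizes are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
N, a1, b1, gamma = 512, 0.2, 0.1e-3, 0.1
n = np.arange(N)
x = np.exp(2j * np.pi * (a1 * n + b1 * n**2))            # IF1(n) = a1 + 2*b1*n
z = gamma * (rng.standard_cauchy(N) + 1j * rng.standard_cauchy(N))  # SaS, alpha = 1
xn = x + z

# exp-TFD style pre-processing: compress the amplitude, keep the phase
beta = 0.1
xt = np.abs(xn) ** beta * np.exp(1j * np.angle(xn))

# crude IF track: periodogram peak of each windowed segment
win, hop, nfft = 64, 8, 1024
centers = np.arange(win // 2, N - win // 2, hop)
f_hat = []
for c in centers:
    seg = xt[c - win // 2 : c + win // 2] * np.hanning(win)
    spec = np.abs(np.fft.fft(seg, nfft))[: nfft // 2]    # the IF stays in [0, 0.5)
    f_hat.append(np.argmax(spec) / nfft)

# polynomial fit of the IF law: slope = 2*b1, intercept = a1
slope, intercept = np.polyfit(centers, f_hat, 1)
a1_hat, b1_hat = intercept, slope / 2
```

A Monte-Carlo loop over noise realizations, accumulating \(\|\hat\theta_r - \theta\|^2 / \|\theta\|^2\), then yields NMSE curves of the kind shown in Fig. 12.5.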
The same noisy signal x(n) is also put through the R-MBD algorithm (with β = 1) developed in this work, to validate this method and to compare it with the preprocessing-based methods. In Fig. 12.5, we display the NMSE of the R-MBD, exp-TFD and Huber-TFD versus the sample size.

Fig. 12.5: The NMSE versus sample size: a comparative study (R-MBD, exp-TFD and Huber-TFD; panels show the NMSE of a1, b1 for chirp 1 and a2, b2 for chirp 2).

These simulations confirm the effectiveness of the proposed algorithms; at least in this simulation context, the best results in terms of estimation accuracy are obtained by the R-MBD algorithm (which is, on the other hand, the most expensive one), followed by the exp-TFD method.

[C]- Third experiment Here, we assess the statistical performance of the R-MBD-based IF estimator for multi-component FM signals. To that end, let us consider two linear FM components embedded in additive impulsive α-stable noise z(t), modeled as
\[
x(t) = s_1(t) + s_2(t) + z(t)
\]
where \(s_1(t) = \exp\{j2\pi(a_1 t + b_1 t^2)\}\) and \(s_2(t) = \exp\{j2\pi(a_2 t + b_2 t^2)\}\). The noise z(t) is chosen with zero location parameter, characteristic exponent α = 1 and dispersion γ. The signals' IF coefficients are given by a1 = 0.2, b1 = 0.1×10⁻³, a2 = 0.45 and b2 = −1.5×10⁻³. To validate the proposed method and to compare it with some existing methods, we implement the following procedure:
1.
Compute the TFD of the two-chirp signal in α-stable noise x(t) using the r-PWVD [Barkat et Stankovic(2004)] and the proposed R-MBD. For that, we choose σ = 0.01 for the MBD kernel and p = α/3 for the fractional Lp-norm loss function used to design the R-MBD. In the experiments, we fix the signal length to N = 501 and the window length, used in the r-PWVD implementation, to 101 samples.
2. Put the computed TFD matrix through the component separation algorithm in order to extract the two respective components. The peaks of the extracted components (in the time-frequency domain) are then used to estimate the IFs of the chirps.
3. Put the same noisy signal through one of the widely used IF estimation methods, namely the High-order Ambiguity Function (HAF) algorithm, to estimate the four chirp parameters a1, b1, a2 and b2 [Peleg et Friedlander(1996)].
4. For the HAF algorithm, use a simple polynomial fit to obtain estimates of IF1(t) from (a1, b1) and estimates of IF2(t) from (a2, b2).

In Fig. 12.6, we display the NMSE of the IF estimates versus the noise dispersion γ for HAF, r-PWVD and R-MBD. The accuracy and superiority of the R-MBD over both the r-PWVD and HAF algorithms is evident.

Fig. 12.6: NMSE of IF estimates (in dB, versus −10 log10(γ)), corresponding to the HAF, r-PWVD and R-MBD, for a noisy two-component chirp signal.

[D]- Fourth experiment In this experiment, a comparative study of the previous IF estimation methods for multicomponent chirp signals is addressed.
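The HAF idea used in Step 3 can be illustrated on a single noiseless chirp (a hedged sketch only: the actual [Peleg et Friedlander(1996)] multicomponent algorithm is more elaborate, and the lag τ and lengths here are arbitrary choices of ours):

```python
import numpy as np

N, tau = 1024, 64
n = np.arange(N)
a, b = 0.2, 0.1e-3
x = np.exp(2j * np.pi * (a * n + b * n**2))

# order-2 ambiguity product: x(n + 2*tau) conj(x(n)) is a pure tone
# whose frequency 4*b*tau reveals the chirp rate b
y = x[2 * tau:] * np.conj(x[: -2 * tau])
L = y.size
f0 = np.fft.fftfreq(L)[np.argmax(np.abs(np.fft.fft(y)))]
b_hat = f0 / (4 * tau)

# de-chirp with b_hat: the residual is (nearly) a tone at frequency a
d = x * np.exp(-2j * np.pi * b_hat * n**2)
a_hat = np.fft.fftfreq(N)[np.argmax(np.abs(np.fft.fft(d)))]
```

The two FFT peak positions directly give the chirp-rate and centre-frequency estimates; with impulsive noise present, the same product is applied to the robustly pre-processed signal.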
For this purpose, we consider a mixture of two chirp components of the same amplitude a1 = a2 = 1, with f1 = 0.05, f2 = 0.3, δ1 = 0.0001 and δ2 = 0.0003, embedded in impulsive α-stable noise with characteristic exponent α = 1. For the non-parametric TFD-based method, we use the compressing technique with parameter β = 0.1 (we chose σ = 0.01 for the MB-distribution kernel). Figures 12.7 and 12.8 represent the NMSE of the phase parameters versus the sample size and the noise dispersion, respectively. In this simulation context, the best results are obtained by the time-frequency based method, followed by the parametric method based on robust covariance estimation (ROCOV-MUSIC).

Fig. 12.7: Normalized MSE of the various phase parameters (TFD, ROCOV-MUSIC, TRUNC-MUSIC, FLOS-MUSIC) versus sample size, γ = 0.1.
Fig. 12.8: Normalized MSE of the various phase parameters versus noise dispersion in dB, N = 1000.

12.7 Concluding Remarks

In this chapter, we proposed a new approach, based on robust statistics theory, to the analysis of multicomponent nonstationary FM signals corrupted by additive heavy-tailed noise.
Two different procedures were proposed:
• Robust preprocessing approach: a preprocessing stage, based on the M-estimation idea, has been proposed to clean the time-frequency image. This first step allows us to obtain a good time-frequency representation, which is essential for the second, IF-estimation step.
• Robust time-frequency distribution approach: the fractional Lp-norm (0 < p < α) loss function has been used in the M-estimation framework to design a new robust TFD, referred to as the R-MBD. The proposed R-MBD is robust to the effect of heavy-tailed α-stable noise.
Computer simulations confirm the effectiveness of the proposed algorithms and show that the best results in terms of estimation accuracy are obtained by the R-MBD-based algorithm (which is, on the other hand, the most expensive one), followed by the r-PWVD-based method. In the considered simulation context, the comparative study shows the superiority of the non-parametric (TFD-based) method and of the parametric method using the robust covariance estimation technique (ROCOV-MUSIC).

Chapter 13 Conclusions and Perspectives

From infinite-variance statistics to source separation and the processing of non-stationary signals, we have sought to explore areas less familiar to practitioners, but which can still reveal interesting theoretical specificities and find new applications.

13.1 General Conclusion

By way of general conclusion, we attempt here a global synthesis of the work carried out in this thesis. Using probabilistic and statistical mathematical tools, we have tried to add new stones to two of the most important edifices of signal processing: the separation of non-Gaussian sources and the estimation of non-stationary signals.
We have thus answered the questions raised at the beginning of this thesis work, even if only partially, since many promising avenues have merely been opened. Let us admit it: an exhaustive study of the various uses of α-stable distributions in signal processing is a long-term undertaking, and the open questions always seem to grow in number. This search for robustness in signal processing led us to examine, in detail, the problems of source separation, especially for sources of an impulsive, infinite-variance nature, and the estimation of multi-component non-stationary signals in impulsive noise. These problems amount to answering the following questions:
1. Which existing methods, based on the assumption that second- and higher-order statistics exist, can still work in practice in the case of α-stable distributions?
2. How can this be justified mathematically, once demonstrated by simulations?
3. How can the methods that no longer apply in the case of impulsive α-stable sources be adapted and made robust?
4. How can fractional lower-order moments be exploited to separate this kind of source, and how can their use be generalized to sources of unknown nature (impulsive or not)?
5. How can the effect of impulsive noise on the time-frequency representation of non-stationary signals be reduced?
6. Is it possible to define time-frequency distributions robust to the effect of impulsive noise?
7. Can we content ourselves with parametric estimation methods for multi-component non-stationary signals and make them robust?
The methods developed in this thesis, and in particular the use of fractional lower-order moments, seem to provide an interesting and promising direction. Let us add that the developments carried out relied only on the statistical properties of α-stable probability laws. We presented these in detail, together with other properties that were not used but could prove very useful, in a chapter devoted solely to stable laws and their use in signal processing.

[A]- Separation of impulsive sources

We attempted a complete study of the class of FLOS-based methods, covering several aspects usually addressed in classical source separation, including whitening, separation and the optimization of a contrast function; only the asymptotic performance analysis was left aside, for lack of time. We first proposed a separation criterion based on minimizing the sum of the dispersions of the observations. We showed that this minimum-dispersion (MD) criterion is a contrast function that separates sources with α-stable distributions, and we realized a Jacobi-type implementation of the proposed algorithm. More precisely, in order to optimize this cost function, expressed as a function of the separation matrix B under an orthogonality constraint, we decomposed this matrix into a product of Givens-Jacobi matrices, reducing the matrix optimization problem to the optimization of a function of a real variable θ (the rotation angle). To evaluate the performance of the MD method, we defined a performance index generalizing the signal-to-interference ratio usually used in the separation of finite-variance signals. We then conducted a series of simulation experiments to compare the proposed method with the classical methods JADE and EASI and with a quasi-maximum-likelihood method proposed specifically for α-stable sources, named RQLM [Shereshevski et al.(2001)]. The MD method achieves the best performance in all the cases considered (with noise, without noise, small and large sample sizes, ...). Let us point out, in this respect, that the MD method shows a surprising robustness against estimation errors of the characteristic exponent α of the α-stable distributions. The same behaviour is found in the following Lp-norm approach, and is explained by the fact that replacing the power α in the dispersion expression by another value α1 (in the same interval (0, 1] or [1, 2) as α) also defines an MD contrast function, and therefore still separates the sources correctly [Sahmoudi et al.(2005)]. Exploiting the proportionality between the dispersion of an α-stable random variable and its p-th moment, we extended and generalized the minimum-dispersion criterion to separate linear mixtures of sources with unknown distributions (α-stable or not). This approach naturally connects with the sparse representation of sources via the Lp norm used in the literature. Under an orthogonality constraint, we showed that the criterion consisting in minimizing the sum of the Lp norms of the observations is a contrast function that correctly separates linear mixtures [Sahmoudi(2005)].
Still examining the existing source separation methods, we observed that the existing approaches can be grouped into two classes: a class of methods based on the tensorial algebraic structure of the mixture (such as the JADE algorithm), which remains valid for separating heavy-tailed sources, in particular those with α-stable distributions, and a second class of methods, based on various independence-measure criteria, which are unable to separate the sources considered in this work. The question we then naturally asked is the following: how can one justify, mathematically, the use of algorithms with a robust algebraic structure, given that they are based on statistics that are 'in principle' infinite, and how can those that are not robust be made so? To answer this question, we proposed an appropriate normalization of the second-order statistics (the covariance) and of the fourth-order cumulants. This normalization makes these new normalized statistics converge asymptotically to tensors with the structure desired for source separation. These statistics, constructed to validate the classical algebraic source separation approaches [Sahmoudi et al.(2004a)], rely on the heavy-tail property of stable laws. They also allowed us to introduce suitable normalizations in the source separation criteria based on nonlinear decorrelation. These new contrast functions then become robust and valid in the case of heavy-tailed signals [Sahmoudi et Abed-Meraim(2004b)]. Nevertheless, if the independent components do not have this heavy-tail characterization, the normalization only amounts to multiplication by a constant, which can only be beneficial for the convergence of certain algorithms, such as EASI.
These normalized statistics in fact define an entire class of techniques that is far from being fully exploited in this thesis. Another fundamental approach in statistical estimation that attracted our attention, for estimating the independent components of a linear mixture, is the maximum-likelihood (ML) principle, which, as always, opens the door to several developments. Indeed, we proposed a semi-parametric structure for the ML approach, combining a stochastic version of the EM algorithm with a technique for approximating the source densities by log-spline functions [M. Sahmoudi et al.(2005)]. The advantages of this method are appreciable in that no model of the source densities is needed, and in terms of robustness to possible source-modeling errors, since we approximate and estimate the densities directly from the observations.

[B]- Processing of multicomponent non-stationary signals

In the second part of this thesis, we addressed certain aspects of the analysis of non-stationary FM signals, considering mainly the case of a multi-component signal affected by impulsive noise. We treated the problem of instantaneous frequency estimation. For this, we proposed to use Huber's M-estimation method, robust to the effect of outliers (or impulsive noise) in the data, whose objective is to provide estimators whose performance does not deteriorate too much in the presence of heavy-tailed non-Gaussian noise.
This environment led us to model the noise by an α-stable distribution, in conjunction with the M-estimation approach, first within a parametric procedure and then within a time-frequency analysis procedure.
– The first objective was the search for new robust parametric approaches in this particular case of non-Gaussian noise. We begin by reducing the problem to that of estimating harmonic signals embedded in impulsive noise, thanks to a polynomial transform of the signal. The high-resolution MUSIC method is then applied to the transformed signal to estimate the parameters. Three cases are considered, leading to three algorithms: (i) direct application of the MUSIC algorithm to the truncated harmonic signal, named TRUNC-MUSIC; (ii) application of the MUSIC algorithm to a robust estimate of the covariance function of the harmonic signal, named ROCOV-MUSIC; and (iii) application of MUSIC to the generalized covariation of the signal, named FLOS-MUSIC since it is based on fractional lower-order statistics (FLOS). The comparison results showed a certain superiority of the ROCOV-MUSIC algorithm.
– The second objective of this part was the study of the influence of additive impulsive noise on non-parametric time-frequency estimation methods.
• Impulsive-noise pre-processing procedure: in a first approach, we applied Huber's minimax robustness procedure against the effect of impulsive noise, in the form of a pre-processing step, using two different techniques, namely:
1. amplitude compression by a nonlinear filter of the type |x|^β, 0 < β < 1, and
2. amplitude truncation of the signal.
We then represent the signal in the time-frequency plane, using quadratic transforms suited to the multicomponent case together with a component extraction algorithm, in order to estimate the instantaneous frequencies of the components.
• Procedure based on a robust time-frequency distribution: in the second approach, we combined the M-estimation robustness approach with quadratic time-frequency transforms to define a class of transforms robust to the effect of impulsive noise and to the cross-terms of a multicomponent signal.
A comparative simulation study shows the advantage of the time-frequency methods over the preceding parametric methods for estimating multi-component FM signals in the presence of impulsive noise. Let us also emphasize that robust time-frequency representations can serve other applications beyond the FM-signal estimation framework treated in this work.

13.2 Perspectives

Many questions remain open:

[A]- Separation of impulsive sources
• How can the proposed methods be improved by exploiting the performance-analysis techniques available in the literature?
• A gradient-type implementation is entirely possible to optimize the proposed minimum-dispersion criterion for the separation of α-stable sources.
• How do the proposed algorithms converge?
• How should the nonlinearities be chosen, with respect to the α-stable source distributions, in the approach based on nonlinear decorrelation?
• How can the study carried out here for linear mixtures be extended to nonlinear or convolutive mixtures?
• One problem not treated in this thesis is that of testing the variance. Such a test would make it possible, before applying any algorithm, to know whether one is dealing with a heavy-tailed, infinite-variance mixture or not. On this question, note that some work has already been developed in the probability and statistics literature and could be exploited in our source separation context.
• How can the three classes of statistics — second-order, higher-order and lower-order — be combined to define general source separation criteria? On this point, we envisage adding a lower-order term to the separation criteria based on nonlinear decorrelation, in order to reduce the effect of possible source 'impulsiveness'.
• Generalization of the approaches based on normalized statistics to the case of α-stable sources with different characteristic exponents α. To do so, we envisage using a deflation procedure.
• Also exploit the logarithmic statistical moments, defined from the characteristic function of the second kind.
• Dig deeper into the underdetermined case — more sources than sensors — by exploiting the sparse nature of impulsive sources.
• To address this last question, we are interested in source separation in the wavelet-transform domain. The aspects of particular interest to us are the impulsive character of the wavelet coefficients and the sparsity of the wavelet representation. This sparsity property has recently been used to separate more sources than sensors.
[B]- Processing of multicomponent non-stationary signals
• Theoretical performance analysis of the proposed approaches.
• Validation of the proposed methods through their application to real signals of radar, sonar or biomedical type.
• Theoretical study of the influence of random multiplicative noise.
• Exploration of statistical testing methods in the time-frequency plane for extracting the components of a non-stationary FM signal.
• Real-time implementation of the time-frequency algorithms to solve practical communication problems. This consists in developing new methods for managing services simultaneous in time and in frequency in communication networks.
• Deeper analysis of existing classification methods in the time-frequency plane, since this could solve several estimation and detection problems for non-stationary signals.
• Better exploitation of the probability distribution of the instantaneous frequency to improve time-frequency analysis.

In the end is my beginning! T.S. Eliot

♦ merci, chokran, thank you, chokran, merci, thank you, chokran, merci, thank you, chokran, merci, thank you, chokran, merci, thank you, chokran, merci, thank you, chokran, merci, thank you, chokran, merci, thank you, chokran, merci, thank you chokran, merci, thank you chokran, merci, thank you, chokran, merci, thank you ♦

Bibliography

[Abed-Meraim et Hua(1997)] Abed-Meraim, K. et Hua, Y. (1997).
Joint Schur decomposition : Algorithms and applications. In Proceeding of First International Conference on Information, Communications and Signal Processing (supplement proceedings ; ICICS'97), Singapore.
[Abed-Meraim et al.(1996)] Abed-Meraim, K., Belouchrani, A., et Hua, Y. (1996). Blind identification of a linear-quadratic mixture of independent component based on joint diagonalization procedure. In Proc. of ICASSP'1996, Atlanta, USA.
[Abed-Meraim et al.(1997a)] Abed-Meraim, K., Qiu, W., et Hua, Y. (1997a). Blind system identification. Proceedings of the IEEE, 85(8), 1310–1322.
[Abed-Meraim et al.(1997b)] Abed-Meraim, K., Loubaton, P., et Moulin, E. (1997b). A subspace algorithm for certain blind identification problems. IEEE Trans. on Information Theory, 43(2), 499–511.
[Abed-Meraim et al.(2000)] Abed-Meraim, K., Hua, Y., et Ikram, M. Z. (2000). A fast algorithm for conditional maximum likelihood blind identification of SIMO/MIMO FIR systems. In Proc. EUSIPCO (invited paper).
[Abed-Meraim et al.(2001)] Abed-Meraim, K., Xiang, Y., Manton, J., et Hua, Y. (2001). Blind source separation using second order cyclostationary statistics. IEEE Transactions on Signal Processing, 49(4), 694–701.
[Abed-Meraim et al.(2003)] Abed-Meraim, K., Nguyen, L., Sucic, V., Tupin, F., et Boashash, B. (2003). An image processing approach for underdetermined blind separation of nonstationary sources. In Proceeding of Int. Symp. on Sig. and Image Proc. and Analysis, Rome.
[Adib et al.(2002)] Adib, A., Moreau, E., et Aboutajdine, D. (2002). A combined contrast and reference signal based blind source separation by a deflation approach. In Proceedings of the 2nd IEEE International Symposium on Signal Processing and Information Technology (ISSPIT'2002), Marrakesh, Morocco.
[Adjrad et al.(2003)] Adjrad, M., Belouchrani, A., et Abed-Meraim, K. (2003). Parameter estimation of multicomponent polynomial phase signals impinging on a multi-sensor array using extended Kalman filter.
In Proceeding of (ISSPIT'2003), Darmstadt, Germany.
[Adler et al.(1998)] Adler, R., Feldman, R. E., et Taqqu, M. (1998). A Practical Guide to Heavy Tails : Statistical Techniques and Applications. Birkhauser, Boston.
[Akay et Erözden(2004)] Akay, O. et Erözden, E. (2004). Use of fractional autocorrelation in efficient detection of pulse compression radar signals. In IEEE First International Symposium on Control, Communications and Signal Processing, pages 33–36.
[Akgiray et Lamoureux(1989)] Akgiray, V. et Lamoureux, C. (1989). Estimation of stable-law parameters : a comparative study. Journal of Business & Economic Statistics, 7, 85–93.
[Altes(1980)] Altes, R. A. (1980). Detection, estimation, and classification with spectrograms. The Journal of the Acoustical Society of America, 67(4), 1232–1246.
[Altinkaya et al.(2002)] Altinkaya, M. A., Delic, H., Sankur, B., et Anarim, E. (2002). Subspace-based frequency estimation of sinusoidal signals in alpha-stable noise. Signal Processing, 82, 1807–1827.
[Amari(1998)] Amari, S.-I. (1998). Natural gradient works efficiently in learning. Neural Computation, 10(2), 251–276.
[Amari et Cardoso(1997)] Amari, S.-I. et Cardoso, J.-F. (1997). Blind source separation—semiparametric statistical approach. IEEE Trans. on Signal Processing, 45(11), 2692–2700.
[Amari et al.(1996)] Amari, S.-I., Cichocki, A., et Yang, H. (1996). A new learning algorithm for blind source separation. In Advances in Neural Information Processing Systems 8, pages 757–763. MIT Press.
[Ambike et Hatzinakos(1995)] Ambike, S. et Hatzinakos, D. (1995). A new filter for highly impulsive α-stable noise. In IEEE Workshop on Nonlinear Signal and Image Processing, Halkidiki, Greece.
[Amin(1992)] Amin, M. (1992). Time-Frequency Signal Analysis : Methods and Applications. Longman-Chesire.
[Amin(1997)] Amin, M. G. (1997).
Interference mitigation in spread spectrum communication systems using time-frequency distributions. IEEE Transactions on Signal Processing, 45(1), 90–101. [Amin et Zhang(2000)] Amin, M. G. et Zhang, Y. (2000). Effects of cross-terms on the performance of time–frequency MUSIC. In Proceedings of the 2000 IEEE Sensor Array and Multichannel Signal Processing Workshop, pages 479–483. [Amin et al.(1999)] Amin, M. G., Wang, C., et Lindsey, A. R. (1999). Optimum interference excision in spread spectrum communications using open-loop adaptive filters. IEEE Transactions on Signal Processing, 47(7), 1966–1976. [Amin et al.(2000)] Amin, M. G., Belouchrani, A., et Zhang, Y. (2000). The spatial ambiguity function and its applications. IEEE Signal Processing Letters, 7(6), 138–140. [Andrews(1974)] Andrews, D. F. (1974). Scale mixtures of normal distributions. Journal of the Royal Statistical Society, B 36, 99–102. [Babaie-Zadeh et al.(2004)] Babaie-Zadeh, M., Mansour, A., Jutten, C., et Marvasti, F. (2004). A geometric approach for separating several signals. In Fifth International Symposium on Independent Component Analysis and Blind Signal Separation, pages 798–806, Granada, Spain. [Babaie-Zadeh(2002)] Babaie-Zadeh, M. (2002). On Blind Source Separation in Convolutive and Nonlinear Mixtures. Ph.D. thesis, INPG, Grenoble. [Barbarossa(1995)] Barbarossa, S. (1995). Analysis of multicomponent LFM signals by a combined Wigner-Hough transform. IEEE Transactions on Signal Processing, 43, 1511–1515. [Barbarossa et Petrone(1997)] Barbarossa, S. et Petrone (1997). Analysis of polynomial phase signals by an integrated generalized ambiguity function. IEEE Transactions on Signal Processing, 45(2), 316–327. [Barbarossa et Scaglione(1999a)] Barbarossa, S. et Scaglione, A. (1999a). Adaptive time-varying cancellation of wideband interferences in spread-spectrum communications based on time-frequency distributions. IEEE Transactions on Signal Processing, 47(4), 957–965. [Barbarossa et Scaglione(1999b)] Barbarossa, S. et Scaglione, A. (1999b). Optimal precoding for transmissions over linear time-varying channels. In Seamless Interconnection for Universal Services. GLOBECOM'99, volume 5, pages 2545–2549, Piscataway, NJ. [Barbarossa et Scaglione(2000)] Barbarossa, S. et Scaglione, A. (2000). Theoretical bounds on the estimation and prediction of multipath time-varying channels. In International Conference on Acoustics, Speech, and Signal Processing, ICASSP'2000, volume 5, pages 2545–2548, Istanbul, Turkey. [Barbarossa et al.(1997)] Barbarossa, S., Scaglione, A., Spalletta, S., et Votini, S. (1997). Adaptive suppression of wideband interferences in spread-spectrum communications using the Wigner-Hough transform. In International Conference on Acoustics, Speech, and Signal Processing, ICASSP'97, volume 5, pages 3861–3864, California. [Barkat(2000)] Barkat, B. (2000). Design, estimation, and performance of time–frequency distributions. Ph.D. thesis, Queensland University of Technology, Brisbane, Australia. [Barkat(2001)] Barkat, B. (2001). Instantaneous frequency estimation of nonlinear frequency–modulated signals in the presence of multiplicative and additive noise. IEEE Transactions on Signal Processing, 49(10), 2214–2222. [Barkat et Abed-Meraim(2003a)] Barkat, B. et Abed-Meraim, K. (2003a). Detection of known FM signals in known heavy-tailed noise. In Proceeding of ISSPIT'2003, Darmstadt, Germany. [Barkat et Abed-Meraim(2003b)] Barkat, B. et Abed-Meraim, K. (2003b). An effective technique for the IF estimation of FM signals in heavy-tailed noise. In Proceeding of ISSPIT'2003, Germany. [Barkat et Abed-Meraim(2004)] Barkat, B. et Abed-Meraim, K. (2004). Algorithms for blind components separation and extraction from the time-frequency distribution of their mixture.
To appear in Journal of Applied Signal Processing. [Barkat et Boashash(2001)] Barkat, B. et Boashash, B. (Oct. 2001). A high-resolution quadratic time-frequency distribution for multicomponent signals analysis. IEEE Transactions on Signal Processing, 49. [Barkat et Stankovic(2004)] Barkat, B. et Stankovic, L. (2004). Analysis of polynomial FM signals corrupted by heavy-tailed noise. Signal Processing, 84, 69–75. [Barndorff(1998)] Barndorff-Nielsen, O. E. (1998). Processes of normal inverse Gaussian type. Finance and Stochastics, 2, 41–68. [Barndorff-Nielsen(1997)] Barndorff-Nielsen, O. E. (1997). Normal inverse Gaussian distribution and stochastic volatility modelling. Scandinavian Journal of Statistics, 24, 1–13. [Barros(2000)] Barros, A. K. (2000). The independence assumption : Dependent component analysis. In M. Girolami, editor, Advances in Independent Component Analysis, pages 63–71. Springer-Verlag. [Bassi et al.(1998)] Bassi, F., Embrechts, P., et Kafetzaki, M. (1998). Risk management and quantile estimation. In R. Adler, R. Feldman, et M. Taqqu, editors, A practical guide to heavy tails, pages 111–130. Birkhauser, Boston. [Bell et Sejnowski(1995)] Bell, A. et Sejnowski, T. (1995). An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7, 1129–1159. [Bell(2000)] Bell, A. J. (2000). Information theory, independent component analysis, and applications. In S. Haykin, editor, Unsupervised Adaptive Filtering, Vol. I, pages 237–264. Wiley. [Belouchrani(2001)] Belouchrani, A. (2001). Blind source separation : Concepts, approaches and applications. In ISSPA'2001 Tutorial, Kuala Lumpur, Malaysia. [Belouchrani et Amin(2000)] Belouchrani, A. et Amin, M. (2000). Jammer mitigation in spread spectrum communications using blind source separation. Signal Processing, 80, 724–729.
[Belouchrani et Amin(1996)] Belouchrani, A. et Amin, M. G. (1996). A new approach for blind source separation using time-frequency distributions. In Proceedings SPIE conference on Advanced algorithms and Architectures for Signal Processing, Denver, Colorado. [Belouchrani et Amin(1997)] Belouchrani, A. et Amin, M. G. (1997). Blind source separation using time–frequency distributions : Algorithm and asymptotic performance. In IEEE Proc. ICASSP'97, pages 3469–3472, Germany. [Belouchrani et Amin(1998)] Belouchrani, A. et Amin, M. G. (1998). Blind source separation based on time-frequency signal representations. IEEE Transactions on Signal Processing, 46(11), 2888–2897. [Belouchrani et Amin(1999a)] Belouchrani, A. et Amin, M. G. (1999a). Time–frequency MUSIC. IEEE Signal Processing Letters, 6(5), 109–110. [Belouchrani et Amin(1999b)] Belouchrani, A. et Amin, M. G. (1999b). A two–sensor array beamformer for direct sequence spread spectrum communications. IEEE Transactions on Signal Processing, 47(8), 2191–2199. [Belouchrani et Cardoso(1994)] Belouchrani, A. et Cardoso, J.-F. (1994). Maximum likelihood source separation for discrete sources. In Proceedings EUSIPCO. [Belouchrani et Cardoso(1995)] Belouchrani, A. et Cardoso, J.-F. (1995). Maximum likelihood source separation by the expectation-maximization technique : deterministic and stochastic implementation. In Proceeding of NOLTA, pages 49–53. [Belouchrani et al.(1997a)] Belouchrani, A., Abed-Meraim, K., Cardoso, J.-F., et Moulines, E. (1997a). A blind source separation technique using second order statistics. IEEE Trans. on Sig. Proc., pages 434–444. [Belouchrani et al.(1997b)] Belouchrani, A., Abed-Meraim, K., et Cardoso, J.-F. (1997b). An iterative blind source separation technique : Implementation and performance. In Proceeding of International Conference on Information, Communication and Signal Processing (ICICS'1997), Singapore. [Belouchrani et al.(2001)] Belouchrani, A., Abed-Meraim, K., Amin, M.
G., et Zoubir, A. M. (2001). Joint anti-diagonalization for blind source separation. In International Conference on Acoustics, Speech, and Signal Processing, ICASSP'2001, Salt Lake City, Utah. [Benidir(1994)] Benidir, M. (1994). Higher-Order Statistical Signal Processing, chapter Theoretical foundations of higher-order statistical signal processing and polyspectra. Longman Cheshire, Australia. [Benidir(1997)] Benidir, M. (1997). Characterization of polynomial functions and application to time-frequency analysis. IEEE Trans. on Signal Processing, 45(5), 1351–1354. [Benidir(2002)] Benidir, M. (2002). Traitement du Signal, Tome 1. Dunod. [Benidir(2003)] Benidir, M. (2003). Traitement du Signal, Tome 2. Dunod. [Benidir et al.(2002)] Benidir, M., Ouldali, A., et Sahmoudi, M. (2002). Performance analysis of the HAF estimator for time-varying amplitude phase-modulated signals. In The International IASTED Conference on Control and Applications (CA'2002), Cancun, Mexico. [Bergstrom(1952)] Bergstrom, H. (1952). On some expansions of stable distribution functions. Arkiv för Matematik, 2, 375–378. [Berlekamp(1968)] Berlekamp, E. R. (1968). Algebraic Coding Theory. McGraw-Hill, New York. [Bermond(2000)] Bermond, O. (2000). Statistical Methods for Blind Source Separation (Méthodes statistiques pour la séparation de sources). Ph.D. thesis, ENST, Paris, France. [Besson et Castanié(1993)] Besson, O. et Castanié, F. (1993). On estimating the frequency of a sinusoid in autoregressive multiplicative noise. Signal Processing, 30(1), 65–83. [Besson et al.(1999)] Besson, O., Ghogho, M., et Swami, A. (1999). Parameter estimation for random amplitude chirp signals. IEEE Transactions on Signal Processing, 47(12), 3208–3219. [Besson et al.(2000a)] Besson, O., Vincent, F., Stoica, P., et Gershman, A. B. (2000a).
Approximate maximum likelihood estimators for array processing in multiplicative noise environments. IEEE Transactions on Signal Processing, 48(9), 2506–2518. [Besson et al.(2000b)] Besson, O., Gini, F., Griffiths, H. D., et Lombardini, F. (2000b). Estimating ocean surface velocity and coherence time using multichannel ATI-SAR systems. Proceedings of the IEE : F, 147(6), 299–308. [Bestavros et al.(1998)] Bestavros, A., Crovella, M., et Taqqu, M. (1998). Heavy-tailed distributions in the world wide web. In R. Adler, R. Feldman, et M. Taqqu, editors, A practical guide to heavy tails, pages 3–25. Birkhauser, Boston. [Bhashyam et al.(2000)] Bhashyam, S., Sayeed, A. M., et Aazhang, B. (2000). Time-selective signaling and reception for communication over multipath fading channels. IEEE Transactions on Communications, 48(1), 83–94. [Bircan et al.(1998)] Bircan, A., Tekinay, S., et Akansu, A. N. (1998). Time-frequency and time-scale representation of wireless communication channels. In Proceedings of the IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis, pages 373–376, Pittsburgh, Pennsylvania, USA. [Blachman(1965)] Blachman, N. M. (1965). The convolution inequality for entropy powers. IEEE Transactions on Information Theory, 11, 267–271. [Boashash(1991)] Boashash, B. (1991). Time-frequency signal analysis. In S. Haykin, editor, Advances in Spectrum Analysis and Array Processing, volume I, chapter 9, pages 418–517. Prentice-Hall, Englewood Cliffs, New Jersey. [Boashash(1992a)] Boashash, B. (1992a). Estimating and interpreting the instantaneous frequency of a signal - Part 1 : Fundamentals. Proceedings of the IEEE, 80(4), 519–538. [Boashash(1992b)] Boashash, B. (1992b). Estimating and interpreting the instantaneous frequency of a signal - Part 2 : Algorithms and applications. Proceedings of the IEEE, 80(4), 539–569. [Boashash(1992c)] Boashash, B., editor (1992c). Time-Frequency Signal Analysis : Methods and Applications.
Longman Cheshire, Melbourne, Australia. [Boashash(1993)] Boashash, B. (1993). Recent advances in non-stationary signal analysis : time-varying higher order spectra and multilinear time-frequency signal analysis. In Proceedings of the SPIE - The International Society for Optical Engineering, volume 2027, pages 2–26. [Boashash(1996)] Boashash, B. (1996). Time frequency signal analysis : Past, present and future trends. In C. T. Leondes, editor, Control and Dynamic Systems, volume 48, pages 1–69. Academic Press, San Diego. [Boashash(2002)] Boashash, B. (2002). Time Frequency Signal Analysis and Processing. Prentice–Hall. [Boashash et Jones(1992)] Boashash, B. et Jones, G. (1992). Instantaneous frequency and time-frequency distributions. In B. Boashash, editor, Time-Frequency Signal Analysis, chapter 2, pages 43–73. Longman Cheshire, Melbourne, Australia. [Boashash et O'Shea(1993)] Boashash, B. et O'Shea, P. (1993). Use of the cross Wigner-Ville distribution for estimation of instantaneous frequency. IEEE Transactions on Signal Processing, 41(3), 1439–1445. [Boashash et O'Shea(1994)] Boashash, B. et O'Shea, P. (1994). Polynomial Wigner-Ville distributions and their relationship to time-varying higher-order spectra. IEEE Transactions on Signal Processing, 42, 216–220. [Boashash et Ristic(1992)] Boashash, B. et Ristic, B. (1992). Robust radar algorithms. Technical report, Signal Processing Research Centre, Queensland University of Technology, Brisbane, Australia. [Boashash et Ristic(1993a)] Boashash, B. et Ristic, B. (1993a). Analysis of FM signals affected by Gaussian AM using reduced Wigner–Ville trispectrum. In International Conference on Acoustics, Speech, and Signal Processing, ICASSP'93, volume IV, pages 408–411, Minneapolis. [Boashash et Ristic(1993b)] Boashash, B. et Ristic, B. (1993b).
Application of cumulant TVHOS to the analysis of composite FM signals in multiplicative and additive noise. In F. T. Luk, editor, Proceedings of SPIE, Advanced Signal Processing Algorithms, Architectures and Implementations, volume 2027, pages 245–255, San Diego. [Boashash et Ristic(1993c)] Boashash, B. et Ristic, B. (1993c). Polynomial time-frequency distributions and time-varying polyspectra. Technical report, Signal Processing Research Centre, Queensland University of Technology, Brisbane, Australia. [Boashash et Ristic(1995)] Boashash, B. et Ristic, B. (1995). A time-frequency perspective of higher-order spectra as a tool for non-stationary signal analysis. In B. Boashash, E. J. Powers, et A. M. Zoubir, editors, Higher Order Statistical Signal Processing, chapter 4, pages 111–149. Longman, Australia. [Boashash et Ristic(1998)] Boashash, B. et Ristic, B. (1998). Polynomial time-frequency distributions and time-varying higher order spectra : Application to the analysis of multicomponent FM signals and to the treatment of multiplicative noise. Signal Processing, 67, 1–23. [Boashash et Rodriguez(1984)] Boashash, B. et Rodriguez, F. (1984). Recognition of time-varying signals in the time-frequency domain by means of the Wigner distribution. In Proc. of ICASSP'1984, San Diego, USA. [Boashash et Sucic(2002)] Boashash, B. et Sucic, V. (2002). High performance time–frequency distributions for practical applications. In L. Debnath, editor, Wavelets and Signal Processing. Birkhauser, Boston, New York : Springer–Verlag. [Boashash et Sucic(2003)] Boashash, B. et Sucic, V. (2003). Resolution measure criteria for the objective assessment of the performance of quadratic time-frequency distributions. IEEE Trans. on Signal Processing, 51(5), 1253–1263. [Boashash et al.(1995)] Boashash, B., Powers, E. J., et Zoubir, A. M., editors (1995). Higher Order Statistical Signal Processing. Longman, Australia. [Bodenschatz et Nikias(1999)] Bodenschatz, J. S. et Nikias, C. L. (1999).
Maximum likelihood symmetric α-stable parameter estimation. IEEE Trans. on Signal Processing, 47(5). [Boscolo et al.(2004)] Boscolo, R., Pan, H., et Roychowdhury, V. P. (2004). Independent Component Analysis Based on Nonparametric Density Estimation. IEEE Transaction on Neural Networks, 15(1). [Boudreaux-Bartels et Marks(1986)] Boudreaux-Bartels, G. F. et Marks, T. W. (1986). Time-varying filtering and signal estimation using Wigner distributions. IEEE Transactions on Acoustics, Speech, and Signal Processing, 34, 422–430. [Box(1953)] Box, G. E. P. (1953). Non-normality and tests on variances. Biometrika, 40, 318–335. [Brcich et Zoubir(2002)] Brcich, R. F. et Zoubir, A. M. (2002). Robust estimation with parametric score function estimation. In Proceedings of the ICASSP'2002 IEEE Conference, pages 1149–1152. [Cambanis et Miller(1981)] Cambanis, S. et Miller, G. (1981). Linear problems in pth order and stable processes. SIAM J. Appl. Math., 41, 43–49. [Cao et Murata(1999)] Cao, J. et Murata, N. (1999). A Stable and Robust ICA Algorithm Based on T-Distribution and Generalized Gaussian Distribution Model. In Proceedings of the IEEE Signal Processing Society Workshop on Neural Networks for Signal Processing IX, pages 283–292. [Cappé et al.(2002)] Cappé, O., Moulines, E., Pesquet, J.-C., Petropulu, A., et Yang, X. (2002). Long-range dependence and heavy-tail modeling for teletraffic data. IEEE Signal Processing Magazine, pages 14–27. [Cardoso(1989a)] Cardoso, J.-F. (1989a). Blind identification of independent signals. In Proc. Workshop on Higher-Order Spectral Analysis, Vail, Colorado. [Cardoso(1989b)] Cardoso, J.-F. (1989b). Source separation using higher order moments. In Proceeding of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'89), pages 2109–2112, Glasgow, UK.
[Cardoso(1991)] Cardoso, J.-F. (1991). Super-symmetric decomposition of the fourth-order cumulant tensor. blind identification of more sources than sensors. In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP'91), pages 3109–3112. [Cardoso(1998)] Cardoso, J. F. (1998). Blind signal separation : statistical principles. Proc. of the IEEE, 86(10), 2009–2025. [Cardoso(1999)] Cardoso, J.-F. (1999). High-order contrasts for independent component analysis. Neural Computation, 11(1), 157–192. [Cardoso et Comon(1996)] Cardoso, J.-F. et Comon, P. (1996). Independent component analysis, a survey of some algebraic methods. In Proc. ISCAS'96, volume 2, pages 93–96. [Cardoso et Laheld(1996)] Cardoso, J. F. et Laheld, B. (1996). Equivariant adaptive source separation. IEEE Transactions on Signal Processing, 44, 3017–3030. [Cardoso et Souloumiac(1993)] Cardoso, J. F. et Souloumiac, A. (1993). Blind beamforming for non-Gaussian signals. Radar and Signal Processing, IEE Proceedings F. [Castella et Pesquet(2004)] Castella, M. et Pesquet, J. C. (2004). An iterative source separation method for convolutive mixtures of images. In Proceedings of the International Conference on Independent Component Analysis (ICA'2004), pages 922–929. [Castella et al.(2004)] Castella, M., Bianchi, P., Chevreuil, A., et Pesquet, J.-C. (2004). Blind MIMO detection of convolutively mixed CPM sources. In Proceeding of EUSIPCO'2004, Vienna, Austria. [Celka et al.(2001)] Celka, P., Boashash, B., et Colditz, P. (2001). Preprocessing and time-frequency analysis of newborn EEG seizures. IEEE Engineering in Medicine & Biology Magazine, 20, 30–39. [Chabert et al.(2003)] Chabert, M., Tourneret, J.-Y., et Coulon, M. (2003). Joint detection of variance changes using hierarchical Bayesian analysis.
In Proceeding of the IEEE International workshop on Statistical Signal Processing, Saint-Louis, Missouri, USA. [Chambers et al.(1976)] Chambers, J. M., Mallows, C. L., et Stuck, B. W. (1976). A method for simulating stable random variables. Journal of the American Statistical Association, 71(354), 340–344. [Chen et Bickel(2003)] Chen, A. et Bickel, P. J. (2003). Efficient Independent Component Analysis. Department of Statistics, University of California, Berkeley, Technical Report 634. [Chen et Bickel(2004)] Chen, A. et Bickel, P. J. (2004). Robustness of prewhitening against heavy-tailed sources. In Proceeding of the Fifth International Conference on Independent Component Analysis and Blind Signal Separation (ICA'2004), Granada, Spain. [Choi et Williams(1989)] Choi, H. et Williams, W. (1989). Improved time–frequency representation of multicomponent signals using exponential kernels. IEEE Transactions on Signal Processing, 37(6), 862–871. [Cichocki et Amari(2002)] Cichocki, A. et Amari, S. (2002). Adaptive Blind Signal and Image Processing. John Wiley & Sons, Singapore. [Cichocki et Unbehauen(1996)] Cichocki, A. et Unbehauen, R. (1996). Robust neural networks with on-line learning for blind identification and blind separation of sources. IEEE Trans. on Circuits and Systems, 43(11), 894–906. [Cichocki et al.(1994)] Cichocki, A., Unbehauen, R., et Rummert, E. (1994). Robust learning algorithm for blind separation of signals. Electronics Letters, 30(17), 1386–1387. [Cichocki et al.(2004)] Cichocki, A., Li, Y., Georgiev, P. G., et Amari, S. I. (2004). Beyond ICA : Robust sparse signal representation. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS'04), volume 5, pages 684–687. [Classen et Mecklenbrauker(1980)] Classen, T. et Mecklenbrauker, W. (1980). The Wigner distribution – Part 1. Philips Journal of Research, 35, 217–250. [Cline et Brockwell(1985)] Cline, D. B. et Brockwell, P. (1985).
Linear prediction of ARMA processes with infinite variance. Stoch. Processes & Applications, 19, 281–296. [Cohen(1966)] Cohen, L. (1966). Generalized phase-space distribution functions. Journal of Mathematical Physics, 7(5), 781–786. [Cohen(1992)] Cohen, L. (1992). What is a Multicomponent Signal ? [Cohen(1995)] Cohen, L. (1995). Time-frequency Analysis. Prentice-Hall. [Comon(1989)] Comon, P. (1989). Separation of stochastic processes. In Proc. Workshop on Higher-Order Spectral Analysis, pages 174–179, Vail, Colorado. [Comon(1994)] Comon, P. (1994). Independent component analysis, a new concept. Signal Processing, 36, 287–314. [Cook et Bernfeld(1993)] Cook, C. E. et Bernfeld, M. (1993). Radar Signals : An Introduction to Theory and Application. Artech House, Norwood, MA. [Cook et al.(1993)] Cook, D., Buja, A., et Cabrera, J. (1993). Projection pursuit indexes based on orthonormal function expansions. J. of Computational and Graphical Statistics, 2(3), 225–250. [Coulon et Tourneret(1999)] Coulon, M. et Tourneret, J. (1999). Multiple frequency estimation in additive and multiplicative colored noises. In Proceeding of ICASSP'1999, pages 1573–1576, Phoenix, USA. [Cover et Thomas(1991)] Cover, T. M. et Thomas, J. A. (1991). Elements of Information Theory. Wiley Series in Telecommunications. [Crespo et al.(1995)] Crespo, P. M., Honig, M. L., et Salehi, J. A. (1995). Spread-time code-division multiple access. IEEE Transactions on Communications, 43(6), 2139–2147. [Davy et al.(2002)] Davy, M., Doncarli, C., et Tourneret, J.-Y. (2002). Classification of chirp signals using hierarchical Bayesian learning and MCMC methods. IEEE Trans. on Signal Proc., 50(2), 377–388. [de Boor(1978)] de Boor, C. (1978). A practical guide to splines. Springer-Verlag, New York, applied mathematical sciences edition. [Delmas(2004)] Delmas, J.
(2004). Asymptotically optimal estimation of DOA for non-circular sources from second-order moments. IEEE Trans. on Signal Processing, pages 1235–1245. [Delmas(1997)] Delmas, J. P. (1997). An extension to the EM algorithm for exponential family. IEEE Trans. on Signal Processing, 45(10), 2613–2615. [Delmas et al.(2000)] Delmas, J. P., Gazzah, H., Liavas, A. P., et Regalia, P. A. (2000). Statistical analysis of some second order methods for blind channel identification/equalization with respect to channel undermodeling. IEEE Trans. on Signal Processing, 48(7), 1984–1998. [Delyon et al.(1999)] Delyon, B., Lavielle, M., et Moulines, E. (1999). Convergence of a stochastic approximation version of the EM algorithm. Ann. Statist., 27(1), 94–128. [Dempster(1977)] Dempster, A. P. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B 39, 1–38. [d'Estamps(2003)] d'Estamps, L. (Oct. 2003). Traitement Statistique des Processus Alpha-Stables : mesure de dépendance et identification des AR Stables. Ph.D. thesis, Institut National Polytechnique de Toulouse, Toulouse, France. [Diebolt et Celeux(1993)] Diebolt, J. et Celeux, G. (1993). Asymptotic properties of a stochastic EM algorithm for estimating mixing proportions. Comm. Statist. Stochastic Models, 9(4), 599–613. [Djafari(1999)] Djafari, A. M. (1999). A Bayesian approach to source separation. In AIP Conference Proceedings 567, Maximum Entropy and Bayesian Methods, pages 221–244, Boise, Idaho, USA. [Djeddi et Benidir(2004)] Djeddi, M. et Benidir, M. (2004). Robust Polynomial Wigner-Ville Distribution For The Analysis of Polynomial Phase Signals in α-Stable Noise. In Proceedings of the IEEE Conference ICASSP'2004. [Djuric et Kay(1990)] Djuric, P. M. et Kay, S. M. (1990). Parameter estimation of chirp signals. IEEE Trans. Acoust., Speech, Signal Processing, 38(12), 2118–2126. [DuMouchel(1973)] DuMouchel, W. H. (1973).
On the asymptotic normality of the maximum likelihood estimate when sampling from a stable distribution. Annals of Statistics, 1, 948–957. [Moreau et Pesquet(1997)] Moreau, E. et Pesquet, J.-C. (1997). Independence/decorrelation measures with applications to optimized orthonormal representations. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'97), volume 5. [El-Hassouni et Cherifi(2003)] El-Hassouni, M. et Cherifi, H. (2003). A 2-D Adaptive Least lp-Norm Filter For Impulsive Noise Cancellation in Still Images. In Proceeding of ISPA'2003, Paris, France. [Elliott(1938)] Elliott, R. (1938). The wave principle. Collins, New York. [Erdogmus et al.(2002)] Erdogmus, D., Rao, Y. N., Principe, J. C., Zaohao, J., et Hild-II, K. E. (2002). Simultaneous extraction of principal components using Givens rotations and output variances. In ICASSP'2002, pages 1069–1072. [Eriksson et Koivunen(2003)] Eriksson, J. et Koivunen, V. (2003). Characteristic-function based independent component analysis. Signal Processing, 83, 2195–2208. [Even(2003)] Even, J. (Déc. 2003). Contributions à la Séparation de Sources à l'aide de Statistiques d'Ordre. Ph.D. thesis, Université Joseph Fourier Grenoble, Grenoble, France. [Fama(1965)] Fama, E. F. (1965). The behavior of stock-market prices. Journal of Business, 38, 34–105. [Fama et Roll(1968)] Fama, E. F. et Roll, R. (1968). Some properties of symmetric stable distributions. Journal of the American Statistical Association, 63, 817–836. [Fama et Roll(1971)] Fama, E. F. et Roll, R. (1971). Parameter estimates for symmetric stable distributions. Journal of the American Statistical Association, 66, 817–836. [FastICA(1998)] FastICA (1998).
The FastICA package for MATLAB. Available at http://www.cis.hut.fi/projects/ica/fastica/. [Feller(1966)] Feller, W. (1966). An Introduction to Probability Theory and its Applications, volume 1. John Wiley. [Feller(1971)] Feller, W. (1971). An introduction to probability theory and its applications, Vol. II. John Wiley & Sons, 2nd edition. [Fevotte et Doncarli(2004)] Fevotte, C. et Doncarli, C. (2004). Two contributions to blind source separation using time-frequency distributions. IEEE Signal Processing Letters, 11. [Flandrin(1988a)] Flandrin, P. (1988a). A time-frequency formulation of optimum detection. In International Conference on Acoustics, Speech, and Signal Processing, ICASSP'88. [Flandrin(1988b)] Flandrin, P. (1988b). A time-frequency formulation of optimum detection. IEEE Transactions on Acoustics, Speech, and Signal Processing, 36(9), 1377–1384. [Flandrin(1993)] Flandrin, P. (1993). Temps-fréquence. Hermes, Paris. [Flandrin(1998)] Flandrin, P. (1998). Time-Frequency / Time-Scale Analysis, Volume 10. Academic Press. [Fonollosa et Nikias(1994)] Fonollosa, J. R. et Nikias, C. L. (1994). Analysis of finite-energy signals using higher-order moments- and spectra-based time-frequency distributions. Signal Processing, 36, 315–328. [Francos et Porat(1999)] Francos, A. et Porat, M. (1999). Analysis and synthesis of multicomponent signals using positive time-frequency distributions. IEEE Transactions on Signal Processing, 47(2), 493–504. [Francos et Friedlander(1995)] Francos, J. et Friedlander, B. (1995). Bounds for estimation of multicomponent signals with random amplitude and deterministic phase. IEEE Transactions on Signal Processing, 43(5), 1161–1172. [Frechet(1924)] Fréchet, M. (1924). Sur la loi des erreurs d'observation. Matematicheskii Sbornik, (32), 1–8. [Freedman et Diaconis(1982)] Freedman, D.
A. et Diaconis, P. (1982). On inconsistent M-estimators. The Annals of Statistics, 10(2), 454–461. [Friedlander et Francos(1995)] Friedlander, B. et Francos, J. (1995). Estimation of amplitude and phase parameters of multicomponent signals. IEEE Transactions on Signal Processing, 43(4), 917–926. [Friedmann et al.(2000)] Friedmann, J., Messer, H., et Cardoso, J.-F. (2000). Robust parameter estimation of a deterministic signal in impulsive noise. IEEE Trans. on Signal Processing, 48(4). [Gaarder(1968)] Gaarder, N. T. (1968). Scattering function estimation. [Gallagher(2000)] Gallagher, C. M. (2000). Estimating the autocovariation from stationary heavy-tailed data, with applications to time series. Rapport technique, Clemson University. [Gallagher(2001)] Gallagher, C. M. (2001). A method for fitting stable autoregressive models using the autocovariation function. Statistics & Probability Letters, 53(4), 381–390. [Gallagher(2002)] Gallagher, C. M. (2002). Testing for linear dependence in heavy-tailed data. Communication in Statistics, Theory and Methods, 31(4), 611–623. [Gauss(1963)] Gauss, C. F. (1963). Theory of Motion of the Heavenly Bodies. Dover, New York. [Gazzah et Abed-Meraim(2003)] Gazzah, H. et Abed-Meraim, K. (2003). Blind SOS-based ZF equalization with controlled delay robust to order overestimation. Journal of Applied Signal Processing (IEE JASP). [Georgiadis(2000)] Georgiadis, A. (Sept. 2000). Adaptive Equalisation for Impulsive Noise Environments. Ph.D. thesis, The University of Edinburgh, Edinburgh, UK. [Ghogho et al.(1999)] Ghogho, M., Nandi, A. K., et Swami, A. (1999). Cramer-Rao bounds and maximum likelihood estimation for random amplitude phase–modulated signals. IEEE Transactions on Signal Processing, 47(11), 2905–2916. [Ghogho et al.(2001)] Ghogho, M., Swami, A., et Durrani, T. S. (2001). Frequency estimation in the presence of Doppler spread : performance analysis. IEEE Transactions on Signal Processing, 49(4), 777–789.
[Gnedenko et Kolmogorov(1954)] Gnedenko, B. V. et Kolmogorov, A. N. (1954). Limit Distributions for Sums of Independent Random Variables. Addison-Wesley. [Godsill(1999)] Godsill, S. (1999). MCMC and EM-based methods for inference in heavy-tailed processes with α-stable innovations. In Proceedings of the IEEE Statistical Signal Processing Workshop. [Gonin et Money(1985)] Gonin, R. et Money, A. H. (1985). Nonlinear lp-norm estimation : Part 1. On the choice of the exponent, p, where the errors are additive. Commun. Stat. Theory Methods A, 14, 827–840. [Grenier(1984)] Grenier, Y. (1984). Modélisation de Signaux non Stationnaires. Ph.D. thesis, Université Paris Sud. [Griffith(1997)] Griffith, D. W. (1997). Robust-Time Frequency Representations for Signals in Alpha-Stable Noise : Methods and Applications. Ph.D. thesis, Department of Electrical Engineering, University of Delaware, Newark. [Grigoriu(1995)] Grigoriu, M. (1995). Applied Non-Gaussian Processes. Prentice-Hall. [Hassanpour et al.(2003)] Hassanpour, H., Mesbah, M., et Boashash, B. (2003). Comparative performance of time-frequency based newborn EEG seizure detection using spike signature. In ICASSP'2003, volume 2, pages 389–392. [Haas et Belfiore(1997)] Haas, R. et Belfiore, J.-C. (1997). A time-frequency well-localized pulse for multiple carrier transmission. Wireless Personal Communications, 5, 1–18. [Hall(1966)] Hall, H. M. (1966). A new model for impulsive phenomena : Application to atmospheric-noise communication channels. Technical Report 3412-8, 7050-7, Stanford Electronics Laboratories, Stanford University, Stanford, California. This report introduces the Student-t distribution. [Hampel et al.(1986)] Hampel, F. R., Ronchetti, E., Rousseeuw, P. J., et Stahel, W. A. (1986). Robust Statistics : The Approach Based on Influence Functions. Wiley.
[Hanssen et Oigard(2001)] Hanssen, A. et Oigard, T. A. (2001). The normal inverse Gaussian distribution as a flexible model for heavy-tailed processes. In Proceedings of NSIP.
[Hérault et Ans(1984)] Hérault, J. et Ans, B. (1984). Circuits neuronaux à synapses modifiables : décodage de messages composites par apprentissage non supervisé. C.-R. de l’Académie des Sciences, 299(III-13), 525–528.
[Hérault et al.(1985)] Hérault, J., Jutten, C., et Ans, B. (1985). Détection de grandeurs primitives dans un message composite par une architecture de calcul neuromimétique en apprentissage non supervisé. In Actes du Xème colloque GRETSI, pages 1017–1022, Nice, France.
[Hlawatsch(1998)] Hlawatsch, F. (1998). Time-Frequency Analysis and Synthesis of Linear Signal Spaces : Time-Frequency Filters, Signal Detection and Estimation, and Range-Doppler Estimation. Kluwer Academic Publishers, USA.
[Hlawatsch et Boudreaux-Bartels(1992)] Hlawatsch, F. et Boudreaux-Bartels, G. F. (1992). Linear and quadratic time-frequency signal representations. IEEE Signal Processing Magazine, 9(2), 21–67.
[Hlawatsch et Krattenthaler(1997)] Hlawatsch, F. et Krattenthaler, W. (1997). Signal synthesis algorithms for bilinear time-frequency signal representations. In W. Mecklenbräuker et F. Hlawatsch, editors, The Wigner Distribution – Theory and Applications in Signal Processing, pages 135–209. Elsevier, Amsterdam, Netherlands.
[Hlawatsch et Matz(1998)] Hlawatsch, F. et Matz, G. (1998). Time-frequency signal processing : A statistical perspective. In Proc. IEEE Workshop on Circuits, Systems and Signal Processing, pages 207–219, Mierlo, The Netherlands.
[Hlawatsch et Matz(2000)] Hlawatsch, F. et Matz, G. (2000). Quadratic time-frequency analysis of linear time-varying systems. In L. Debnath, editor, Wavelet Transforms and Time-Frequency Signal Analysis, chapter 9. Birkhäuser, Boston (MA).
[Hlawatsch et al.(2000)] Hlawatsch, F., Matz, G., Kirchauer, H., et Kozek, W. (2000). Time-frequency formulation, design, and implementation of time-varying optimal filters for signal estimation. IEEE Transactions on Signal Processing, 48.
[Huber(1972)] Huber, P. J. (1972). Robust statistics : A review. Ann. Math. Statist., 43, 1041–1067.
[Huber(1985)] Huber, P. (1985). Projection pursuit. The Annals of Statistics, 13(2), 435–475.
[Huber(1981)] Huber, P. J. (1981). Robust Statistics. Wiley, New York.
[Hussain(2002)] Hussain, Z. M. (2002). Adaptive Instantaneous Frequency Estimation : Techniques and Algorithms. Ph.D. thesis, Queensland University of Technology, Brisbane, Australia.
[Hussain et Boashash(2002)] Hussain, Z. M. et Boashash, B. (2002). Adaptive instantaneous frequency estimation of multicomponent FM signals using quadratic time-frequency distributions. IEEE Trans. on Signal Proc., pages 1866–1876.
[Hyvärinen(1997)] Hyvärinen, A. (1997). One-unit contrast functions for independent component analysis : A statistical analysis. In Neural Networks for Signal Processing VII (Proc. IEEE Workshop on Neural Networks for Signal Processing), pages 388–397, Amelia Island, Florida.
[Hyvarinen(1998)] Hyvärinen, A. (1998). New approximations of differential entropy for independent component analysis and projection pursuit. In Advances in Neural Information Processing Systems, 10, 273–279.
[Hyvarinen(1999)] Hyvärinen, A. (1999). Fast and robust fixed-point algorithms for independent component analysis. IEEE Trans. on Neural Networks, 10(3), 626–634.
[Hyvarinen et al.(2001)] Hyvärinen, A., Karhunen, J., et Oja, E. (2001). Independent Component Analysis. Wiley.
[Ichir et M.-Djafari(2003)] Ichir, M. et M.-Djafari, A. (2003). Bayesian wavelet based signal and image separation. In AIP Conference Proceedings of MaxEnt23 ; Maximum Entropy and Bayesian Inference Methods, pages 417–428, American Institute of Physics, Jackson Hole, Wyoming, USA.
[Ikram et Zhou(2001)] Ikram, M. Z. et Zhou, G. T. (2001). Estimation of multicomponent polynomial phase signals of mixed orders. Signal Processing, 81, 2293–2308.
[Ikram et al.(1996a)] Ikram, M. Z., Abed-Meraim, K., et Hua, Y. (1996a). Estimating Doppler parameters in SAR imaging for moving targets. In Proceedings of the IEEE Nordic Signal Processing Symposium (NORSIG), pages 207–210, Espoo, Finlande.
[Ikram et al.(1996b)] Ikram, M. Z., Abed-Meraim, K., et Hua, Y. (1996b). Fast discrete quadratic phase transform for estimating the parameters of chirp signals. In Proc. of the 30th Asilomar Conference, CA, volume 1, pages 798–801.
[Ikram et al.(1996c)] Ikram, M. Z., Abed-Meraim, K., et Hua, Y. (1996c). An iterative approach to the parametric estimation of chirp signals. In IEEE Region Ten Conference, Perth, Australia, volume 2, pages 681–685.
[Ikram et al.(1997)] Ikram, M. Z., Abed-Meraim, K., et Hua, Y. (1997). Fast quadratic phase transform for estimating the parameters of multicomponent chirp signals. DSP Review Journal, pages 127–135.
[Ikram et al.(1998)] Ikram, M. Z., Belouchrani, A., Abed-Meraim, K., et Gesbert, D. (1998). Parametric estimation and suppression of non-stationary interference in spread spectrum communications. In Proc. of 32nd Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, pages 1401–1405.
[Ilow(1995)] Ilow, J. (1995). Signal Processing in α-stable Noise Environments : Noise Modeling, Detection and Estimation. Ph.D. thesis, Dept. of Electrical and Computer Engineering, University of Toronto, Toronto, Canada.
[Jain(1989)] Jain, A. K. (1989). Fundamentals of Digital Image Processing. Prentice-Hall, Englewood Cliffs, New Jersey.
[Jakes(1974)] Jakes, W., editor (1974). Microwave Mobile Communications. IEEE Press.
[Janicki et Weron(1994)] Janicki, A. et Weron, A. (1994). Simulation and Chaotic Behavior of α-Stable Stochastic Processes. Marcel Dekker, New York.
[Jayant et Noll(1984)] Jayant, N. et Noll, P. (1984). Digital Coding of Waveforms : Principles and Applications to Speech and Video. Prentice-Hall.
[Jones et Sibson(1987)] Jones, M. C. et Sibson, R. (1987). What is projection pursuit ? Journal of the Royal Statistical Society, Series A, 150, 1–36.
[Joshi et Morris(1998)] Joshi, S. M. et Morris, J. M. (1998). Multiple access based on Gabor transform. In Proceedings of the IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis, pages 217–220, Pittsburgh, Pennsylvania, USA. IEEE.
[Jutten(2000)] Jutten, C. (2000). Source separation : from dusk till dawn. In Proc. 2nd Int. Workshop on Independent Component Analysis and Blind Source Separation (ICA’2000), pages 15–26, Helsinki, Finland.
[Kagan et al.(1973)] Kagan, A., Linnik, Y., et Rao, C. (1973). Characterization Problems in Mathematical Statistics. John Wiley & Sons, USA.
[Kalluri(1998)] Kalluri, S. (1998). Nonlinear Adaptive Optimization Algorithms for Robust Signal Processing in Non-Gaussian Environments. Ph.D. thesis, Dept. of Electrical Engineering, University of Delaware, Newark.
[Kaluri et Arce(2000)] Kalluri, S. et Arce, G. R. (2000). Fast algorithms for weighted myriad computation by fixed-point search. IEEE Trans. on Signal Proc.
[Karol et al.(1997)] Karol, M. J., Haas, Z. J., Woodworth, C. B., et Gitlin, R. D. (1997). Time-frequency-code slicing : efficiently allocating the communications spectrum to multirate users. IEEE Transactions on Vehicular Technology, 46(4), 818–826.
[Karvanen et Cichocki(2003)] Karvanen, J. et Cichocki, A. (2003). Measuring sparseness of noisy signals. In Proc. of the Conference ICA’2003, Japan.
[Kassam(1995)] Kassam, S. A. (1995). Signal Detection in Non-Gaussian Noise. John Wiley & Sons, New York.
[Kassam et Poor(1985)] Kassam, S. A. et Poor, V. (1985). Robust techniques for signal processing : A survey. Proceedings of the IEEE, 73(3), 433–481.
[Katkovnik(1998)] Katkovnik, V. (1998). Robust M-periodogram. IEEE Transactions on Signal Processing, 46(11), 3104–3109.
[Katkovnik et Stankovic(1998)] Katkovnik, V. et Stankovic, L. J. (1998). Instantaneous frequency estimation using the Wigner distribution with varying and data driven window length. IEEE Transactions on Signal Processing, 46(9), 2315–2325.
[Katkovnik et al.(2002)] Katkovnik, V., Djurovic, I., et Stankovic, L. (2002). Time-Frequency Signal Analysis, chapter Robust time-frequency representations. Prentice-Hall.
[Katkovnik et al.(2003)] Katkovnik, V., Djurovic, I., et Stankovic, L. (2003). Robust time-frequency representation. Elsevier, Oxford.
[Kay(1998a)] Kay, S. (1998a). Fundamentals of Statistical Signal Processing : Detection Theory. Prentice-Hall, Englewood Cliffs.
[Kay(1998b)] Kay, S. (1998b). Fundamentals of Statistical Signal Processing : Estimation Theory. Prentice-Hall, Englewood Cliffs.
[Kay(1993)] Kay, S. M. (1993). Fundamentals of Statistical Signal Processing : Estimation Theory. A.V. Oppenheim, series editor, Prentice-Hall Signal Processing Series. Prentice-Hall, Englewood Cliffs, New Jersey.
[Kay(1998c)] Kay, S. M. (1998c). Fundamentals of Statistical Signal Processing, Volume II : Detection Theory. A.V. Oppenheim, series editor, Prentice-Hall Signal Processing Series. Prentice-Hall.
[Kay et Boudreaux-Bartels(1985)] Kay, S. M. et Boudreaux-Bartels, G. F. (1985). On the optimality of the Wigner distribution for detection. In International Conference on Acoustics, Speech, and Signal Processing, ICASSP’85, pages 1017–1019.
[Khawarizmi(IXe siècle)] Khawarizmi, M. I. M. (IXe siècle). The Algebra of Mohammed ben Musa. Edited and translated by Frederic Rosen. Georg Olms Verlag.
[Kidmose(2001)] Kidmose, P. (2001). Blind Separation of Heavy Tail Signals. Ph.D. thesis, Technical University of Denmark, Lyngby, Denmark.
[Knuth(1999)] Knuth, K. H. (1999). A Bayesian approach to source separation. In Proceedings of the First International Workshop on Independent Component Analysis and Signal Separation (ICA’1999), pages 283–288, Aussois, France.
[Kootsookos et al.(1992)] Kootsookos, P., Lovell, B., et Boashash, B. (1992). A unified approach to the STFT, TFDs, and instantaneous frequency. IEEE Transactions on Signal Processing, 40, 1971–1982.
[Koutrouvelis(1980)] Koutrouvelis, I. A. (1980). Regression-type estimation of the parameters of stable laws. Journal of the American Statistical Association, 75(372), 918–928.
[Krim et Viberg(1996)] Krim, H. et Viberg, M. (1996). Two decades of array signal processing research : the parametric approach. IEEE Signal Processing Magazine, 13(4), 67–94.
[Krob et Benidir(1993)] Krob, M. et Benidir, M. (1993). Blind identification of a linear-quadratic model using higher-order statistics. Minneapolis, USA.
[Kuelbs(1973)] Kuelbs, J. (1973). A representation theorem for symmetric stable processes and stable measures. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 26.
[Kuhn et Lavielle(2004)] Kuhn, E. et Lavielle, M. (2004). Coupling a stochastic approximation version of EM with an MCMC procedure. ESAIM Probab. & Stat., 8, 115–131.
[Kuruoglu(1998)] Kuruoglu, E. (1998). Signal Processing in α-stable Noise Environments : A Least lp-Norm Approach. Ph.D. thesis, University of Cambridge, UK.
[Kuruoglu(2001)] Kuruoglu, E. E. (2001). Density parameter estimation of skewed α-stable distributions. IEEE Transactions on Signal Processing, 49(10).
[Kuruoglu(2002)] Kuruoglu, E. E. (2002). Nonlinear least lp-norm filters for nonlinear autoregressive α-stable processes. Digital Signal Processing, 12, 119–142.
[Kuruoglu(2003)] Kuruoglu, E. E. (2003). Analytical representation for positive α-stable densities. In Proceedings of ICASSP 2003 ; IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 6, pages 729–732.
[Lacoume et Ruiz(1988)] Lacoume, J.-L. et Ruiz, P. (1988). Sources identification : a solution based on cumulants. In Proc. IEEE ASSP Workshop, Minneapolis, Minnesota.
[Launer et Wilkinson(1979)] Launer, R. L. et Wilkinson, G. N., editors (1979). Robustness in Statistics. Academic Press, The Army Research Office, Research Triangle Park, North Carolina, USA. This book contains the proceedings of a workshop.
[Lecoutre et Tassi(1980)] Lecoutre, J.-P. et Tassi, P. (1980). Statistique non paramétrique et robustesse. Statistica, Paris.
[Lee(1998a)] Lee, T.-W. (1998a). Independent Component Analysis : Theory and Applications. Kluwer Academic, Boston/Dordrecht/London.
[Lee(2001)] Lee, T.-W. (2001). Independent Component Analysis : Theory and Applications. Kluwer Academic Publishers, Boston.
[Lee et al.(1999)] Lee, T.-W., Lewicki, M. S., et Girolami, M. (1999). Blind source separation of more sources than mixtures using overcomplete representations. IEEE Signal Processing Letters, 6(4).
[Lee(1998b)] Lee, W. Y. (1998b). Mobile Communications Engineering. McGraw-Hill, 2nd edition.
[Leroy(1987)] Rousseeuw, P. J. et Leroy, A. M. (1987). Robust Regression & Outlier Detection. John Wiley & Sons.
[Lévy(1925)] Lévy, P. (1925). Calcul des Probabilités. Gauthier-Villars, Paris.
[Leyman et al.(2000)] Leyman, A. R., Kamran, Z. M., et Abed-Meraim, K. (2000). Higher order time frequency based blind source separation technique. IEEE Signal Processing Letters.
[Linh-Trung(2002)] Linh-Trung, N. (2002). Estimation and separation of LFM signals in wireless communication using time-frequency signal processing. Ph.D. thesis, Queensland University of Technology, Brisbane, Australia.
[Luengo et al.(2003)] Luengo, D., Santamaria, I., Vielva, L., et Pantaleon, C. (2003). Underdetermined blind separation of sparse sources with instantaneous and convolutive mixtures. In Proceedings of the IEEE XIII-th Workshop on Neural Networks for Signal Processing.
[Luigi et Moreau(2002a)] Luigi, C. D. et Moreau, E. (2002a). An iterative algorithm for the estimation of linear frequency modulated signal parameters. IEEE Signal Processing Letters, 9(4), 127–129.
[Luigi et Moreau(2002b)] Luigi, C. D. et Moreau, E. (2002b). Wigner-Ville and polynomial Wigner-Ville transforms in the estimation of nonlinear FM signal parameters. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, volume 2, pages 1433–1436, Orlando, Florida.
[Luo et al.(2004)] Luo, Y., Lambotharan, S., et Chambers, J. (2004). A new block based time-frequency approach for underdetermined blind source separation. In Proceedings of ICASSP ’04 ; IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 5, pages 537–540.
[M. Castella et Pesquet(2004)] Castella, M., Moreau, E., et Pesquet, J.-C. (2004). A quadratic MISO contrast for blind equalization. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP’2004, Montréal, Canada.
[M. Sahmoudi et al.(2005)] Sahmoudi, M., Abed-Meraim, K., Lavielle, M., Kuhn, E., et Ciblat, P. (2005). Blind source separation using a semi-parametric approach with application to heavy-tailed signals. Submitted to EUSIPCO’2005.
[Ma et Nikias(1995a)] Ma, X. et Nikias, C. L. (1995a). On blind channel identification for impulsive signal environments. In Proc. of the Conference ICASSP’1995.
[Ma et Nikias(1995b)] Ma, X. et Nikias, C. L. (1995b). Parameter estimation and blind channel identification in impulsive signal environments. IEEE Transactions on Signal Processing, 43(12).
[Mandelbrot(1962)] Mandelbrot, B. (1962). Sur certains prix spéculatifs : faits empiriques et modèle basé sur les processus stables additifs non gaussiens de Paul Lévy. Comptes rendus à l’Académie des Sciences, 254, 3968–3970.
[Mandelbrot(1963)] Mandelbrot, B. (1963). The variation of certain speculative prices. Journal of Business, 36, 394–419.
[Mansour et Ohnishi(2000)] Mansour, A. et Ohnishi, N. (2000). Discussion of simple algorithms and methods to separate non-stationary signals. In Fourth IASTED International Conference On Signal Processing and Communications (SPC 2000), pages 78–85, Marbella, Spain.
[Mansour et al.(2000a)] Mansour, A., Jutten, C., et Loubaton, P. (2000a). Adaptive subspace algorithm for blind separation of independent sources in convolutive mixture. IEEE Trans. on Signal Processing, 48(2), 583–586.
[Mansour et al.(2000b)] Mansour, A., Barros, A. K., et Ohnishi, N. (2000b). Blind separation of sources : Methods, assumptions and applications. Special Issue on Digital Signal Processing in IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E83-A(8), 1498–1512.
[Mansour et al.(2001)] Mansour, A., Puntonet, C. G., et Ohnishi, N. (2001). A simple ICA algorithm based on geometrical approach. In Sixth International Symposium on Signal Processing and its Applications (ISSPA 2001), pages 9–12, Kuala Lumpur, Malaysia.
[Mansour et al.(2002a)] Mansour, A., Ohnishi, N., et Puntonet, C. G. (2002a). Blind multiuser separation of instantaneous mixture algorithm based on geometrical concepts. Signal Processing, 82(8), 1155–1175.
[Mansour et al.(2002b)] Mansour, A., Kawamoto, M., et Ohnishi, N. (2002b). A survey of the performance indexes of ICA algorithms. In 21st IASTED International Conference on Modelling, Identification and Control (MIC 2002), pages 660–666, Innsbruck, Austria.
[Marinovic(1984)] Marinovic, N. (1984). Time-Frequency Analysis. Ph.D. thesis.
[Marple S.L.(2001)] Marple, S. L., Jr. (2001). Large dynamic range time-frequency signal analysis with application to helicopter Doppler radar data. In Sixth International Symposium on Signal Processing and its Applications, volume 1, pages 260–263.
[Martin(1982)] Martin, W. (1982). Time-frequency analysis of random signals. In Proceedings of ICASSP’1982, pages 1325–1328, Paris, France.
[Martin et Flandrin(1985)] Martin, W. et Flandrin, P. (1985). Wigner–Ville spectral analysis of non-stationary signals. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(6), 1461–1470.
[Masry et Cambanis(1984)] Masry, E. et Cambanis, S. (1984). Spectral density estimation for stationary stable processes. Stochastic Processes and their Applications, 18, 1–31.
[Matz et Hlawatsch(1998a)] Matz, G. et Hlawatsch, F. (1998a). Extending the transfer function calculus of time-varying linear systems : A generalized underspread theory. In International Conference on Acoustics, Speech, and Signal Processing, ICASSP’98, pages 2189–2192, Seattle, WA, USA. IEEE.
[Matz et Hlawatsch(1998b)] Matz, G. et Hlawatsch, F. (1998b). Time-frequency transfer function calculus (symbolic calculus) of linear time-varying systems (linear operators) based on a generalized underspread theory. Journal of Mathematical Physics, 39(8), 4041–4070.
[Matz et Hlawatsch(1999)] Matz, G. et Hlawatsch, F. (1999). Time-frequency subspace detectors and application to knock detection. Int. J. Electron. Commun. (AEÜ), 53(6), 379–385.
[Matz et Hlawatsch(2003)] Matz, G. et Hlawatsch, F. (2003). Wigner distribution (nearly) everywhere : time-frequency analysis of signals, systems, random processes, signal spaces, and frames. Signal Processing (Elsevier), 83, 1355–1378.
[Matz et al.(1999)] Matz, G., Molisch, A. F., Steinbauer, M., Hlawatsch, F., Gaspard, I., et Artés, H. (1999). Bounds on the systematic measurement errors of channel sounders for time-varying mobile radio channels. In Proc. IEEE VTC-99 Fall, pages 1465–1470, Amsterdam, Netherlands.
[Maymon et al.(2000)] Maymon, S., Friedmann, J., et Messer, H. (2000). A new method for estimating parameters of a skewed alpha-stable distribution. In IEEE Conference.
[McCullagh(1987)] McCullagh, P. (1987). Tensor Methods in Statistics. Monographs on Statistics and Probability, Chapman and Hall.
[McGillem et Cooper(1984)] McGillem, C. et Cooper, G. (1984). Continuous and Discrete Signal and System Analysis. HRW Series in Electrical and Computer Engineering. CBS Publishing Japan Ltd., 2nd edition.
[McGillem et Cooper(1991)] McGillem, C. et Cooper, G. (1991). Continuous and Discrete Signal and System Analysis. HRW Series in Electrical and Computer Engineering. Saunders College Publishing, 3rd edition.
[McHale et Boudreaux-Bartels(1993)] McHale, T. J. et Boudreaux-Bartels, G. F. (1993). An algorithm for synthesizing signals from partial time-frequency models using the cross Wigner distribution. IEEE Transactions on Signal Processing, 41(5), 1986–1990.
[Mecklenbräuker et Hlawatsch(1997)] Mecklenbräuker, W. et Hlawatsch, F., editors (1997). The Wigner Distribution – Theory and Applications in Signal Processing. Elsevier, Amsterdam, Netherlands.
[Meng et Rubin(1993)] Meng, X. L. et Rubin, D. B. (1993). Maximum likelihood estimation via the ECM algorithm : a general framework. Biometrika, 80(2), 267–278.
[Michael(1983)] Michael, J. R. (1983). The stabilized probability plot. Biometrika, 70, 11–17.
[Middleton(1977)] Middleton, D. (1977). Statistical-physical models of electromagnetic interference. IEEE Trans. on Electromagnetic Compatibility, EMC-19(3), 106–127.
[Miller(1978)] Miller, G. (1978). Properties of certain symmetric stable distributions. Journal of Multivariate Analysis, 8(3), 346–360.
[Milstein(1988)] Milstein, L. B. (1988). Interference rejection techniques in spread spectrum communications. Proceedings of the IEEE, pages 657–671.
[Mirza et Boyer(1993)] Mirza, M. J. et Boyer, K. L. (1993). Performance evaluation of a class of M-estimators for surface parameter estimation in noisy range data. IEEE Trans. on Robotics and Automation, 9(1), 75–85.
[Miskin(2000)] Miskin, J. (2000). Ensemble Learning for Independent Component Analysis. Ph.D. thesis, University of Cambridge, http://www.inference.phy.cam.ac.uk/jwm1003/.
[Molgedey et Schuster(1994)] Molgedey, L. et Schuster, H. G. (1994). Separation of a mixture of independent signals using time delayed correlations. Physical Review Letters, 72, 3634–3636.
[Moreau(2000)] Moreau, E. (2000). Joint-diagonalization of cumulant tensors and source separation. In Proceedings of the 10th IEEE Signal Processing Workshop on Statistical Signal and Array Processing (SSAP 2000), pages 339–343, Pocono Manor, Pennsylvanie, USA.
[Moreau(2001)] Moreau, E. (2001). A generalization of joint-diagonalization criteria for source separation. IEEE Transactions on Signal Processing, 49(3), 530–541.
[Moreau et Macchi(1996)] Moreau, E. et Macchi, O. (1996). High order contrasts for self-adaptive source separation. International Journal of Adaptive Control and Signal Processing, 10(1), 19–46.
[Moreau et Pesquet(1997)] Moreau, E. et Pesquet, J.-C. (1997). Generalized contrasts for multichannel blind deconvolution of linear systems. IEEE Signal Processing Letters, 4, 182–183.
[Moreau et Stoll(1999)] Moreau, E. et Stoll, B. (1999). An iterative block procedure for the optimization of constrained contrast functions. In Proceedings of the International Conference on Independent Component Analysis (ICA’99), pages 59–64, Aussois, France.
[Morelande et Zoubir(2002)] Morelande, M. R. et Zoubir, A. M. (2002). Model selection of random amplitude polynomial phase signals. IEEE Transactions on Signal Processing, 50(3), 578–589.
[Morelande et al.(2000)] Morelande, M. R., Barkat, B., et Zoubir, A. M. (2000). Statistical performance comparison of a parametric and a non-parametric method for IF estimation of random amplitude linear FM signals in additive noise. In Proceedings of the Tenth IEEE Workshop on Statistical Signal and Array Processing, pages 262–266.
[Moussaoui et al.(2004)] Moussaoui, S., Brie, D., Caspary, O., et M.-Djafari, A. (2004). A Bayesian method for positive source separation. In Proceedings of ICASSP’2004, volume 5.
[Nandi(1999)] Nandi, A. K., editor (1999). Blind Estimation Using Higher-Order Statistics. Kluwer Academic Publishers, Boston.
[Nguyen et al.(2001a)] Nguyen, L., Belouchrani, A., Abed-Meraim, K., et Boashash, B. (2001a). Separating more sources than sensors using time-frequency distributions. In Proc. of Int. Symposium on Signal Processing and its Applications (ISSPA’2001), pages 583–586, Malaysia.
[Nguyen et al.(2001b)] Nguyen, L.-T., Senadji, B., et Boashash, B. (2001b). Scattering function and time-frequency signal processing. In International Conference on Acoustics, Speech, and Signal Processing, ICASSP’2001, volume VI, pages 3597–3600, Salt Lake City, Utah, USA.
[Nikias et Petropulu(1993)] Nikias, C. et Petropulu, A. (1993). Higher-Order Spectra Analysis : A Nonlinear Signal Processing Framework. Prentice-Hall.
[Nikias et Petropulu(1994)] Nikias, C. L. et Petropulu, A. P. (1994). Higher-order Spectra Analysis : A Nonlinear Signal Processing Framework. Prentice Hall, New York.
[Nikias et Shao(1995)] Nikias, C. L. et Shao, M. (1995). Signal Processing with Alpha-Stable Distributions and Applications. John Wiley & Sons, New York.
[Nolan(2004)] Nolan, J. P. (2004). Stable Distributions – Models for Heavy Tailed Data. Birkhäuser, Boston.
[Nowicka(1997)] Nowicka, J. (1997). Asymptotic behavior of the covariation and the codifference for ARMA models with stable innovations. Communications in Statistics. Stochastic Models, 13(4), 673–685.
[Ouldali(1999)] Ouldali, A. (1999). Modélisation statistique et identification des signaux FM à phase polynomiale. Ph.D. thesis, LSS, Supélec–Univ Paris XI, France.
[Ouldali et Benidir(1999)] Ouldali, A. et Benidir, M. (1999). Statistical analysis of polynomial phase signals affected by multiplicative and additive noise. Signal Processing, 42(19).
[P. Bickel(1998)] Bickel, P. J., Klaassen, C. A. J., Ritov, Y., et Wellner, J. A. (1998). Efficient and Adaptive Estimation for Semiparametric Models. Springer.
[P.-Y. Arquès(2000)] Arquès, P.-Y., Thirion-Moreau, N., et Moreau, E. (2000). Techniques de l’ingénieur, Traité Mesure et Contrôle, volume RAB, chapter Les représentations temps-fréquence linéaires et quadratiques en traitement du signal, pages 1–22. Techniques de l’ingénieur.
[Papoulis(1991)] Papoulis, A. (1991). Probability, Random Variables, and Stochastic Processes. McGraw-Hill.
[Peleg et Friedlander(1995)] Peleg, S. et Friedlander, B. (1995). The discrete polynomial-phase transform. IEEE Transactions on Signal Processing, 43(8), 1901–1914.
[Peleg et Friedlander(1996)] Peleg, S. et Friedlander, B. (1996). Multicomponent signal analysis using the polynomial-phase transform. IEEE Trans. on AES.
[Pesquet et Moreau(2001)] Pesquet, J.-C. et Moreau, E. (2001). Cumulant based independence measures for linear mixtures. IEEE Transactions on Information Theory, 47(5), 1947–1956.
[Pham(1999)] Pham, D. T. (1999). Mutual information approach to blind separation of stationary sources.
[Pham(2000)] Pham, D. T. (2000). Blind separation of instantaneous mixture of sources via order statistics. IEEE Transactions on Signal Processing, 48(2), 363–375.
[Pham et Cardoso(2001)] Pham, D. T. et Cardoso, J.-F. (2001). Blind separation of instantaneous mixtures of nonstationary sources. IEEE Transactions on Signal Processing, 49(9), 1837–1848.
[Pham et Garrat(1997)] Pham, D.-T. et Garrat, P. (1997). Blind separation of a mixture of independent sources through a quasi-maximum likelihood approach. IEEE Transactions on Signal Processing, 45(7), 1712–1725.
[Piasco et al.(1995)] Piasco, J. M., Elkarkour, W., et Guglielmi, M. (1995). Identification paramétrique de différents modèles d’un signal M.L.F. multicomposantes. In Quinzième colloque GRETSI, pages 193–196, Juan-les-Pins.
[Poor et Tanda(2002)] Poor, H. et Tanda, M. (2002). Multiuser detection in flat fading non-Gaussian channels. IEEE Transactions on Communications, 50(11), 1769–1777.
[Poor et Wornell(1998)] Poor, H. V. et Wornell, G. W., editors (1998). Wireless Communications : Signal Processing Perspectives. Prentice-Hall, New Jersey.
[Proakis(1995)] Proakis, J. G. (1995). Digital Communications. McGraw-Hill, 3rd edition.
[Rachev(2003)] Rachev, S. T. (2003). Handbook of Heavy Tailed Distributions in Finance. Elsevier, Amsterdam.
[Rai et Singh(2004)] Rai, C. S. et Singh, Y. (2004). Source distribution models for blind source separation. Neurocomputing, 57, 501–505.
[Rappaport(1996)] Rappaport, T. S. (1996). Wireless Communications : Principles and Practice. Prentice-Hall, New Jersey.
[Rihaczek(1985)] Rihaczek, A. (1985). Principles of High-Resolution Radar. Peninsula Publishing.
[Ristic(1995)] Ristic, B. (1995). Some aspects of signal dependent and higher-order time-frequency and time-scale analysis of non-stationary signals. Ph.D. thesis, Signal Processing Research Centre, Queensland University of Technology, Brisbane, Australia.
[Rupi et al.(2004)] Rupi, M., Tsakalides, P., Re, E. D., et Nikias, C. L. (2004). Constant modulus blind equalization based on fractional lower-order statistics. Signal Processing, 84, 881–894.
[Sahmoudi(2005)] Sahmoudi, M. (2005). Generalized contrast functions for blind source separation with unknown number of sources. In IEEE Statistical Signal Processing Workshop (SSP’2005) (submitted), Bordeaux, France.
[Sahmoudi et Abed-Meraim(2004a)] Sahmoudi, M. et Abed-Meraim, K. (2004a). Multicomponent chirp interference estimation for communication systems in impulsive alpha-stable noise environment. In Proceedings of the IEEE International Symposium on Control, Communications and Signal Processing (ISCCSP’04), Hammamet, Tunisia.
[Sahmoudi et Abed-Meraim(2004b)] Sahmoudi, M. et Abed-Meraim, K. (2004b). Robust blind separation algorithms for heavy-tailed sources. In Proceedings of the IEEE International Symposium on Signal Processing and Information Theory, Rome, Italy.
[Sahmoudi et al.(2002)] Sahmoudi, M., Abed-Meraim, K., et Benidir, M. (2002). Blind separation of alpha-stable sources : A new fractional lower-order moments (FLOM) approach. In Proceedings of the IEEE International Symposium on Signal Processing and Information Theory (ISSPIT’2002).
[Sahmoudi et al.(2003a)] Sahmoudi, M., Abed-Meraim, K., et Benidir, M. (2003a). Blind separation of instantaneous mixtures of impulsive α-stable sources. In Proceedings of the IEEE International Symposium on Signal and Image Processing (ISPA’2003).
[Sahmoudi et al.(2003b)] Sahmoudi, M., Abed-Meraim, K., et Benidir, M. (2003b). Estimation des signaux chirp multi-composantes affectés par un bruit impulsif α-stable. In Proceedings of GRETSI’2003.
[Sahmoudi et al.(2004a)] Sahmoudi, M., Abed-Meraim, K., et Benidir, M. (2004a). Blind separation of heavy-tailed signals using normalized statistics. In Proceedings of ICA’2004, Granada, Spain.
[Sahmoudi et al.(2004b)] Sahmoudi, M., Abed-Meraim, K., et Barkat, B. (2004b). IF estimation of multicomponent chirp signals in impulsive α-stable noise environment using parametric and non-parametric approaches. In Proceedings of EUSIPCO’2004, Austria.
[Sahmoudi et al.(2005)] Sahmoudi, M., Abed-Meraim, K., et Benidir, M. (2005). Blind separation of impulsive alpha-stable sources using minimum dispersion criterion. IEEE Signal Processing Letters.
[Samorodnitsky et Taqqu(1994)] Samorodnitsky, G. et Taqqu, M. (1994). Stable Non-Gaussian Random Processes : Stochastic Models with Infinite Variance. Chapman & Hall, New York.
[Sarni et al.(2001)] Sarni, Y., Sadoun, R., et Belouchrani, A. (2001). On the application of chirp modulation in spread spectrum communication systems. In Proceedings of ISSPA’2001 ; Sixth International Symposium on Signal Processing and its Applications, volume 2, pages 501–504.
[Sayeed(1998)] Sayeed, A. M. (1998). Canonical time-frequency processing for broadband signaling over dispersive channels. In Proceedings of the IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis, pages 369–372, New York, USA. IEEE.
[Sayeed et al.(1998)] Sayeed, A. M., Sendonaris, A., et Aazhang, B. (1998). Multiuser detection in fast-fading multipath environments. IEEE Journal on Selected Areas in Communications, 16(9), 1691–1701.
[Schilder(1970)] Schilder, M. (1970). Some structure theorems for the symmetric stable laws. Ann. Math. Statist., 41(2), 412–421.
[Senecal(2002)] Senecal, S. (2002). Méthodes de simulation Monte-Carlo par chaînes de Markov pour l’estimation de modèle. Application en séparation de sources et en égalisation. Ph.D. thesis, INPG, Grenoble.
[Sengupta et Burman(2003)] Sengupta, K. et Burman, P. (2003). Non-parametric approach to ICA using Kernel Density Estimation. In Proceedings of IEEE International Conference on Multimedia and Expo. ICME’03, volume 1, pages 749–752.
[Serfling(1980)] Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. Wiley.
[Shamsunder et al.(1995)] Shamsunder, S., Giannakis, G., et Friedlander, B. (1995). Estimating random amplitude polynomial phase signals : a cyclostationary approach. IEEE Trans. on Signal Processing, 43(2), 492–505.
[Shannon(1948a)] Shannon, C. E. (1948a). A mathematical theory of communication. The Bell System Technical Journal, 27(3), 379–423.
[Shannon(1948b)] Shannon, C. E. (1948b). A mathematical theory of communication. The Bell System Technical Journal, 27, 623–657.
[Shereshevski(2002)] Shereshevski, Y. (2002). Blind signal separation of heavy tail sources. M.Sc. thesis, Tel Aviv University, Israel.
[Shereshevski et al.(2001)] Shereshevski, Y., Yeredor, A., et Messer, H. (2001). Super-efficiency in blind signal separation of symmetric heavy-tailed sources. In Proceedings of the 11th IEEE Workshop on Statistical Signal Processing, pages 78–81.
[Shi et al.(2004)] Shi, Z., Tang, H., Liu, W., et Tang, Y. (2004). Blind source separation of more sources than mixtures using generalized exponential mixture models. Neurocomputing, 61, 461–469.
[Shiryayev(1984)] Shiryayev, A. N. (1984). Probability. In Graduate Texts in Mathematics, volume 95. Springer-Verlag.
[Snoussi(2003)] Snoussi, H. (2003). Approche Bayésienne en Séparation de Sources. Applications en Imagerie. Ph.D. thesis, Université Paris-Sud Orsay, Paris.
[Snoussi et M.-Djafari(2000)] Snoussi, H. et M.-Djafari, A. (2000). Bayesian source separation with mixture of Gaussians prior for sources and Gaussian prior for mixture coefficients. In Proc. of MaxEnt : Bayesian Inference and Maximum Entropy Methods, pages 388–406, Gif-sur-Yvette, France.
[Snoussi et M.-Djafari(2004)] Snoussi, H. et M.-Djafari, A. (2004). Fast joint separation and segmentation of mixed images. Journal of Electronic Imaging, 13(2), 349–361.
[Stankovic(1997)] Stankovic, L. (1997).
S-class of time-frequency distributions. IEE Proc. Vision, Image and Signal Processing, 144(2), 57–64.
[Stankovic et Stankovic(1993)] Stankovic, L. et Stankovic, S. (1993). Wigner distribution of noisy signals. IEEE Transactions on Signal Processing, 41(2), 956–960.
[Stankovic et Katkovnik(1998)] Stankovic, L. J. et Katkovnik, V. (1998). Algorithm for the instantaneous frequency estimation using the time-frequency distributions with adaptive window length. IEEE Signal Processing Letters, 5(9).
[Stoll et Moreau(2000)] Stoll, B. et Moreau, E. (2000). A generalized ICA algorithm. IEEE Signal Processing Letters, 7(4), 90–92.
[Stone(1990)] Stone, C. J. (1990). Large-sample inference for log-spline models. Ann. Statist., 18(2), 717–741.
[Stuck(1977)] Stuck, B. W. (1977). Minimum error dispersion linear filtering of scalar symmetric stable processes. IEEE Trans. on Automatic Control, (23), 507–509.
[Stuck et Kleiner(1974)] Stuck, B. W. et Kleiner, B. (1974). A statistical analysis of telephone noise. Bell System Technical Journal, (53), 1263–1320.
[Subbotin(1923)] Subbotin, M. T. (1923). On the law of frequency of errors. Matematicheskii Sbornik, 31, 296–301.
[Sucic et al.(1999)] Sucic, V., Barkat, B., et Boashash, B. (1999). Performance evaluation of the B distribution. In Proceedings of the Fifth International Symposium on Signal Processing and its Applications (ISSPA'99), volume 1, pages 267–270, Brisbane, Queensland, Australia.
[Suppappola(2003)] Suppappola, A. P., editor (2003). Applications in Time-Frequency Signal Processing. CRC Press.
[Swami et Sadler(1998)] Swami, A. et Sadler, B. (1998). Parameter estimation for linear alpha-stable processes. IEEE Signal Processing Letters, 5(2).
[Swarts et al.(1999)] Swarts, F., van Rooyan, P., Oppermann, I., et Lotter, M. P., editors (1999). CDMA Techniques for Third Generation Mobile Systems. Kluwer Academic Publishers, Boston.
[Takada(2001)] Takada, T. (2001). Nonparametric density estimation : A comparative study.
Economics Bulletin, 3(16), 1–10.
[Taleb(1999)] Taleb, A. (1999). Séparation de Sources dans des Mélanges Non Linéaires. Ph.D. thesis, INPG, Grenoble, France.
[Thirion-Moreau et al.(2004)] Thirion-Moreau, N., Fadili, E., et Moreau, E. (2004). A sufficient condition for separation of deterministic signals based on spatial time-frequency representation. In Proceedings of the International Conference on Independent Component Analysis (ICA'2004), pages 366–373.
[Tong et al.(1991)] Tong, L., Liu, R.-W., Soon, V., et Huang, Y.-F. (1991). Indeterminacy and identifiability of blind identification. IEEE Trans. on Circuits and Systems, 38, 499–509.
[Tourneret(1998)] Tourneret, J. (1998). Detection and estimation of abrupt changes contaminated by multiplicative Gaussian noise. Signal Processing, 68, 259–270.
[Tourneret et al.(2003a)] Tourneret, J.-Y., Doisy, M., et Lavielle, M. (2003a). Bayesian retrospective detection of multiple change-points corrupted by multiplicative noise : application to SAR image edge detection. Signal Processing, 83, 1871–1887.
[Tourneret et al.(2003b)] Tourneret, J.-Y., Suparman, S., et Doisy, M. (2003b). Hierarchical Bayesian segmentation of signals corrupted by multiplicative noise. In Proceedings of ICASSP'2003, pages 165–168, Hong Kong, China.
[Tsakalides et Nikias(1996)] Tsakalides, P. et Nikias, C. (1996). The robust covariation-based MUSIC (ROC-MUSIC) algorithm for bearing estimation in impulsive noise environments. IEEE Trans. on Signal Processing, 44(7), 1623–1633.
[Tsihrintzis et Nikias(1996)] Tsihrintzis, G. et Nikias, C. (1996). Fast estimation of the parameters of alpha-stable impulsive interference. IEEE Trans. on Signal Processing, 44(6).
[VanTrees(1968)] VanTrees, H. L. (1968). Detection, Estimation, and Modulation Theory : Part I. John Wiley & Sons.
[VanTrees(1992)] VanTrees, H. L. (1992).
Detection, Estimation, and Modulation Theory : Radar-Sonar Signal Processing and Gaussian Signals in Noise. Krieger Pub. Co., Malabar, Florida.
[Ville(1948)] Ville, J. (1948). Théorie et applications de la notion de signal analytique. Câbles et Transmissions, 2A(1), 61–74.
[Vincent(1995)] Vincent, I. (1995). Classification de Signaux non Stationnaires. Ph.D. thesis, Université de Nantes/Ecole Centrale de Nantes.
[Walter(1994)] Walter, C. (1994). Les structures du hasard en économie : efficience des marchés, lois stables et processus fractals. Ph.D. thesis, IEP Paris.
[Wang et al.(2002)] Wang, Y., Gao, L., Zhao, M., Chen, J., Zhang, Z., et Yao, Y. (2002). Time-frequency code for multicarrier DS-CDMA systems. In Proceedings of the IEEE 55th Vehicular Technology Conference, volume 3, pages 1224–1227.
[Wegman et al.(1989)] Wegman, E. J., Schwartz, S. G., et Thomas, J. (1989). Topics in Non-Gaussian Signal Processing. Academic Press, New York.
[White et Boashash(1988)] White, L. V. et Boashash, B. (1988). On estimating the instantaneous frequency of a Gaussian random signal by use of the Wigner-Ville distribution. IEEE Transactions on Acoustics, Speech, and Signal Processing, 36(3), 417–420.
[Wood et Barry(1994)] Wood, J. C. et Barry, D. T. (1994). Linear signal synthesis using the Radon-Wigner transform. IEEE Transactions on Signal Processing, 42(8), 2105–2111.
[Xueshi Yang et Pesquet(2001)] Yang, X., Petropulu, A. P., et Pesquet, J. C. (2001). Estimating long-range dependence in impulsive traffic flows. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'01), pages 3413–3416.
[Zhang et Amin(2000)] Zhang, Y. et Amin, M. G. (2000). Blind separation of sources based on their time-frequency signatures. In Proceedings of the
2000 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'00), volume 5, Istanbul, Turkey.
[Zhang et Kassam(2004)] Zhang, Y. et Kassam, S. A. (2004). Robust rank-EASI algorithm for blind source separation. IEE Proc. Commun., 151(1), 15–19.
[Zhang et al.(2001)] Zhang, Y., Ma, W., et Amin, M. G. (2001). Subspace analysis of spatial time-frequency distribution matrices. IEEE Transactions on Signal Processing, 49(4), 747–759.
[Zhao et al.(1990)] Zhao, Y., Atlas, L. E., et Marks, R. J. (1990). The use of cone-shaped kernels for generalized time-frequency representations of nonstationary signals. IEEE Trans. on Acoustics, Speech, and Signal Processing, 38(7), 1084–1091.
[Zhong et al.(2004a)] Zhong, M., Tang, H., et Tang, Y. (2004a). Expectation-Maximization approaches to independent component analysis. Neurocomputing, 61, 503–512.
[Zhong et al.(2004b)] Zhong, M.-J., Tang, H.-W., Chen, H.-J., et Tang, Y.-Y. (2004b). An EM algorithm for learning sparse and overcomplete representations. Neurocomputing, 57, 469–476.
[Zhou et Giannakis(1994a)] Zhou, G. et Giannakis, G. (1994a). Self coupled harmonics : stationary and cyclostationary approaches. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP'94), volume 4, pages IV/153–156, Adelaide, SA, Australia. IEEE.
[Zhou et Giannakis(1995)] Zhou, G. et Giannakis, G. (1995). Harmonics in Gaussian multiplicative and additive noise : Cramer-Rao bounds. IEEE Trans. on Signal Proc., 43(5), 1217–1231.
[Zhou et Giannakis(1996)] Zhou, G. et Giannakis, G. (1996). Polyspectral analysis of mixed processes and coupled harmonics. IEEE Transactions on Information Theory, 42(3), 943–958.
[Zhou et Giannakis(1993)] Zhou, G. et Giannakis, G. B. (1993). Comparison of higher-order and cyclic approaches for estimating random amplitude modulated harmonics. In IEEE Signal Processing Workshop on Higher-Order Statistics, pages 225–229, South Lake Tahoe, CA, USA.
[Zhou et Giannakis(1994b)] Zhou, G.
et Giannakis, G. B. (1994b). On estimating random amplitude-modulated harmonics using higher order spectra. IEEE Journal of Oceanic Engineering, 19(4), 529–539.
[Zhou et al.(1996)] Zhou, G., Giannakis, G., et Swami, A. (1996). On polynomial phase signals with time-varying amplitudes. IEEE Trans. on Signal Proc., 44(4), 848–861.
[Ziehe et Müller(1998)] Ziehe, A. et Müller, K.-R. (1998). TDSEP—an efficient algorithm for blind separation using time structure. In Proc. Int. Conf. on Artificial Neural Networks (ICANN'98), pages 675–680, Skövde, Sweden.
[Zolotarev(1966)] Zolotarev, V. (1966). On representation of stable laws by integrals. In Selected Translations in Mathematical Statistics and Probability, volume 6, pages 84–88. American Mathematical Society.
[Zolotarev(1986)] Zolotarev, V. M. (1986). One-dimensional stable distributions. In Translations of Mathematical Monographs, volume 65. American Mathematical Society.
[Zoubir et Brcich(2002)] Zoubir, A. et Brcich, R. (2002). Multiuser detection in non-Gaussian channels. Digital Signal Processing, 12, 262–273.
[Zoubir et Arnold(1996)] Zoubir, A. M. et Arnold, M. J. (1996). Testing Gaussianity with the characteristic function : the i.i.d. case. Signal Processing, 53(2), 110–120.