my PhD Thesis
Transcription
N° d'ordre : 7774

UNIVERSITÉ PARIS-SUD, ORSAY

Thesis presented to obtain the degree of DOCTEUR EN SCIENCES of the Université Paris XI, Orsay
Speciality: Automatic Control and Signal Processing

by Mohamed SAHMOUDI

PROCESSUS ALPHA-STABLES POUR LA SÉPARATION ET L'ESTIMATION ROBUSTES DES SIGNAUX NON-GAUSSIENS ET/OU NON-STATIONNAIRES
(Alpha-Stable Processes for the Robust Separation and Estimation of Non-Gaussian and/or Non-Stationary Signals)

Defended on 13 December 2004 before the jury composed of:

Reviewers: Eric Moreau (Professor, Université de Toulon); Jean-Yves Tourneret (Professor, INP, Toulouse)
Examiners: Jean-Pierre Delmas (Professor, INT, Evry); Ali Mohammad-Djafari (Research Director, LSS, CNRS); Jean-Christophe Pesquet (Professor, Université Marne-la-Vallée)
Advisor: Karim Abed-Meraim (Lecturer-Researcher, Telecom Paris)
Director: Messaoud Benidir (Professor, Université Paris XI, Orsay)

In Your name and for You, my God, the Omniscient and the Omnipotent. SBD upon you, Ahmad, my ideal model... To my family; to you, "Knikina"... ♥

ACKNOWLEDGMENTS

I love the road travelled as much as the arrival at the goal. I regard this manuscript as a doctoral dissertation, but also as a fine story to tell, a story of ideas and of people whom I hope to make a little known to you. I am deeply grateful to the researchers who shared with me not only the results of their work but also its human context. Doing justice to everyone who contributed to this thesis is particularly delicate: within this limited space I could not mention all the researchers who are part of this story and who deserve to find their names here. I hope they will forgive me. I could never have completed this work without help. In chronological order: I first thank my whole family, who bore every moral and material difficulty to support me throughout my higher education.
My father Mohammadine, my mother Habiba, my elder sister Fatima and her little Fouad, my younger sister Saida and my little brother Hafid: I keep them warm in my heart, and they know how much they mean to me... I am also particularly indebted to my uncle Elhoucine Sahmoudi, whose trust and support greatly helped me begin my graduate studies. I thank M. Paul Deheuvels, director of the LSTA statistics laboratory of the Université Pierre et Marie Curie (Paris 6), and all the teachers of the Paris VI DEA in statistics, for the quality of their teaching and supervision. I am likewise indebted to Hervé Monod, researcher at the INRA Biometry laboratory in Jouy-en-Josas, for supervising my DEA internship and enabling me to carry out fine applications of theory in his field of agronomy. I also thank Professor Y. Kutoyants of the Université du Mans for supervising my theoretical DEA dissertation. I salute the whole "gang" of Antony, my friends of the Jean-Zay university residence, who were my second family for four years; in particular Abdelillah Sahmoudi, Zirari and his little family, Ajinou, Sabri, Elhattab, Rekik, Eljazouli, Halmi, Sbai, Brich, Bajdouri, ... I also thank my thesis director, Prof. Messaoud Benidir, Professor at the Université Paris-Sud in Orsay, for giving me the opportunity to enter the fascinating world of signal processing. His trust and support helped me greatly in completing this work; he guided it wisely while leaving me great freedom. I wish to acknowledge publicly all that I owe to Dr. Karim Abed-Meraim, my scientific advisor, who prompted, developed, and then accompanied my first steps in signal processing with great patience and extraordinary pedagogy.
The moral, material, and intellectual support of my advisor Dr. K. Abed-Meraim was essential. Not only did he provide indispensable help in advancing my research, but when I was in a delicate family or financial situation he knew, with unfailing instinct, how to spot my lapses and, with his wise counsel, helped me back onto the path. A great professor and an inexhaustible source of new ideas, he will remain my mentor... My thanks also go to M. Henri Maitre, head of the TSI department of ENST, for accepting me into his department; to the members of the TREX Electronique of the École Polytechnique, who welcomed me as a teaching assistant, especially Yvan Bonnassieux and Stéphane Mallat, with whom I shared the great pleasure of teaching at the X; and to the members of CEREMADE at the Université Dauphine, who welcomed me as an ATER, especially M. Bellec, C. Pardoux, and C. Robert, with whom I shared my taste for teaching statistics and mathematics. I sincerely thank Professors Eric Moreau and Jean-Yves Tourneret for accepting the heavy task of reviewers despite the very short time I left them; their constructive questions and remarks were invaluable and allowed me to improve several parts of this manuscript significantly. I would also like to thank Professors Jean-Pierre Delmas, Ali Mohammad-Djafari, and Jean-Christophe Pesquet for agreeing to judge these years of work by sitting on my thesis committee, and for the interest they took in it. Other researchers answered my requests for cooperation with extraordinary generosity: B. Boashash (Australia), B. Barkat (Singapore), A. Belouchrani (Algeria), and M. Taqqu (Boston, USA) during their sabbatical stays in the TSI department of ENST; L.J. Stanković (Montenegro), A. Hero (USA), J.-F.
Cardoso (France), and J. Chambers (UK) on various occasions. They shared their erudition with me; I admire and appreciate not only their professional competence, but also the ingenuity they deployed to explain certain technical concepts to me, and the courtesy with which they spared my scientific self-esteem. I also thank Philippe Ciblat of Comelec-ENST, and Marc Lavielle and Estelle Kuhn of the statistical modelling team at Paris-Sud, for the many scientific discussions we had at our meetings within the MathSTIC project. To the members of LSS at Supélec and of TSI at Telecom Paris, and particularly to Naji, Gazzah, Snoussi, Sayadi, Belkacemi, Djeddi, Khanfouci, Mohammadpour, Hallouli, Thomas, Mouhouche, Djalil, Trung, Berriche, Souidene, and Robert, I express my deepest gratitude; the cosmopolitan atmosphere of the two laboratories made me want to pursue research without borders... The pleasure I took in writing this report owes much to the kindness of many people, whom I cannot thank enough... I keep the best for last: my sweet and tender Nassera. You restored my confidence when I needed it most, you allowed me to continue this work without ever giving up, and you bore with great wisdom and patience my working on weekends and coming home late at night. For all that and much more, I can never thank you enough... a big thank-you to you, "Knikina"!

The author, Mohamed Sahmoudi

Table of Contents

Dedication
Acknowledgments
Table of Contents
List of Figures
List of Tables
Abstract
Author's Publications
Notations and Abbreviations

1 Introduction
  1.1 Motivations
    1.1.1 Non-Gaussianity
    1.1.2 Non-stationarity
    1.1.3 Robustness
  1.2 Problem Statement
    1.2.1 Separation of impulsive sources with infinite variance
    1.2.2 Estimation of multicomponent FM signals in an impulsive environment
  1.3 Objectives and Contributions
  1.4 Organization of the Document

I Tools for the Processing of Non-Gaussian and/or Non-Stationary Signals

2 Non-Gaussian Heavy-Tailed Distributions
  2.1 Brief History
  2.2 Univariate Stable Laws
    2.2.1 Infinitely divisible laws
    2.2.2 Two equivalent definitions of α-stable distributions
    2.2.3 Stability of some common laws
    2.2.4 Properties of stable laws
    2.2.5 Fractional lower-order moments
    2.2.6 Simulation of stable laws
  2.3 Statistical Inference for Stable Laws
    2.3.1 Variance tests
    2.3.2 Estimation of the parameters of α-stable laws
  2.4 Multivariate Stable Laws
    2.4.1 Definition and properties
    2.4.2 Moments of multivariate stable laws
    2.4.3 α-sub-Gaussian random vectors
  2.5 Dependence Measures for α-Stable Random Variables
    2.5.1 Covariation
    2.5.2 Covariation metric
    2.5.3 Covariation coefficient
    2.5.4 Codifference
    2.5.5 Symmetric covariation coefficient
    2.5.6 Estimation of covariation coefficients
  2.6 Analytic Representation of α-Stable PDFs
    2.6.1 Power series expansion
    2.6.2 Asymptotic expansion
    2.6.3 Approximation by a finite mixture
  2.7 Other Heavy-Tailed Distributions
    2.7.1 Generalized Gaussian law
    2.7.2 Normal inverse Gaussian law
    2.7.3 Student's t law
  2.8 Conclusion

3 Robust Estimation
  3.1 Robustness
  3.2 M-Estimation
    3.2.1 Minimax M-estimate of location
    3.2.2 Influence function
    3.2.3 M-estimation of a deterministic signal parameter
    3.2.4 Theoretical performance
    3.2.5 Minimax optimal cost function
  3.3 Concluding Remarks

4 Time–Frequency Concepts
  4.1 Need for Time–Frequency Representation
  4.2 Nonstationarity and FM Signals
  4.3 The STFT, SPEC, WVD, and Quadratic TFD
  4.4 Reduced Interference Distributions
  4.5 The WVD and Ambiguity Function
  4.6 Relationships Among Dual Domains
  4.7 Time–Frequency Signal Synthesis
  4.8 IF Estimation
  4.9 Engineering Applications of Time–Frequency Methods
  4.10 Concluding Remarks

II Blind Separation of Impulsive Sources with Infinite Variances

5 State of the Art of BSS
  5.1 Introduction
    5.1.1 What is blind source separation (BSS)?
    5.1.2 Brief history of BSS
    5.1.3 Statistical information for BSS
  5.2 Linear Instantaneous Mixtures
    5.2.1 Separability and indeterminacies
    5.2.2 How to find the independent components
  5.3 Basic BSS Methods
    5.3.1 BSS by minimization of mutual information
    5.3.2 BSS by maximization of non-Gaussianity
    5.3.3 BSS by maximum likelihood estimation
    5.3.4 BSS by algebraic tensorial methods
    5.3.5 BSS by non-linear decorrelation
    5.3.6 BSS using geometrical concepts
    5.3.7 Source separation using a Bayesian framework
    5.3.8 BSS using time structure
  5.4 BSS of Impulsive Heavy-Tailed Sources
    5.4.1 Why heavy-tailed α-stable distributions?
    5.4.2 Existing BSS methods for heavy-tailed signals
  5.5 Conclusion & Future Research

6 Minimum Dispersion Approach
  6.1 Introduction
    6.1.1 The failure of second- and higher-order methods
    6.1.2 Fractional lower-order statistics (FLOS) theory
  6.2 Source Separation Procedure
    6.2.1 Whitening by the normalized covariance matrix
    6.2.2 Minimum dispersion criterion
    6.2.3 Separation algorithm: Jacobi implementation
  6.3 Performance Evaluation & Comparison
    6.3.1 Generalized rejection level index
    6.3.2 Experimental results
  6.4 Concluding Remarks

7 Sub- and Super-Additivity based Contrast Functions
  7.1 BSS Using Contrast Functions
  7.2 On Contrast Functions
  7.3 Orthogonality Constraint
  7.4 Sub-Additivity based Contrast Functions
    7.4.1 Lp-norm contrast functions, p ≥ 1
    7.4.2 Alpha-stable scale contrast function
  7.5 Super-Additivity based Contrast Functions
    7.5.1 Dispersion contrast function
  7.6 Jacobi-Gradient Algorithm for Prewhitened BSS
  7.7 Concluding Remarks

8 Normalized HOS-based Approaches
  8.1 Introduction
  8.2 Normalized Statistics of Heavy-Tailed Mixtures
    8.2.1 Normalized moments
    8.2.2 Normalized second- and fourth-order cumulants
  8.3 Normalized Tensorial BSS Methods
    8.3.1 Separation algorithms
    8.3.2 Performance evaluation & comparison
  8.4 Normalized Non-linear Decorrelation BSS Methods
    8.4.1 Robust composite criterion for source separation
    8.4.2 Iterative quasi-Newton implementation
    8.4.3 Performance evaluation & comparison
  8.5 Concluding Remarks

9 A Semi-Parametric Maximum Likelihood Approach
  9.1 The Likelihood of the BSS Model
    9.1.1 Derivation of the likelihood
    9.1.2 Source density estimation
    9.1.3 Optimization via the EM algorithm
  9.2 Semi-Parametric Source Separation
    9.2.1 Noisy linear instantaneous mixtures
    9.2.2 The proposed approach
    9.2.3 Density estimation by B-spline approximations
    9.2.4 The SAEM algorithm
  9.3 Performance Evaluation & Comparison
    9.3.1 Some existing BSS methods
    9.3.2 Parametric versus semi-parametric approaches
    9.3.3 Computer simulation experiments
  9.4 Concluding Remarks

III Separation and Estimation of Multicomponent FM Signals Affected by Heavy-Tailed Noise

10 State of the Art
  10.1 Modern Spectral Analysis Approaches
  10.2 Time–Frequency Analysis Approaches
    10.2.1 IF estimation using time–frequency methods
    10.2.2 Analysis of noisy multicomponent signals
  10.3 Robust Time–Frequency Analysis
  10.4 Concluding Remarks

11 Robust Parametric Approaches
  11.1 Introduction and Problem Statement
  11.2 Polynomial-Phase Transform of FM Signals
  11.3 IF Estimation Procedure for FM Signals
  11.4 Robust Subspace Estimation
    11.4.1 TRUNC-MUSIC algorithm
    11.4.2 FLOS-MUSIC algorithm
    11.4.3 ROCOV-MUSIC algorithm
  11.5 Performance Evaluation & Comparison
    11.5.1 Mixture of sinusoidal components
    11.5.2 Mixture of two chirps
  11.6 Concluding Remarks

12 Robust Time–Frequency Approaches
  12.1 Introduction and Problem Statement
  12.2 Failure of Standard TFDs in Impulsive Noise
    12.2.1 Effect of impulsive spike noise on TFDs
    12.2.2 Effect of impulsive α-stable noise on TFDs
    12.2.3 The need for robust TFDs in a Gaussian environment
  12.3 Pre-processing Techniques based Approach
    12.3.1 Exponential compressor filter
    12.3.2 Huber filter
  12.4 Robust Time–Frequency Approach
    12.4.1 Optimal TFD kernel in α-stable noise
    12.4.2 A new robust quadratic time–frequency distribution
  12.5 IF Estimation & Component Separation
  12.6 Performance Evaluation & Comparison
  12.7 Concluding Remarks

13 Conclusions and Perspectives
  13.1 General Conclusion
  13.2 Perspectives

Bibliographic References

List of Figures

1.1 Realizations of a Gaussian signal and of an α-stable signal. Figures (c) and (d): when the sample size is relatively small, the realizations of the Gaussian law and of the α-stable law look similar.
    Figures (a) and (b): when the sample size is relatively large, the two realizations differ clearly.
1.2 Examples of non-stationary signals. (a–c) show real-life signals plotted using the B-distribution: (a) a whale signal, (b) an electroencephalogram signal, and (c) a bat signal.
2.1 Realizations of α-stable signals for different values of α.
2.2 α-stable probability densities for different values of α.
2.3 Tails of the α-stable probability density for different values of α.
2.4 Probability density of the NIG(α, 0, 1, 0) law for different values of α.
4.1 (a) Time-domain and (b) frequency-domain representations of an LFM signal, clearly showing the inherent limitation of classical representations of a non-stationary signal.
4.2 A TF representation of the LFM signal of Fig. 4.1.
4.3 Examples of nonstationary signals. An engineering application is shown in (a) for a linear FM signal (plotted using the Wigner–Ville distribution). Real-life applications are shown in (b–d) for a whale signal, an electroencephalogram signal, and a bat signal, respectively (all plotted using the B distribution).
4.4 Quadratic representations corresponding to the WVD. Wz(t, f), Az(τ, ν), Kz(t, τ) and Dz(ν, f) are respectively the WVD, AF, time–lag signal kernel and Doppler–frequency signal kernel of the analytic signal z(t).
4.5 Dual domains of general signal quadratic representations. γ(t, f), Γ(τ, ν), G(t, τ) and G(ν, f) are the TFD time–frequency, Doppler–lag, time–lag and Doppler–frequency kernels, respectively. ρz(t, f) and Az(τ, ν) are the general quadratic TFD and the GAF of the analytic signal z(t).
5.1 Signal model for the blind source separation problem.
5.2 Order of statistics in blind source separation.
6.1 Extraction of 3 α-stable sources from 3 observations, with α = 0.5 and N = 10000.
6.2 Generalized mean rejection level versus α, with N = 1000.
6.3 Generalized mean rejection level versus the estimation error ∆α.
6.4 Generalized mean rejection level versus sample size N.
6.5 Generalized mean rejection level versus sample size for α = 1.5.
6.6 Generalized mean rejection level versus the additive noise power for α = 1.5.
8.1 Generalized mean rejection level versus the noise power.
8.2 Generalized mean rejection level versus the sample size.
8.3 Generalized mean rejection level versus the sample size.
8.4 Mean rejection level versus the noise power with T = 1000.
9.1 Consistency of different BSS algorithms. The sample sizes were 1000 for case (1) and 5000 for case (2).
9.2 The performance index versus noise level.
9.3 The performance index versus sample size.
11.1 The MSE versus the noise dispersion in dB, N = 1000.
11.2 The MSE versus the sample size, γ = 0.1.
11.3 The MSE versus the sample size, γ = 0.1.
11.4 The MSE versus the noise dispersion in dB, N = 1000.
12.1 The nonlinear law of the compressor used in the pre-processing stage.
12.2 Compression of a linear FM signal in impulsive noise using different values of β.
12.3 The standard MBD of the multi-component test signal.
12.4 The Robust-MBD of the multi-component test signal.
12.5 The NMSE versus sample size: a comparative study.
12.6 NMSE of IF estimates corresponding to the HAF, r-PWVD and R-MBD for a noisy two-component chirp signal.
12.7 Normalized MSE of the various phase parameters versus sample size, γ = 0.1.
12.8 Normalized MSE of the various phase parameters versus noise dispersion in dB, N = 1000.

List of Tables

2.1 Mean and variance of α-stable laws for different values of α.
2.2 Graphical variance test using the empirical variance.
2.3 Graphical test of a distribution's tail by the "log-log" method.
2.4 Optimal values of K as a function of n and α.
2.5 Approximation of the SαS PDF by a Gaussian mixture model, with refinement of the approximation by the EM algorithm.
4.1 Some common TFDs and their kernels.
6.1 The principal steps of the proposed minimum dispersion (MD) algorithm.
8.1 The principal steps of the proposed Robust-JADE algorithm.
8.2 The principal steps of the proposed Robust-EASI algorithm.
11.1 The proposed frequency estimation TRUNC-MUSIC algorithm.
11.2 The proposed robust frequency estimation FLOS-MUSIC algorithm.
11.3 The proposed robust covariance estimation ROCOV algorithm.
11.4 The proposed frequency estimation ROCOV-MUSIC algorithm.
12.1 Computation procedure of the Robust-MBD.
12.2 Component separation procedure for the proposed algorithm.
ABSTRACT

The main objective of this thesis is to develop new robust techniques for processing non-Gaussian and/or non-stationary signals in impulsive environments. More precisely, the work lies at the crossroads of the following two problems:

I- Blind separation of linear mixtures of impulsive sources. This problem has been little studied in certain statistically hard cases. Indeed, when the sources are modelled by α-stable laws, classical methods no longer apply, because the probability density has no explicit analytic expression and the moments of order 2 and higher are infinite. For this case we introduce four original approaches:
- An approach based on a minimum dispersion criterion, which minimizes the sum of the dispersions of the whitened observations. The pre-whitening step relies on a new normalized covariance matrix that we introduce.
- A second approach based on the idea of normalized statistics, introduced to adapt existing methods based on second- or higher-order statistics.
- A third approach using contrast functions, under an orthogonality constraint, built from sub- or super-additive functionals. In particular, we propose a criterion that minimizes the sum of the Lp norms (p ≥ 1) of the observations in order to separate sources that may have infinite variance.
- A fourth, semi-parametric approach, in which we cast the source separation problem as an estimation problem via the maximum likelihood principle.
  We then combine a stochastic version of the EM algorithm with a logspline approximation of the α-stable PDFs, so as to estimate the PDF and the mixing matrix simultaneously.

II- Estimation of non-stationary multicomponent FM signals in an impulsive environment. The literature remains relatively sparse for the multicomponent case in the presence of impulsive α-stable noise. To contribute to this problem, we propose both parametric methods and non-parametric methods based on time-frequency analysis:
- Parametric methods: We first reduce the problem to that of estimating harmonic signals buried in impulsive noise, by means of a polynomial transform of the signal. A high-resolution (MUSIC-type) method is then applied to the transformed signal to estimate the phase parameters. Three cases are considered and compared: (i) direct application of the MUSIC algorithm to the truncated harmonic signal; (ii) application of MUSIC to a robust estimate of the covariance function of the harmonic signal; and (iii) application of MUSIC to the generalized covariation of the signal.
- Non-parametric methods: In a first approach, we apply Huber's minimax robustness procedure against impulsive noise as a pre-processing stage, using two different techniques: (i) amplitude compression by a nonlinear filter of the form |x|^β, 0 < β < 1, and (ii) amplitude truncation (clipping) of the signal. We then represent the signal in the time-frequency plane using quadratic transforms suited to the multicomponent case, and apply a separation algorithm to extract the components and estimate their instantaneous frequencies.
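The two robustifying pre-processing filters just mentioned can be sketched in a few lines. This is a minimal illustration, not the thesis's implementation; the function names and the numerical example are ours:

```python
import numpy as np

def compress(x, beta=0.5):
    """Amplitude compressor y = sign(x) * |x|^beta, 0 < beta < 1:
    large impulsive spikes are shrunk while small samples change little."""
    return np.sign(x) * np.abs(x) ** beta

def clip(x, k):
    """Amplitude truncation (Huber-style clipping): samples whose magnitude
    exceeds k are limited to +/- k; samples below k are untouched."""
    return np.clip(x, -k, k)

# Example: a unit-amplitude sinusoid corrupted by one large impulsive spike.
t = np.arange(8)
x = np.sin(2 * np.pi * t / 8)
x[3] += 100.0                  # impulsive outlier

y_c = compress(x, beta=0.5)    # spike of ~100 shrunk to ~10
y_k = clip(x, k=1.0)           # spike limited to 1
```

For complex-valued FM signals the same idea applies as y = x · |x|^(β−1), which compresses the amplitude while preserving the phase, so the instantaneous-frequency information survives the filtering.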
  In a second approach, by contrast, we apply robust M-estimation directly to the quadratic time-frequency transform, defining a transform that is robust both to impulsive noise and to the cross-terms of a multicomponent signal.

Finally, a numerical study completes the theoretical results and compares our approaches with other methods from the literature.

Publications

1- Journal Articles

1. M. Sahmoudi, H. Monod, D. Makowski and D. Wallach, "Optimal experimental designs for estimating model parameters, applied to yield response to nitrogen models," Agronomie, vol. 22, pp. 229–238, 2002.
2. M. Sahmoudi, K. Abed-Meraim and M. Benidir, "Blind Separation of Impulsive alpha-stable Sources Using a Minimum Dispersion Criterion," IEEE Signal Processing Letters, vol. 12, no. 4, April 2005.
3. M. Sahmoudi and K. Abed-Meraim, "Blind Separation of Instantaneous Mixtures of Impulsive α-Stable Sources based on Fractional Lower-Order Statistics," submitted to IEEE Transactions on Signal Processing.
4. M. Sahmoudi and K. Abed-Meraim, "Blind Separation of Heavy-Tailed Sources Using Normalized Statistics," submitted to IEEE Transactions on Signal Processing.
5. M. Sahmoudi, K. Abed-Meraim and B. Barkat, "Robust Estimation of Multicomponent Non-Stationary FM Signals in Heavy-Tailed Noise," submitted to IEEE Transactions on Signal Processing.

2- Conference Articles

1. M. Benidir, A. Ouldali and M. Sahmoudi, "Performance Analysis of the HAF-Estimator for Time-Varying Amplitude Phase-Modulated Signals," in Proc. CA 2002, IASTED International Conference on Control and Applications, Cancun, Mexico, May 20-22, 2002.
2. M. Sahmoudi, K. Abed-Meraim and M.
Benidir, "Blind Separation of Alpha-Stable Sources: A New Fractional Lower-Order Moments (FLOM) Approach," in Proc. ISSPIT'02, IEEE International Symposium on Signal Processing and Information Technology, Marrakech, Morocco, December 18-21, 2002.
3. M. Sahmoudi, K. Abed-Meraim and M. Benidir, "Estimation de Signaux Chirp Multicomposantes Affectés par un Bruit Impulsif Alpha-stable" (Estimation of Multicomponent Chirp Signals Affected by Impulsive Alpha-Stable Noise), in Proc. GRETSI, Paris, France, September 2003.
4. M. Sahmoudi, K. Abed-Meraim and M. Benidir, "Blind Separation of Instantaneous Mixtures of Impulsive alpha-stable Sources," in Proc. IEEE International Symposium on Signal and Image Processing and Analysis, Rome, Italy, September 2003.
5. M. Sahmoudi, K. Abed-Meraim, N. Linh-Trung, V. Sucic, F. Tupin and B. Boashash, "An Image and Time-frequency Processing Method for Blind Separation of Non-stationary Sources," in Proc. Journées d'Étude sur les Méthodes pour les Signaux Complexes en Traitement d'Image, INRIA Rocquencourt, Paris, France, December 9-10, 2003.
6. M. Sahmoudi and K. Abed-Meraim, "Multicomponent Chirp Interference Estimation for Communication Systems in Impulsive alpha-stable Noise Environment," in Proc. IEEE International Symposium on Control, Communications and Signal Processing, Hammamet, Tunisia, March 2004.
7. M. Sahmoudi and K. Abed-Meraim, "Robust IF Estimation of Multicomponent FM Signals Affected by Heavy-Tailed Noise Using TFD," Int. Colloquium on Modelization, Stochastics and Statistics (MSS-2004), Algiers, Algeria, April 2004.
8. M. Sahmoudi, K. Abed-Meraim and B. Barkat, "IF Estimation of Multicomponent Chirp Signals in Impulsive alpha-stable Noise Environments Using Parametric and Non-Parametric Approaches," in Proc. EUSIPCO 2004, 12th European Signal Processing Conference, Vienna, Austria, September 2004.
9. M. Sahmoudi, K. Abed-Meraim and M.
Benidir, "Blind Separation of Heavy-Tailed Signals Using Normalized Statistics," in Proc. ICA 2004, 5th International Conference on Independent Component Analysis and Blind Source Separation, Granada, Spain, September 22-24, 2004.
10. M. Sahmoudi and K. Abed-Meraim, "Robust Blind Separation Algorithms for Heavy-Tailed Sources," to appear in Proc. ISSPIT 2004, Fourth IEEE Symposium on Signal Processing and Information Technology, Rome, Italy, December 18-21, 2004.
11. M. Sahmoudi, K. Abed-Meraim, M. Lavielle, E. Kuhn and Ph. Ciblat, "Blind Source Separation Using a Semi-Parametric Approach with Application to Heavy-Tailed Signals," submitted to EUSIPCO 2005, Turkey, September 2005.
12. M. Sahmoudi and K. Abed-Meraim, "A Robust Time-frequency Distribution for the Analysis of Multicomponent Non-stationary FM Signals Affected by Impulsive α-stable Noise," submitted to SSP 2005, Bordeaux, France, July 2005.
13. M. Sahmoudi and K. Abed-Meraim, "Blind Sources Separation Using Contrast Functions based on Some sub- and super-Additive Functionals," submitted to ISSPA 2005, Sydney, Australia, September 2005.

Notations and Abbreviations

Throughout this document, the following standard notations and abbreviations are used:

diag(a1, ..., an)   diagonal matrix with diagonal elements a1, ..., an
i.i.d.              independent and identically distributed
v.a.                random variable
v.a.r.              real random variable
BD                  B distribution
EEG                 electroencephalogram
EVD                 eigenvalue decomposition
FM                  frequency-modulated
FT                  Fourier transform
FLOS                fractional lower-order statistics
HOS                 higher-order statistics
IF                  instantaneous frequency
IFT                 inverse Fourier transform
LFM                 linear frequency-modulated
MBD                 modified B distribution
PDF                 probability density function
SNR                 signal-to-noise ratio
SOS                 second-order statistics
TFSP                time-frequency signal processing
TF                  time-frequency
TFD                 time-frequency distribution
WVD                 Wigner-Ville distribution
M. Sahmoudi © Processus Alpha-Stables pour la Séparation et l'Estimation Robustes des Signaux non-Gaussiens et/ou non-Stationnaires

Chapter 1
Introduction

This introductory chapter has a twofold purpose: first, to specify the scope of the thesis and the two problems it set out to solve; second, to present the main contributions of this work, pointing out the common thread linking its two parts.

1.1 Motivations

This work finds its origin and motivation in the growing need to characterize, analyze, and process signals that are non-stationary [Suppappola(2003)] and/or non-Gaussian [Wegman et al.(1989)], [Kassam(1995)]. The development of signal processing methods has produced a body of techniques whose main objective is to shed light on a given application. As real-world situations grow more complex (transmission breaks [Tourneret(1998)], impulsive phenomena [Nikias et Shao(1995)], sensor failures [Kassam(1995)], non-stationary channels [Ikram et al.(1998)], non-stationary signals, Doppler effects, the need for ever finer measurement instruments, and so on), signal processing tools become more specialized and less flexible: to adapt to a particular situation, the study and analysis procedures must be modified frequently. In professional practice, the signal processing practitioner often finds himself far removed from the strict theoretical framework within which certain signal processing tools or methods are guaranteed to work. He is confronted with missing, erroneous, incomplete, or truncated data; the normality assumption does not hold; the stationarity assumption does not hold; and so on.
Faced with these difficulties, he generally has only his own experience to rely on and, guided also by intuition, tries to "fashion" empirical tools suited to the problem at hand. The signal processing practitioner therefore often finds himself having to choose among several keys to open a lock, none of which fits the lock exactly. To guide him, mathematical statistics has established the properties of this or that method within a well-specified context, generally described by a given probabilistic model. Such modeling is only a somewhat simplified representation of the reality of the phenomenon under study. Indeed, resorting to the normal law is sometimes no more than an act of faith, or an admission that the "true" probabilistic mechanism generating the observations cannot be found. Moreover, non-stationary systems are encountered almost permanently in nature, owing to the dynamics and rapid evolution of the systems under study. Several eras can be distinguished in the chronological evolution of signal processing methods. The present one cannot be summarized so simply, so we prefer to speak of the era of "statistical methods based on few assumptions". Applications of signal processing thus lead quite naturally to the study of non-Gaussian signals, of non-stationary signals, and of the robustness of processing methods for these two classes of signals.

1.1.1 Non-Gaussianity

Indeed, for a very long time the development of statistical methods and the study of their properties were based essentially on the Gaussianity¹ of the family of laws. This is clearly apparent, for example, throughout the approach of R. Fisher and in the method of least squares. Nevertheless, the choice of a statistical model governed by the normal law is more an act of faith than the outcome of rigorous reflection.
On the theoretical side, the recent development of mathematical statistics is dominated by the search for solutions in contexts where the validity of a model is not guaranteed, and where only limited assumptions are made about the probability law. On the practical side, in many communication problems, such as transmission over the power grid, HF communications, or underwater communications [Grigoriu(1995)], [Kassam(1995)], [Nikias et Shao(1995)], the classical assumption of Gaussian noise, justified via the central limit theorem, is no longer valid. Indeed, in such systems, noises with a low probability of occurrence but very high amplitudes, said to be impulsive in nature, or discontinuities in the noise behavior (change-point problems), arise and can no longer be represented by Gaussian laws. Such phenomena can in fact be modeled by non-Gaussian distributions with algebraically decaying tails, i.e., decaying as x^(-α) with 0 < α < 2 [Nikias et Shao(1995)], [Ilow(1995)], [Kuruoglu(1998)], thus exhibiting the same behavior as the α-stable distributions.

¹ In reference to Carl Friedrich Gauss, born in 1777 in Brunswick, Germany. He very quickly became a renowned astronomer and mathematician, and is still regarded today as one of the greatest mathematicians of all time, on a par with Archimedes and Newton. His contributions to science, and to statistics in particular, are of the highest importance; he is credited notably with the method of least squares and the development of the normal law for measurement-error problems.
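This heavy-tailed behavior is easy to reproduce numerically. The sketch below draws symmetric α-stable samples with the Chambers-Mallows-Stuck generator, a standard simulation method, though not one prescribed by this chapter; the function name and parameter values are illustrative only.

```python
import numpy as np

def sample_sas(alpha, size, seed=None):
    """Symmetric alpha-stable (SaS) samples with unit dispersion,
    via the Chambers-Mallows-Stuck transformation."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)  # uniform angle
    w = rng.exponential(1.0, size)                # standard exponential
    if alpha == 1.0:
        return np.tan(u)                          # Cauchy special case
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

x_sas = sample_sas(1.2, 100_000, seed=0)          # heavy-tailed, alpha = 1.2
x_g = np.random.default_rng(0).normal(size=100_000)

# Empirical tail probabilities P(|X| > 5): a few percent for the
# alpha-stable draw, essentially zero for the Gaussian one.
print(np.mean(np.abs(x_sas) > 5), np.mean(np.abs(x_g) > 5))
```

The algebraic x^(-α) tail is what makes the rare, very large excursions of the stable draw so much more frequent than any Gaussian excursion of comparable size.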
This is why we are interested in general distribution models that certainly include the Gaussian models, but also heavy-tailed laws. As an illustration, the Gaussian model is well suited to band-limited data, whereas for wide-band data an infinite-variance stable model should be used, as shown in Figure 1.1.

[Figure 1.1 shows realizations of a Gaussian signal G(t) and of a SαS α-stable signal with α = 1.2, plotted against time t, for sample sizes of 200 (panels (a), (b)) and 40 (panels (c), (d)).]

Fig. 1.1: Realizations of a Gaussian signal and of an α-stable signal.
• Panels (c) and (d): when the sample size is relatively small, the realizations of the Gaussian and α-stable laws look alike.
• Panels (a) and (b): when the sample size is relatively large, the two realizations clearly differ.

1.1.2 Non-stationarity

Of all the tools available in signal processing, spectral analysis is certainly one of the most important. The reasons for its pre-eminence are obviously to be found in the relative universality of the central concept on which it rests: frequency. Whether in fields dealing with physical waves (acoustics, vibrations, geophysics, optics, ...) or relying on certain periodicities of events (economics, biology, astronomy, ...), a frequency-domain description often underpins a deeper understanding of the phenomena at play, providing an indispensable complement to the purely temporal description (sensor output or sequence of events), which generally comes first in the analysis.
Add to this that the frequency approach also lends itself to spatial processing (acoustic imaging, radio astronomy, ...), and it is easy to understand why so many studies have been, and continue to be, devoted to spectral analysis. We thus have at our disposal today an arsenal of methods whose properties, at least for the simplest and most robust (and therefore most thoroughly tested), are well known. To these methods are added batteries of algorithms, software, procedures, even instruments, all of which secure spectral analysis a prominent place in the daily life of laboratories. Yet it is the experience of that same daily life that forces us to set limits of validity and, above all, to raise objections of principle to the classical notion.

Non-stationarity is a non-property. To define it, we first explain what stationarity is [Flandrin(1993)]. The notion of stationarity is naturally related to those of steady state and temporal stability, and the definition used in signal theory formalizes these ideas in a certain way. Deterministic signals may be called stationary if they can be decomposed into a sum of everlasting sinusoidal waves (the physicist's Fourier modes). Stationary random signals are those for which no time origin exists; consequently, their statistical properties (their moments) do not vary over time. Non-stationary is everything that is not stationary: transients, i.e., when a steady state has not yet been reached (for example, in a car, the acceleration phase before reaching a stable speed), and abrupt changes, i.e., sudden and untimely modifications of amplitude (for example, in a car, an engine failure or sharp braking).
The class of non-stationary signals includes a wide variety of signals, such as the subclass of frequency-modulated signals, called FM signals [Amin(1992)], [Cohen(1995)], and in particular polynomial-phase signals, frequently encountered in telecommunications, notably in radar and sonar signals [Ouldali(1999)], [Boashash(2002)]. For the processing of non-stationary signals, beyond the "classical" spectral methods suited to stationary situations, the eighties saw the development of a large number of "modern" approaches that all share one feature: the explicit inclusion of time as a description parameter. In a spectral analysis context, this naturally led to the concept of time-frequency analysis and its associated representations and/or models. The intensification of work on the subject, flourishing in often divergent directions, has certainly made our task of selecting and then using the existing methods rather difficult. Figure 1.1.2 shows some real signals in the time-frequency plane. In contrast to the two classical representations of a signal (temporal and frequential), the evolution of frequency over time is clearly visible, hence the need for such representations in the analysis of non-stationary signals.

1.1.3 Robustness

The nineteenth century saw a long debate on the treatment of outliers, and references to this particular departure from the basic assumptions appeared very early in the statistical literature. The term "robust" was first used in a paper by G. Box [Box(1953)] on variance estimation in the non-Gaussian case, in the sense of resistance to a deviation from the normal law.
Subsequently, many authors studied the properties of alternatives to the classical estimators in the setting of contaminated laws, or mixtures of laws: a law P is said to be contaminated by a law Q at rate ε, ε ∈ [0, 1], if the law of the observations is (1 − ε)P + εQ. Various definitions of the concept of robustness have been put forward in the statistical literature [Launer et Wilkinson(1979)], [Huber(1981)], [Leroy(1987)]. When P denotes the law of the statistical model, a procedure has been called robust if:
– it has high absolute efficiency for all alternatives to P;
– it has high absolute efficiency over a well-specified set of laws;
– it is not very sensitive to the abandonment of the statistical assumptions on which it is based;
– the law of the statistic underlying the procedure "varies little" when P undergoes small alterations.
These different definitions do not provide an exhaustive view of the question, but they all spring from the same spirit. Thus P. Huber writes in [Huber(1972)]: "robustness is a kind of insurance: I am willing to pay a 5 to 10% loss of efficiency relative to the ideal model in order to protect myself against the bad effects of small deviations from it; I shall of course be happy if my statistical procedure performs well under large deviations, but I do not really care, since drawing inference from such a wrong model has little concrete meaning." In conclusion, we retain the following definition of the concept of robustness: a statistical procedure is robust if its performance is little affected by small modifications of the statistical assumptions on which it is based, such as the law P modeling the observations. This is the approach of [Huber(1981)] and [Hampel et al.(1986)].
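The ε-contamination model just defined is easy to experiment with. The sketch below contaminates a standard normal P with a wide normal Q at rate ε = 0.05 and compares a classical scale estimate with a robust one; the parameter values are illustrative, and the MAD estimator is our own choice of robust alternative, not a method prescribed by this chapter.

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps = 10_000, 0.05
# Observations follow (1 - eps) * P + eps * Q, with P = N(0, 1), Q = N(0, 100^2)
outlier = rng.random(n) < eps
x = np.where(outlier, rng.normal(0.0, 100.0, n), rng.normal(0.0, 1.0, n))

# Classical scale: grossly inflated by the 5% contamination.
# Robust scale (normalized median absolute deviation): still close to 1.
mad_scale = 1.4826 * np.median(np.abs(x - np.median(x)))
print(f"sample std = {x.std():.1f}, MAD scale = {mad_scale:.2f}")
```

Huber's insurance metaphor above is exactly this trade: the robust estimator pays a small efficiency premium at the exact Gaussian model in exchange for stability under contamination.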
This definition of robustness will be made precise, in each of the two parts of this thesis, on two points:
– which performance measures of the procedure should be retained?
– what counts as a small modification of the base model?
More generally, this problem consists in working not within a single probability law, but within a large class of laws. This approach makes it possible to answer many questions for which classical signal processing offers solutions only in a Gaussian context [Lecoutre et Tassi(1980)]: for example, the separation of source components in an impulsive non-Gaussian environment [Sahmoudi et al.(2004a)], [Zhang et Kassam(2004)], the estimation of the parameters of a possibly non-stationary signal buried in non-Gaussian noise [Friedmann et al.(2000)], [Sahmoudi et al.(2004b)], and multiuser detection in a non-Gaussian environment [Poor et Tanda(2002)].

1.2 Problem Statement

Reducing the effect of additive noise and separating mixtures of sources are two fundamental and recurring problems in most signal and image processing applications [Kay(1998b)], [Hyvarinen et al.(2001)]. They are, moreover, two central theoretical problems in statistics, whether for estimation or detection purposes [Kay(1998a)], [Kay(1998b)]. The case where the signal, be it noise or source, is impulsive in nature proves particularly interesting, both theoretically, as in the statistical inference of α-stable processes [Samorodnitsky et Taqqu(1994)], and practically, as in reducing the effect of atmospheric noise in HF communications and the effect of outliers on the statistical processing of an observed signal [Nikias et Shao(1995)], [Kassam(1995)].
It is also a case that is poorly or little studied in the signal processing literature, compared with the standard case where the signal is assumed Gaussian. The goal of this work is to develop estimation and separation techniques for environments exhibiting impulsive phenomena, characterized by processes with slowly decaying, also called heavy-tailed, distributions, and in particular the α-stable distributions [Nikias et Shao(1995)], [Adler et al.(1998)].

1.2.1 Separation of impulsive sources with infinite variance

Blind source separation is a multisensor signal (or image) processing technique in which a sequence of observations x(t), t = 1, ..., T, is assumed to follow the model

x(t) = A s(t) + b(t),   t = 1, ..., T    (1.1)

where A is a full-rank m × n matrix, s(t) is an n-vector of sources with independent components, and b(t) represents possible additive noise. Blind source separation, also known as independent component analysis, is the problem of recovering statistically independent source signals s(t) from their observed mixtures x(t) received on the sensor array, without prior knowledge of the structure of the mixtures or of the source signals [Hyvarinen et al.(2001)]. Source separation arises in a wide range of applications, such as target localization and tracking in radar and sonar, speaker separation (the so-called "cocktail party" problem), detection and separation in multiple-access communication systems, and independent component analysis of biomedical signals (e.g., EEG, ECG and fMRI) [Cichocki et Amari(2002)]. This problem has been studied extensively, and many solutions have been proposed.
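Model (1.1) is straightforward to simulate. In the following sketch the dimensions, seed, and Laplace source law are arbitrary illustrative choices; the last lines only check that the observations are consistent with the model when A is known, since recovering s(t) without knowing A is precisely the blind problem treated in this part.

```python
import numpy as np

rng = np.random.default_rng(2)
T, n, m = 1_000, 2, 3                  # snapshots, sources, sensors
A = rng.normal(size=(m, n))            # mixing matrix, full rank a.s.
S = rng.laplace(size=(n, T))           # independent source signals s(t)
B = 0.01 * rng.normal(size=(m, T))     # weak additive noise b(t)
X = A @ S + B                          # observations x(t) = A s(t) + b(t)

# Sanity check (non-blind): with A known, least squares recovers the sources.
S_ls = np.linalg.pinv(A) @ X
print(X.shape, np.mean((S_ls - S) ** 2) < 0.1)
```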
These are methods that minimize a separation criterion; some are algebraic and rely on second- and/or higher-order statistics [Cichocki et Amari(2002)], [Hyvarinen et al.(2001)]. Others use optimization tools, such as adaptive or block algorithms based on a sparse decomposition. Still others exploit the statistical independence of the sources through the maximum likelihood principle, or through information theory (the "infomax" principle) [Cichocki et Amari(2002)], [Hyvarinen et al.(2001)]. The separation of an instantaneous linear mixture has reached a certain maturity, but it remains little studied in some statistically difficult cases. When the observations exhibit abrupt changes reflecting significant events modeled by α-stable laws, the classical methods no longer apply or are ill-suited. Indeed, despite their differences, most of these methods use second- and/or higher-order statistics, or the probability density of the sources, which are undefined in the case of α-stable sources. The objective of the first part of this thesis is to propose statistical methods for the separation of impulsive sources following an α-stable model. We focus here on the following two points:
• Impulsive source signals: if the sources s(t) are impulsive, i.e., such that the probabilities of extreme values are not negligible, the source separation model can be made more realistic by using a heavy-tailed distribution, such as the α-stable laws (0 < α < 2), to model the sources.
This is a parametric family of probability distributions flexible enough to capture the statistical characteristics (characteristic exponent, symmetry, dispersion, and location) of the distribution of observations of phenomena with large-scale variations [Kidmose(2001)], [Shereshevski(2002)]. This part deals with the exploitation of fractional lower-order statistics (FLOS) and the adaptation of existing methods to separate α-stable mixtures.
• Generalization: we are also interested in generalizing the use of FLOS to the separation of other classes of sources. This led us to introduce separation techniques fundamentally different from existing ones, which involve only second-order (SOS) or higher-order (HOS) statistics.

1.2.2 Estimation of multicomponent FM signals in an impulsive environment

In this second part of the thesis, we deal with multicomponent signals affected by additive non-Gaussian noise of impulsive nature. An FM signal is said to be multicomponent if its time-frequency representation exhibits multiple ridges in the time-frequency plane. Analytically, a signal is multicomponent if it can be written as a sum of monocomponent signals. The noisy multicomponent FM signal considered in this part is given by the model

x(t) = s(t) + z(t) = Σ_{i=1}^{M} s_i(t) + z(t)    (1.2)-(1.3)

where
– s_i(t) denotes the i-th component of the signal x(t). It has the form s_i(t) = a_i(t) e^{jφ_i(t)} and is assumed to have a single ridge only, i.e., a single continuous curve, in the time-frequency plane.
– a_i(t) denotes the amplitude of the i-th component s_i(t) of the signal x(t).
– φ_i(t) denotes the phase of the i-th component s_i(t) of the signal x(t). When the phase φ_i(t) is a polynomial of degree I, the signal s_i(t) is said to be a polynomial-phase FM signal. In that case

s_i(t) = a_i(t) exp( j Σ_{k=0}^{I} b_{i,k} t^k )    (1.4)

– z(t) denotes the impulsive noise, modeled by heavy-tailed laws. As examples of this kind of probability law, which we will use to validate our approaches, we consider the family of α-stable laws with α < 2 [Samorodnitsky et Taqqu(1994)] and the family of generalized Gaussian probability densities [Kay(1998a)].
FM signals, and in particular polynomial-phase signals (PPS), occur frequently in telecommunications, notably in radar and sonar signals [Cohen(1995)], [Suppappola(2003)]. They model a wide range of non-stationary signals, since their frequency characteristics evolve continuously over time, possibly at high rates of variation [Boashash(2002)]. We are interested in the problem of estimating the instantaneous frequency of each component s_i(t) of the FM signal (12.1), defined by [Boashash(1992a)]

IF_i(t) := (1/2π) dφ_i(t)/dt    (1.5)

Several solutions already exist in the literature for the monocomponent and multicomponent cases in the presence of Gaussian noise [Francos et Friedlander(1995)], [Francos et Porat(1999)], [Ouldali(1999)], [Davy et al.(2002)]. Given their strongly non-stationary character, polynomial-phase signals cannot be handled by techniques developed under the stationarity assumption, such as the periodogram, Prony's method, or MUSIC. Likewise, adaptive techniques based on the assumption of local stationarity of the signal within the
analysis window are not very effective for studying these signals, whose instantaneous frequency can evolve rapidly. Their analysis therefore requires an approach that explicitly takes this non-stationary character into account, which is why joint time-frequency analysis was introduced. Despite the interest that non-stationary FM signals attract in signal processing, and despite the theory developed over the past thirty years, many problems remain open, especially regarding the multicomponent case in non-Gaussian noise. Our research in this second part has therefore proceeded in two directions:
• Multicomponent case: the literature contains few effective analysis methods for the multicomponent case, i.e., signals formed as sums of monocomponent signals. Indeed, the existing techniques are often adaptations or extensions of monocomponent processing methods.
• Impulsive noise: assuming, as most instantaneous frequency estimation algorithms in the literature do, that the noise b(t) is Gaussian can prove disastrous in applications where the noise may be impulsive, or made up of nuisance sources that one does not seek to estimate. An alternative is to model the noise with α-stable distributions, which make it possible to capture a non-Gaussian noise structure [Cappé et al.(2002)]. The main difficulty lies in determining the contribution of each component to the observed signal, and in reducing the effect of the impulsive noise.
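To make the objects of this part concrete, the sketch below builds a two-component linear-FM signal of the form (1.2)-(1.4) and evaluates the instantaneous frequency (1.5) of one component by numerical differentiation of its phase; all parameter values are illustrative.

```python
import numpy as np

fs = 1000.0                                 # sampling frequency (Hz)
t = np.arange(0, 1.0, 1.0 / fs)

# Two monocomponent linear-FM signals s_i(t) = exp(j phi_i(t)), with
# phi_i(t) = 2*pi*(f_i*t + 0.5*k_i*t^2): polynomial phases of degree 2.
phi1 = 2 * np.pi * (50.0 * t + 0.5 * 100.0 * t ** 2)
phi2 = 2 * np.pi * (300.0 * t - 0.5 * 80.0 * t ** 2)
x = np.exp(1j * phi1) + np.exp(1j * phi2)   # multicomponent FM signal (1.2)

# IF_i(t) = (1/2pi) * d(phi_i)/dt : here 50 + 100 t  and  300 - 80 t  Hz.
if1 = np.gradient(phi1, t) / (2 * np.pi)
print(round(if1[0]), round(if1[-1]))        # ~50 Hz at t = 0, ~150 Hz at t = 1
```

In this noise-free form the IF curves are simply read off the phases; the whole difficulty addressed in this part is doing so when only the sum of the components is observed, corrupted by impulsive z(t).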
1.3 Objectives and Contributions

The main objective is to use existing theories and techniques, and to develop new ones, for processing signals of non-Gaussian (impulsive) and/or non-stationary nature. More precisely, this doctoral work sits at the crossroads of the following two broad problems, in the context of an impulsive environment (noise or source signals):

[A] Blind separation of instantaneous linear mixtures of impulsive sources

This problem has been little studied in certain statistically difficult cases. Indeed, when the sources are modeled by α-stable laws, the classical methods no longer apply, because the probability density has no explicit analytical expression and the moments of order greater than or equal to 2 are infinite. For this case, we have introduced four original approaches:
– An approach based on the minimum dispersion criterion, which minimizes the sum of the dispersions of the whitened observations. The pre-whitening step is based on a new normalized covariance matrix that we have introduced.
– A second approach, based on the idea of normalized statistics, proposed to adapt existing methods based on second- or higher-order statistics.
– A third approach using contrast functions, under an orthogonality constraint, based on sub- or super-additive functionals. In particular, we propose a criterion that minimizes the sum of the Lp norms of the observations after a whitening step, in order to separate sources of possibly infinite variance.
– A fourth approach with a semi-parametric structure.
In this method, we formulate the source separation problem as a maximum likelihood estimation problem. We then combine a stochastic version of the EM algorithm² with a log-spline approximation of the α-stable PDFs in order to estimate both the PDF and the mixing matrix.

[B] Estimation of multicomponent FM signals in an impulsive environment

The literature remains relatively sparse for the multicomponent case, and in particular for impulsive α-stable noise. To contribute to solving this problem, we have proposed parametric methods as well as non-parametric methods based on time-frequency analysis.
– Parametric methods: we first reduce the problem to that of estimating harmonic signals buried in impulsive noise, via a polynomial transform of the signal. A high-resolution method (MUSIC) is then applied to the transformed signal to estimate the parameters. Three variants are considered: (i) direct application of the MUSIC algorithm to the amplitude-clipped harmonic signal; (ii) application of MUSIC to a robust estimate of the covariance function of the harmonic signal; and (iii) application of MUSIC to the generalized covariation of the signal.
– Non-parametric methods: in a first approach, we applied Huber's minimax robustness procedure against the effect of impulsive noise as a pre-processing step, using two different techniques: (i) amplitude compression by a nonlinear filter of the form |x|^β, 0 < β < 1, and (ii) clipping of the signal above a large threshold value.
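Both pre-processing devices can be written in a few lines. In the sketch below (function names and test values are illustrative, not from the thesis), each transform preserves the phase of the complex signal and only reworks its amplitude, which is what makes it usable ahead of a time-frequency analysis:

```python
import numpy as np

def compress(x, beta=0.5):
    """Nonlinear amplitude compression |x|^beta (0 < beta < 1),
    phase preserved: the first pre-processing technique above."""
    return np.abs(x) ** beta * np.exp(1j * np.angle(x))

def clip(x, c):
    """Hard clipping of amplitudes above a threshold c,
    phase preserved: the second pre-processing technique above."""
    a = np.minimum(np.abs(x), c)
    return a * np.exp(1j * np.angle(x))

x = np.array([1 + 0j, 0 + 4j, -100 + 0j])   # last sample is an "impulse"
print(np.abs(compress(x)))                  # [ 1.  2. 10.]
print(np.abs(clip(x, 3.0)))                 # [1. 3. 3.]
```

Either way, the rare huge amplitudes that would otherwise dominate a quadratic time-frequency transform are tamed before the transform is computed.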
We then represent the signal in the time-frequency plane using quadratic transforms suited to the multicomponent case, together with an ad hoc algorithm to extract the components and estimate their instantaneous frequencies. In the second approach, by contrast, we combined the M-estimation robustness approach with the quadratic time-frequency transforms to define transforms robust both to the effect of impulsive noise and to the cross-terms of a multicomponent signal. Finally, a numerical study complements the theoretical results and allows our approaches to be compared with other methods existing in the literature. Note also that the two problems treated in this thesis are very rich and attract more and more signal processing specialists; we cite for example a recent contribution on the blind separation of convolutive mixtures of FM signals [Castella et al.(2004)], which is a combination of the two problems addressed in this work.

² The word "algorithm" comes from the Latin pronunciation of the name of Abu Ja'far Muhammad Ibn Musa Al-Khawarizmi, the ninth-century Arab mathematician who lived in Baghdad and was a precursor of algebra [Khawarizmi(ecle)].

1.4 Organization of the Document

Aware that the abstract side of probability and statistics puts off many signal processing practitioners, we have sought to present a lively, clear exposition, illustrated with many examples, figures, and diagrams. This document consists of the present introduction, three parts illustrating the various aspects of our work, and a conclusion. At the beginning of each chapter, we have added an introduction detailing further the context and stakes of the topic treated in that chapter, as well as the work carried out.
Chaque chapitre se termine par une étude de robustesse des contributions, de leurs performances et des éventuels prolongements que l’on pourrait envisager de donner à cette méthode. De plus, les tables de matières accompagnent les trois parties de la thèse. Plus précisement, ce rapport de thése est organisé comme suit : Introduction • Chapitre 1 : Présente les motivations et l’originalité de ce travail de thèse, précise le cadre technique des problèmes posés et résume nos contributions principales. Première partie : Préliminaires • Chapitre 2–4 : Constituée de trois chapitres, réunissent les notions utiles pour la suite de la famille α-stables des distributions de probabilités non-Gaussiennes, d’estimation robuste ainsi que l’outil temps-fréquence pour l’analyse des signaux non-stationnaires. Le lecteur y trouvera toutes les définitions, théorèmes et formules qu’il doit savoir pour la compréhension du manuscrit. Deuxième partie : Contributions novatrices en séparation aveugle de sources impulsives de modèle alpha-stable • Chapitre 5 : Une présentation générale de la séparation aveugle de sources ainsi que les grands principes des méthodes existantes sont rappelés. Par M. Sahmoudi © Processus Alpha-Stables pour la Séparation et l’Estimation Robustes des Signaux non-Gaussiens et/ou non-Stationnaires 14 Introduction la suite, nous précisons le problème que l’on aborde : la séparation aveugle d’un mélange instantané linéaire de sources impulsives modélisées par des distributions alpha-stables. • • • • Ensuite, nous introduisons trois nouvelles approches pour le cas des sources impulsives de modèle α-stable : Chapitre 6 : Approches basées sur les moments statistiques fractionnaires d’ordre inférieure. Nous proposons une fonction de contraste basée sur le critère du dispersion minimum. Chapitre 7 : Approche de séparation par des fonctions de contrastes sous contrainte d’orthogonalité. 
Nous proposons dans ce chapitre deux classes de fonctions de contrastes basées sur des fonctionnelles sous- ou sur- additives. Des exemples pratiques de fonctions de contrastes sont introduits pour application aux sources à queue lourde. En particulier, nous proposons la fonction de contraste qui consiste à minimiser la somme des normes Lp des observations. Chapitre 8 : Approche basée sur les statistiques normalisés. Dans ce chapitre nous introduisons des statistiques normalisées dans le but de pouvoir appliquer correctement les méthodes de séparation de sources basées sur l’existence des statistiques d’ordre deux et d’ordre supérieur. Chapitre 9 : Approche semi-paramétrique du principe du maximum de vraisemblance basée sur la combinaison d’une version stochastique de l’algorithme EM et d’une technique d’approximation des densités α-stable par les fonctions log-splines. Troisième partie : Contributions novatrices en séparation et éstimation des signaux FM non-stationnaires afféctés par un bruit impulsif • Chapitre 10 : Nous commençons cette partie par une présentation générale des grandes approches paramétriques et non-paramétriques temps-fréquence existantes dans la littérature. • Chapitre 11 : Dans ce chapitre, nous présentons trois approches paramétriques robustes à l’effet du bruit alpha-stable pour l’estimation des signaux FM à phase polynomiale multi-composantes. • Chapitre 12 : Dans ce chapitre, nous introduisons deux approches non paramétriques basées sur la représentation temps-fréquence des signaux FM non-stationnaires considérés dans un environnement impulsif de modèle alphastable. Conclusion et Perspectives • Chapitre 13 : A la fin de ce manuscrit une conclusion vient résumer les apports essentiels du présent travail ainsi que les directions futures de recherche qu’on envisage. M. 
Sahmoudi © Processus Alpha-Stables pour la Séparation et l'Estimation Robustes des Signaux non-Gaussiens et/ou non-Stationnaires [Fig. 1.2 : Exemples de signaux non-stationnaires. (a)–(c) représentent des signaux d'applications de la vie réelle par la B-distribution : (a) un signal de baleine, (b) un signal d'électroencéphalogramme et (c) un signal de chauve-souris.] Première partie : Outils pour le Traitement des Signaux non-Gaussiens et/ou non-Stationnaires. L'objectif de cette partie est d'introduire un certain nombre de concepts en statistiques et en traitement du signal qui ont servi d'outils de base pour mener à bien le travail de cette thèse, et qui seront fréquemment utilisés par la suite dans ce document. Chapitre 2 : Distributions à Queues Lourdes. La propriété de stabilité, le théorème de la limite centrale et la caractérisation parfaite par les moments d'ordre un (la moyenne) et d'ordre deux (la variance ou la covariance) sont des propriétés qui font de la loi gaussienne une des lois les plus utilisées en modélisation statistique.
Cependant, bien que les calculs d'inférence statistique soient simples, l'hypothèse de gaussianité s'avère trop restrictive, en particulier dans certains domaines pour lesquels il faut prendre en compte une plus grande variabilité des données. Dans le cadre des distributions non-gaussiennes à variance infinie sont apparues les lois α-stables, dont le moment d'ordre 2 est infini dès que α est strictement inférieur à 2. Ces lois sont utilisées dans de nombreux domaines tels que les télécommunications [Bestravos et al.(1998)], le traitement du signal [Nikias et Shao(1995)] et la finance [Bassi et al.(1998)], [Rachev(2003)], etc. Elles font partie de la classe des lois de probabilité non-gaussiennes à queue lourde¹, qui englobe d'autres modèles existant dans la littérature et qui a attiré l'attention de beaucoup de chercheurs en statistique et en traitement du signal. Le but de ce chapitre n'est pas de faire une description exhaustive des modèles non-gaussiens ; il s'agit seulement d'introduire ceux qui sont particulièrement adaptés à la modélisation des phénomènes impulsifs. On présente plus en détail la famille des distributions α-stables. Le seul fait que les lois stables ont une queue de type lourde, ou bien asymptotiquement parétienne (pour faire référence à la loi de Pareto), ne suffit pas pour justifier leur importance. Il existe deux raisons profondes : la première provient d'un théorème que nous verrons dans ce chapitre, dit théorème central limite généralisé, qui accorde le statut de « lois limites » aux lois α-stables. La deuxième raison provient de la propriété de stabilité qui affirme que toute combinaison linéaire de v.a.r. α-stables indépendantes est aussi de loi α-stable. ¹ Formellement, une v.a.r. a une queue lourde si elle a une queue algébrique : il existe c, α > 0 tels que Pr(|X| > x) ∼ c x^{-α} quand x → ∞.
Après un bref rappel historique sur les lois stables, leurs distributions univariées sont définies et diverses propriétés sont présentées dans un premier temps. Puis sont abordés le problème du test d'une variance finie ou infinie ainsi que l'estimation des deux paramètres caractérisant une loi symétrique α-stable. Dans une seconde section, le cas multivarié est traité. Certains concepts de mesure de dépendance des v.a.r. α-stables, tels que la covariation, le coefficient de covariation, le coefficient de covariation symétrique et la codifférence, sont introduits ainsi que leurs propriétés. On termine l'étude des lois α-stables par la présentation de quelques techniques d'approximation analytique de leur densité de probabilité. Enfin, on présente d'autres modèles non-gaussiens à queue algébrique largement utilisés pour la modélisation des signaux impulsifs. 2.1 Bref Historique Au cours des développements historiques en astronomie au 18-ème siècle, Gauss a introduit sa méthode d'estimation par le critère des moindres carrés et insista sur l'importance de la loi qui porte actuellement son nom [Gauss(1963)]. Suivant les développements de la théorie des séries de Fourier, Laplace et Poisson tentent de trouver l'expression analytique de la transformée de Fourier (TF) d'une densité de probabilité (PDF) et lancent alors la théorie des fonctions caractéristiques sur la bonne voie. Laplace, en particulier, a souligné le fait que la densité de Gauss et sa TF ont la même expression analytique. Son étudiant Cauchy étend l'analyse de Laplace et considère la TF d'une fonction de « Gauss généralisée » de la forme f_n(x) = (1/π) ∫₀^∞ exp(−c t^n) cos(tx) dt, en remplaçant 2 par n. Il n'a pas réussi à résoudre le problème mais, quand il a considéré le cas n = 1, autre que la loi de Gauss, il a obtenu la fameuse loi de Cauchy f₁(x) = c / (π(c² + x²)).
En remplaçant l’entier naturel n par le réel α on obtient la fameuse famille fα des densités α-stables. Cependant, à l’époque on ne savait pas qu’il s’agit d’une densité de probabilité et c’est seulement après les travaux de Pólya et Bernstein que la famille fα est devenu officiellement une classe de PDF pour 0 < α ≤ 2 [Janicki et Weron(1994)]. En 1925, le mathématicien Français Lévy, en étudiant le théorème limite centrale, confirme que lorsqu’on relâche la condition de variance finie, la loi limite est une loi stable [Lévy(1925)]. Motivé par ce dernier résultat Lévy établit la TF de toutes les distributions α-stable, ce qui lui attribue l’originalité de la théorie des lois stables. Plus tard en 1937, Lévy a introduit une nouvelle approche pour le traitement des lois stables qui est celle des distributions infiniment divisibles. D’autres mathématiciens ont contribué plus tard à l’étude approfondie des lois stables, notablement de Doblin (1939) en utilisant les fonctions à variations regulière, de Gnedenko et Kolmogorov et de [Zolotarev(1966)]. Quelques années plus tard, [Fama et Roll(1968)] donnent les premières tabulations des lois symétriques α-stables (SαS), ce qui va permettre de concevoir les premiers estimateurs de ces lois. Plus tard, les efforts des statisticiens sont focalisés sur l’estimation de l’exposant caractéristique α qui caractérise la loi et qui détermine si la loi est de variance finie ou infinie. [Fama et Roll(1971)] ont utilisé les quantiles pour esM. Sahmoudi © Processus Alpha-Stables pour la Séparation et l’Estimation Robustes des Signaux non-Gaussiens et/ou non-Stationnaires 2.1 Bref Historique 21 timer le paramètre α ce qui permis aux premiers tests d’apparaı̂tre du modèle i.i.d. α-stable. 
De nouvelles techniques d’estimation basées sur la fonction caractéristique vont apparaı̂tre dans les années 80 comme par exemple la méthode de [Koutrouvelis(1980)] qui semble être la meilleure méthode selon plusieurs études faites par [Akgiray et Lamoureux(1989)] et [Walter(1994)]. Simultanément, des générateurs de variables aléatoires stables sont conçus par [Chambers et al.(1976)], dont les algorithmes permettent une amélioration des possibilités de simulation des situations réelles comme par exemple sur les marchés financiers ou le bruit télephonique. Suivit par les travaux de Paulauskas dans le cas multivatriés, [Cambanis et Miller(1981)] ont établit la théorie des processus linéaires de lois stables, [Samorodnitsky et Taqqu(1994)] ont développé la régression linéaire et non-linéaire des distributions α-stables et l’étude des processus stochastiques stables dans [Janicki et Weron(1994)]. Malgrés cette longue histoire de recherche scientifique, les lois α-stables n’attirent que peu d’attention des chercheurs en sciences appliquées : – En Astronomie : La première application des distributions α-stables est apparue avant Lévy dans le domaine de l’astronomie, quand Holtsmark a montré que la force gravitationnelle exercé par le système stellaire sur un point de l’univers a une distribution α-stable d’indice α = 3/2. – En Finance : Si on regarde par exemple les courbes boursières représentant l’évolution du prix d’un titre au cours du temps, des périodes hautes s’altérnent à des périodes basses et ainsi de suite. De plus, des fluctuations et des périodes irrégulières peuvent être observées. Mandelbrot s’appuie alors sur la loi de Pareto pour mettre en évidence un nouveau modèle de variation des prix, appelé lois α-stables. [Mandelbrot(1963)] confirme que son modèle décrit de façon réaliste la variation des prix pratiqués sur certaines bourses des valeurs. Par la suite, [Fama(1965)] valide le modèle des lois α-stables sur le prix du marché des actions. 
A la fin des années 80, plusieurs travaux semblent rejeter le modèle i.i.d. α-stable en se retournant vers la remise en question de l’hypothèse d’indépendance ce qui a conduit à la découverte des lois d’échelle ou lois à longue dépendance. – En Télécommunications : Les premiers travaux effectués pour l’application des lois α-stables en traitement du signal ont vu le jour durant les années 70 par trois chercheurs des laboratoires de BELL (Chambers, Mallow et Stuck) en prouvant que le modèle α-stable est bien adéquat pour modéliser le bruit des lignes téléphoniques. Ils ont conduit une série de travaux qui ont abouti a plusieurs résultats de références comme le critère de dispersion minimum , filtrage de Kalman des processus α-stable et l’analyse de plusieurs algorithmes d’estimation et de détection dans un bruit non-gaussien [Stuck et Kleiner(1974), Stuck(1977)]. En 1993, Shao et Nikias ont publié dans IEEE Magazine un article qui a initialisé la méthodologie de traitement du signal dans un environnement α-stable. Plus tard, l’intérêt à ce thème devient publique et plus de 120 articles de revue et de conférence sont apparus en plusieurs applications de ce modèle. D’autres applications sont beaucoup plus récentes, en internet par exemple le temps d’apparition d’une M. Sahmoudi © Processus Alpha-Stables pour la Séparation et l’Estimation Robustes des Signaux non-Gaussiens et/ou non-Stationnaires 22 Distributions Non-Gaussiennes à Queues Lourdes page web est très variable, ce qui rappelle certains modèles à variance infinie. Dans ce context, [Adler et al.(1998)] donnent divers exemples d’application des lois à queues lourdes et en particulier les distributions α-stables. Par ailleurs, en 1999 une conférence internationale sur le sujet ”Applications of Heavy-Tailed Distributions in Statistics, Engineering and Economics” était organisée. Quelques mois plus tard, durant la conférence ”IEEE Higher Order Statistics Workshop”, une session spéciale était consacré au sujet. 
En 2000, la conférence ICASSP aussi consacre une session spéciale au sujet. Récemment en 2002, un numéro spécial du journal ”Signal Processing” est dédié aux modèles à queue lourdes et leurs applications en radar, images, video et en analyse des données télétrafiques (No. 82, 2002). – D’une Manière Générale : Notons toutefois que même si le modèle i.i.d. α-stable n’est pas toujours approprié, il représente un bon compromis entre exactitude de modélisation et compléxité d’inférence statistique. Plusieurs livres sont consacrés à ces lois : [Zolotarev(1986)] qui a étudié les lois αstables dans le contexte univarié ; [Samorodnitsky et Taqqu(1994)] qui ont étudié de manière approfondie beaucoup de propriétés de ces lois dans le cas univarié comme dans le cas multivarié, [Nikias et Shao(1995)] qui ont appliqué ces lois dans le domaine du traitement du signal et [Nolan(2004)] pour une étude de point du vue implémantation et modélisation des données. En dépit de l’intérêt que représente cette famille de distributions, il reste bien beaucoup de questions à creuser surtout dans le cas multivarié. Notre travail de recherche s’est alors axé dans le traitement des signaux impulsifs modélisés par des lois α-stables. 2.2 Lois Stables Univariées 2.2.1 Lois indéfiniment divisibles Avant de définir les lois α-stables, nous allons introduire une famille de lois plus générale : les lois indéfiniment divisibles. C’est à partir de ces lois que sera précisée la forme de la fonction caractéristique des lois stables. L’importance de telles lois réside dans la solution du problème suivant : Déterminer la classe des distributions qui s’expriment comme limite d’une somme de n variables aléatoires réelles (v.a.r.) indépendantes et identiquement distribuées (i.i.d.) ? Pour résoudre le problème, introduisons alors la définition suivante. Définition 2.1. Une v.a.r. 
X a une distribution indéfiniment divisible si et seulement si, ∀n, il existe X1, · · · , Xn indépendantes et de même loi telles que X =d X1 + · · · + Xn, où =d signifie l'égalité en distribution. Il faut noter que les v.a.r. Xi n'ont pas forcément la même loi que X mais elles appartiennent à la même classe de distributions. La classe des v.a.r. indéfiniment divisibles permet de résoudre le problème ci-dessus. En effet, on a le théorème suivant. Théorème 2.1. Une v.a.r. X est la limite d'une somme de v.a.r. i.i.d. si et seulement si X est indéfiniment divisible. Pour la démonstration, voir [Shiryayev(1984), page 336]. Remarque 2.1. Une des caractérisations des lois indéfiniment divisibles est que leur fonction caractéristique peut s'exprimer comme puissance n-ème d'une autre fonction caractéristique. Théorème 2.2 (Lévy-Khinchin). Si X a une distribution indéfiniment divisible, alors sa fonction caractéristique s'écrit Φ_X(t) = exp{ iµt + ∫_{−∞}^{+∞} (e^{itx} − 1 − it sin x) x^{−2} M(dx) }, où µ est un réel et M est une mesure qui attribue une masse finie à tout intervalle fini et telle que les deux intégrales suivantes M⁺(x) = ∫_x^{+∞} y^{−2} M(dy) et M⁻(−x) = ∫_{−∞}^{−x} y^{−2} M(dy) sont convergentes pour tout x > 0. Pour la démonstration, voir [Feller(1971), page 554]. Pour se rapprocher du théorème de la limite centrale et afin d'obtenir une forme explicite de la fonction caractéristique, nous allons introduire la famille des distributions α-stables. 2.2.2 Deux définitions équivalentes des distributions α-stables. Définition 2.2 (Propriété de Stabilité). La distribution d'une v.a.r. X est stable si, pour toute suite a_k ; k ∈ IN*, de nombres réels et toute famille X1, · · · , Xk i.i.d.
de même loi que X, il existe ck > 0 et bk, deux réels, tels que a1 X1 + · · · + ak Xk =d ck X + bk. Lorsque bk = 0, on parle de distribution strictement stable. Théorème 2.3. Pour toute v.a. stable X, il existe une constante α, 0 < α ≤ 2, telle que la constante ck vérifie : ck^α = a1^α + · · · + ak^α. Le nombre α est appelé exposant caractéristique ou bien indice de stabilité. Dans le cas k = 2, la démonstration est détaillée dans [Samorodnitsky et Taqqu(1994)]. La généralisation au cas k ∈ IN* est évidente. Proposition 2.1. Si X est stable, alors X est indéfiniment divisible. Preuve. On considère les v.a. Yj = (Xj − bn/n)/an, j = 1, · · · , n. Les v.a. Yj sont indépendantes car les Xj le sont. On peut écrire Y1 + · · · + Yn = (X1 + · · · + Xn)/an − bn/an ; comme X1 + · · · + Xn =d an X + bn, on en déduit Y1 + · · · + Yn =d X. ¥ Théorème 2.4 (Théorème Central Limite Généralisé). Sans l'hypothèse de variance finie, pour toute suite de variables aléatoires i.i.d. X1, · · · , Xn, toute suite (an) de nombres réels positifs et toute suite (bn) de nombres réels, si la somme normalisée (X1 + · · · + Xn − bn)/an converge en distribution, alors sa limite est une variable stable. La démonstration est détaillée dans [Shiryayev(1984), page 338]. On peut également définir les distributions α-stables à partir de leur fonction caractéristique. Définition 2.3 (Fonction caractéristique des lois stables, Lévy-Khinchin). Si X a une distribution stable, alors sa fonction caractéristique s'écrit : Φ(t) = exp{iat − γ|t|^α [1 + iβ sign(t) ω(t, α)]} (2.1) où ω(t, α) = tan(απ/2) si α ≠ 1, et ω(t, α) = (2/π) log|t| si α = 1. (2.2) Une loi stable, notée Sα(a, β, γ), est caractérisée par quatre paramètres : – α : l'exposant caractéristique, 0 < α ≤ 2. Il caractérise les queues de distribution en mesurant leur épaisseur.
C’est pourquoi on parle des distributions α-stable à queues lourdes ou à queue épaisse . Quand α est proche de 2, la probabilité d’observer des valeurs de la variable aléatoire loin de la position centrale est faible. Une valeur proche de 0 de l’indice α signifie que la masse de la queue a une probabilité considérable. La valeur α = 2 correspond à la loi normale (loi de Gauss) pour toute valeur de β, alors que α = 1, β = 0 correspond à la loi de Cauchy ; – a : paramètre de position. Il mesure la tendance centrale de la distribution. Lorsque α > 1, a représente la moyenne et si 0 < α < 1, alors a représente la médiane ; – γ : la dispersion, mesure la dispersion de la distribution autour du paramètre de position a. Lorsque α = 2, la variance existe et γ = 12 V ar(X) ; M. Sahmoudi © Processus Alpha-Stables pour la Séparation et l’Estimation Robustes des Signaux non-Gaussiens et/ou non-Stationnaires 2.2 Lois Stables Univariées 25 – β : paramètre de symétrie, −1 ≤ β ≤ 1. Si β = 0, la loi est symétrique par rapport au paramètre de position a, de fonction caractéristique φα (t) = exp{iat−γ|t|α }. Dans ce cas la loi de probabilité est dite α-stable symétrique ou tout simplement SαS. Les distributions α-stable symétrique représente une sous classe importante des distributions α-stable. Par exemple la loi de Gauss et la loi de Cauchy sont des lois SαS. Par convention, une loi α-stable est dite standard si a = 0 et γ = 1. Enfin, reste à noter aussi qu’il est assez courant dans la littérature de remplacer la dispersion γ par σ α et d’appeler σ paramètre d’échelle. Pour donner une comparaison à la loi de Gauss, nous présentons dans la figure 2.1 des réalisations de variables aléatoires i.i.d. symétriques α-stables d’exposants α = 0.1, 0.5, 0.8, 1, 1.5 et une réalisation gaussienne. On remarque que plus α est petit, plus la variable est impulsive. 
[Fig. 2.1 : Réalisations de signaux α-stables pour différentes valeurs de α (α = 0.1, 0.5, 0.8, 1, 1.5 et 2).] 2.2.3 Stabilité de quelques lois usuelles. Proposition 2.2 (Loi de Gauss). La loi de Gauss N(m, σ²) est une loi indéfiniment divisible et α-stable de paramètre α = 2. Preuve. – Indéfiniment divisible : sa fonction caractéristique s'écrit Φ(t) = exp{imt − σ²t²/2} = [exp{i(m/n)t − (σ²/n)t²/2}]^n, comme puissance n-ème de la fonction caractéristique d'une loi normale N(m/n, σ²/n). – Stabilité : la loi N(m, σ²) est une loi S2(m, β, σ²/2). Réciproquement, une loi S2(a, β, γ) est une loi normale N(a, 2γ). ¥ Proposition 2.3 (Loi de Cauchy). La loi de Cauchy C(a) est une loi indéfiniment divisible et α-stable de paramètre α = 1. Preuve. – Indéfiniment divisible : sa fonction caractéristique s'écrit Φ(t) = exp(−a|t|) = [exp(−(a/n)|t|)]^n, comme puissance n-ème de la fonction caractéristique d'une loi de Cauchy C(a/n). – Stabilité : la loi de Cauchy généralisée, de densité f(x) = (1/π) γ/(γ² + (x − m)²), est une loi S1(m, 0, γ). ¥ Proposition 2.4 (Loi de Poisson). La loi de Poisson P(λ) est une loi indéfiniment divisible mais n'est pas stable. Preuve. – Indéfiniment divisible : la fonction caractéristique de P(λ) s'écrit Φ(t) = exp{λ(e^{it} − 1)} = [exp{(λ/n)(e^{it} − 1)}]^n, comme puissance n-ème de la fonction caractéristique d'une loi de Poisson P(λ/n). – P(λ) n'est pas stable : nous proposons une démonstration par l'absurde. On considère deux v.a. de Poisson X1 et X2 ; si elles sont stables, alors il existe a > 0 et b tels que X1 + X2 =d aX1 + b.
L'égalité des moyennes et des variances, IE(X1 + X2) = IE(aX1 + b) et Var(X1 + X2) = Var(aX1 + b), donne 2λ = aλ + b et 2λ = a²λ, d'où a = √2 et b = (2 − √2)λ. Or les v.a. X1 et X2 ne prennent que des valeurs dans IN, donc X1 + X2 aussi ; cela entraîne une contradiction car √2 X1 + (2 − √2)λ n'est pas forcément à valeurs dans IN. ¥ 2.2.4 Propriétés des lois stables. Dans cette partie, les propriétés les plus importantes des lois α-stables seront présentées. [A]- Densité de probabilité. Pour les v.a. α-stables, il n'existe pas d'expression explicite de la densité de probabilité (PDF) dans le cas général. Cependant, on peut obtenir une expression de la PDF sous forme d'intégrale à l'aide de la transformée de Fourier inverse de la fonction caractéristique : f(x; α, β) = (1/2π) ∫_{−∞}^{+∞} exp(−itx) Φα(t) dt = (1/π) ∫₀^{+∞} exp(−t^α) cos[xt + βt^α ω(t, α)] dt. Quand la distribution représentée par cette densité est symétrique (β = 0) autour de zéro (a = 0), la fonction caractéristique est une fonction réelle et paire, ce qui permet de simplifier l'expression de la densité de probabilité : f(x; α, 0) = (1/π) ∫₀^{+∞} exp(−γ|t|^α) cos(tx) dt. Proposition 2.5 (Propriétés de la densité). 1. La densité de probabilité vérifie : f(x; α, β) = f(−x; α, −β). 2. La densité de probabilité d'une distribution α-stable est une fonction bornée. 3. La densité de probabilité d'une distribution α-stable est de classe C^∞. Pour la démonstration, voir [Zolotarev(1986)]. La forme explicite de la densité des lois α-stables n'existe que dans les trois cas importants suivants : 1. La loi de Gauss S2(a, 0, γ) : α = 2, β = 0 =⇒ f(x; 2, 0) = (1/√(4πγ)) exp{−(x − a)²/(4γ)} 2.
La loi de Cauchy S1(a, 0, γ) : α = 1, β = 0 =⇒ f(x; 1, 0) = γ / (π(γ² + (x − a)²)) 3. La loi de Lévy S_{1/2}(a, 1, γ) : α = 1/2, β = 1 =⇒ f(x; 1/2, 1) = √(γ/(2π)) (x − a)^{−3/2} exp{−γ/(2(x − a))}, qui est concentrée sur [a, ∞). [Fig. 2.2 : Densités de probabilité α-stables pour différentes valeurs de α (α = 0.5, 1, 1.5, 2).] [B]- Propriétés algébriques. Proposition 2.6. Soient X1 ∼ Sα(a1, β1, γ1) et X2 ∼ Sα(a2, β2, γ2) deux v.a. α-stables indépendantes ; alors X1 + X2 ∼ Sα(a, β, γ) avec a = a1 + a2, β = (β1γ1 + β2γ2)/(γ1 + γ2) et γ = γ1 + γ2. Proposition 2.7. Soit X ∼ Sα(a, β, γ) une v.a. α-stable et c une constante réelle ; alors X + c ∼ Sα(a + c, β, γ). Proposition 2.8. Soit X ∼ Sα(a, β, γ) une v.a. α-stable et h une constante réelle non nulle ; alors hX ∼ Sα(ha, sign(h)β, |h|^α γ) si α ≠ 1, et hX ∼ S1(ha − (2/π) γβ h ln|h|, sign(h)β, |h| γ) si α = 1. Pour la démonstration, voir [Samorodnitsky et Taqqu(1994)]. [C]- Comportement à queues lourdes. Définition 2.4. La loi de probabilité d'une v.a.r. est dite à queue lourde d'indice α s'il existe un nombre α ∈ ]0, 2[ et une fonction h à variation lente, c'est-à-dire lim_{x→+∞} h(bx)/h(x) = 1 pour tout b ∈ IR⁺, tels que : IP(X ≥ x) = x^{−α} h(x). (2.3) Proposition 2.9. Soit X une v.a.r. de loi Sα(a, β, γ) avec 0 < α < 2 ; alors on a les deux résultats suivants : lim_{t→+∞} t^α IP(X > t) = Cα ((1 + β)/2) γ et lim_{t→+∞} t^α IP(X < −t) = Cα ((1 − β)/2) γ (2.4) où Cα est une constante qui ne dépend que de α : Cα = (∫₀^∞ x^{−α} sin x dx)^{−1}, soit Cα = 2/π si α = 1 et Cα = (1 − α)/(Γ(2 − α) cos(πα/2)) si α ≠ 1. Pour la démonstration, voir [Samorodnitsky et Taqqu(1994), page 16]. D'après la propriété (2.4), par passage à la limite quand t tend vers +∞, on remarque que les lois α-stables sont asymptotiquement à queue lourde.
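La représentation intégrale de la densité SαS donnée plus haut se prête à une évaluation numérique directe. En voici une esquisse Python/NumPy (le nom de fonction et les paramètres de quadrature sont des choix illustratifs), vérifiée sur les deux cas fermés de Cauchy (α = 1) et de Gauss (α = 2) :

```python
import numpy as np

def densite_sas(x, alpha, gamma=1.0, tmax=60.0, n=200001):
    """Densité SαS f(x; α, 0) = (1/π) ∫_0^∞ exp(−γ t^α) cos(tx) dt,
    évaluée par la règle des trapèzes (tmax et n sont des choix illustratifs)."""
    t = np.linspace(0.0, tmax, n)
    y = np.exp(-gamma * t ** alpha) * np.cos(t * x)
    dt = t[1] - t[0]
    return dt * (y.sum() - 0.5 * (y[0] + y[-1])) / np.pi

# Vérification sur le cas fermé de Cauchy : f(x; 1, 0) = γ/(π(γ² + x²))
x = 0.7
approx = densite_sas(x, alpha=1.0)
exact = 1.0 / (np.pi * (1.0 + x ** 2))   # γ = 1
```

La même fonction, appelée avec α = 2, doit redonner la densité gaussienne (1/√(4πγ)) exp{−x²/(4γ)} ; pour les α intermédiaires, sans forme fermée, c'est précisément ce type de quadrature qui sert en pratique.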
Pour une meilleure illustration des densités α-stables, nous avons présenté les courbes de leurs densités de probabilité et de leurs queues, pour différentes valeurs de α, dans la figure 2.2 et la figure 2.3 respectivement. Ces figures montrent l'effet de l'exposant caractéristique α. Nous remarquons que plus α est petit, plus la densité est impulsive et sa queue est lourde. [D]- Propriété de mélange. Théorème 2.5 (Théorème de mélange d'échelles). Soit x ∼ Sαx(0, 0, γx) avec 0 < αx < 2 et soit 0 < αz < αx. Soit y une v.a. positive, totalement asymétrique (« skewed »), de distribution alpha-stable S_{αz/αx}(0, 1, (cos(παz/(2αx)))^{αx/αz}) et indépendante de x. Alors z = y^{1/αx} x ∼ Sαz(0, 0, γx). (2.5) [Fig. 2.3 : Les queues de la densité de probabilité α-stable pour différentes valeurs de α (α = 0.5, 1.0, 1.5, 2.0).] Pour la démonstration, voir [Samorodnitsky et Taqqu(1994)] et [Feller(1971)]. Ce théorème nous permet d'écrire une v.a. SαS comme produit de deux v.a. α-stables dont l'une est totalement asymétrique. Corollaire 2.1. Soient x un vecteur de loi normale N(0, 2γx) et y une v.a. positive de loi α-stable, y ∼ S_{αz/2}(0, 1, (cos(παz/4))^{2/αz}), indépendante de x. Alors z = y^{1/2} x ∼ Sαz(0, 0, γx). (2.6) Ce cas spécial du théorème 2.5 montre qu'une v.a. SαS peut être représentée comme produit d'une v.a. gaussienne et d'une v.a. α-stable positive. Cette propriété montre que les lois SαS sont des distributions gaussiennes conditionnelles [Papoulis(1991)]. 2.2.5 Moments fractionnaires d'ordre inférieur. [A]- Moments fractionnaires d'ordre positif. Même si les moments du second ordre d'une v.a.
SαS avec 0 < α < 2 n'existent pas, les moments d'ordre inférieur à α existent et s'appellent les moments fractionnaires d'ordre inférieur (FLOM). La proposition suivante donne l'expression des FLOM en fonction de la dispersion γ et de l'exposant caractéristique α. Proposition 2.10. Soit X une v.a. Sα(0, β, γ), de paramètre de position nul et de dispersion γ. Alors : – si α = 2 : ∀p ≥ 0, IE|X|^p < +∞ ; – si α < 2 : IE|X|^p = C(α, β, p) γ^{p/α} si 0 < p < α, et IE|X|^p = +∞ si p ≥ α, (2.7) où C(α, β, p) = [2^{p−1} Γ(1 − p/α) / (p ∫₀^{+∞} u^{−p−1} sin²u du)] (1 + β² tan²(απ/2))^{p/(2α)} cos[(p/α) arctan(β tan(απ/2))] et Γ(.) représente la fonction gamma. Ce résultat important a été démontré par Zolotarev en utilisant la transformée de Mellin-Stieltjes [Zolotarev(1986)]. Dans [Cambanis et Miller(1981)], le même résultat a été retrouvé en utilisant une propriété de la fonction caractéristique. Un résultat similaire est vrai dans le cas des v.a. stables complexes [Masry et Cambanis(1984)]. [B]- Moments fractionnaires d'ordre négatif. Dans [Ma et Nikias(1995a)], les auteurs ont démontré que les v.a.r. SαS ont aussi des moments finis d'ordre négatif ! Ce résultat surprenant pour les lois α-stables symétriques SαS est présenté dans la proposition suivante. Proposition 2.11. Soit X une v.a.r. SαS de paramètre de position nul et de dispersion γ. Alors la formule unifiée pour ses moments d'ordre positif et d'ordre négatif est IE(|X|^p) = C(p, α) γ^{p/α} pour tout −1 < p < α, (2.8) avec C(p, α) = 2^{p+1} Γ((1 + p)/2) Γ(−p/α) / (α √π Γ(−p/2)). 2.2.6 Simulation des lois stables. [A]- Sources de codes. Pour simuler les lois stables, Chambers et al. ont publié le premier programme en langage FORTRAN dans [Chambers et al.(1976)]. Le même code, amélioré par Chambers et J. Nolan, est publié dans le livre [Samorodnitsky et Taqqu(1994)].
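La constante C(p, α) de la proposition 2.11 peut se vérifier numériquement dans le cas gaussien α = 2, où γ = σ²/2 et où IE|X|^p est connu en forme fermée ; esquisse minimale (le nom `C` est purement illustratif) :

```python
import math

def C(p, alpha):
    """Constante C(p, α) de la proposition 2.11 :
    C(p, α) = 2^{p+1} Γ((1+p)/2) Γ(−p/α) / (α √π Γ(−p/2)), pour −1 < p < α."""
    return (2.0 ** (p + 1) * math.gamma((1.0 + p) / 2.0) * math.gamma(-p / alpha)
            / (alpha * math.sqrt(math.pi) * math.gamma(-p / 2.0)))

# Cohérence dans le cas gaussien α = 2 : pour X ~ N(0, σ²), de dispersion
# γ = σ²/2, on sait que IE|X|^p = σ^p 2^{p/2} Γ((p+1)/2) / √π.
p, sigma = 0.8, 1.7
gamma_disp = sigma ** 2 / 2.0
flom = C(p, 2.0) * gamma_disp ** (p / 2.0)
gauss = sigma ** p * 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)
```

Pour α = 2, les deux facteurs Γ(−p/2) se simplifient et on retrouve exactement le moment absolu gaussien, ce qui confirme la cohérence de la formule unifiée (2.8).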
Il existe aussi une fonction rstab dans la bibliothèque du logiciel S-PLUS. Pour un programme MATLAB, on peut consulter la page web du professeur John NOLAN.

[B]- Quelques exemples

Nous avons simulé 5000 réalisations de lois SαS pour différentes valeurs de α. Le tableau suivant (Tableau 2.1) représente la moyenne et la variance empiriques des 5000 réalisations. Ces résultats confirment l'équation sur le calcul des moments. En effet, lorsque α décroît vers 1, la variance diverge et lorsque α devient plus petit que 1, c'est la moyenne qui commence à diverger.

    Valeur de α    IE(X)      Var(X)
    0.5            5324.87    3323423.23
    0.9            27.12      3312198.76
    1.0            −0.48      2171.12
    1.2            0.01       152.13
    1.5            0.03       36.76
    1.7            0.02       6.27
    2.0            0.02       2.12

Tab. 2.1 – La moyenne et la variance des lois α-stables pour différentes valeurs de α

2.3 Inférence Statistique des Lois Stables

La première tâche du traiteur de signal est consacrée à l'étude de la modélisation des données par des lois de probabilité. En particulier, dans cette section nous étudierons l'adéquation de la famille des lois α-stables pour cette modélisation [Nolan(2004)]. Plusieurs niveaux de tests et de validation sont possibles pour différentes classes de signaux (e.g., images biomédicales, images astronomiques, signaux EEG, etc.). Dans un premier temps, on pourra tester si la distribution des données est à queue lourde en utilisant l'histogramme de la loi normale. Si l'hypothèse de normalité est violée, on testera si la variance des données est infinie en utilisant le test de convergence des variances [Adler et al.(1998)]. Enfin, on pourra conclure si les données sont dans le domaine d'attraction d'une loi stable par l'estimation de l'exposant caractéristique α directement à partir des données, en utilisant la méthode dite ”stabilized p-p plots” [Michael(1983)].
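Pour fixer les idées, l'algorithme de Chambers, Mallows et Stuck cité ci-dessus tient en quelques lignes dans le cas symétrique. L'esquisse Python ci-dessous (les noms rsas et c_flom sont des choix d'illustration, non issus du manuscrit) simule des lois SαS standard (γ = 1) et confronte un moment fractionnaire empirique à la formule (2.8) :

```python
import math
import numpy as np

def c_flom(p, alpha):
    """Constante C(p, alpha) de la proposition 2.11 (valable pour -1 < p < alpha)."""
    return (2.0 ** (p + 1) * math.gamma(-p / alpha) * math.gamma((1 + p) / 2)
            / (alpha * math.sqrt(math.pi) * math.gamma(-p / 2)))

def rsas(alpha, n, rng):
    """n realisations SalphaS standard (gamma = 1) par la methode de
    Chambers-Mallows-Stuck, cas symetrique (0 < alpha <= 2)."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, n)   # U ~ Uniforme(-pi/2, pi/2)
    w = rng.exponential(1.0, n)                 # W ~ Exp(1), independante de U
    if alpha == 1.0:
        return np.tan(u)                        # cas de Cauchy
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

rng = np.random.default_rng(1)

# Pour alpha = 2, S_2(0, 1, 0) = N(0, 2) : la variance empirique est proche de 2.
print(np.var(rsas(2.0, 400_000, rng)))

# Pour alpha = 1 (Cauchy, gamma = 1), le moment fractionnaire IE|X|^p avec
# p = 0.25 < alpha doit approcher C(p, alpha) * gamma^(p/alpha).
x = rsas(1.0, 400_000, rng)
print(np.mean(np.abs(x) ** 0.25), c_flom(0.25, 1.0))
```

Les moyennes et variances empiriques divergentes du tableau 2.1 se reproduisent de la même façon en faisant varier α.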
2.3.1 Tests de la variance

Nous allons présenter deux méthodes graphiques pour tester si la distribution de nos observations est à variance finie ou infinie.

[A]- Test graphique de la convergence de la variance empirique

La stratégie qui semble la plus simple pour tester si la variance est finie ou pas est de faire augmenter la taille de l'échantillon et de calculer la variance empirique correspondante. Plus précisément, on propose l'algorithme résumé dans le tableau 2.2. Si les observations ont une loi à variance finie, lorsqu'on fait augmenter la taille N des observations, la variance doit converger vers une valeur finie. Dans le cas contraire, si les observations proviennent d'une loi à variance infinie, un comportement de divergence doit être observé.

[B]- Test graphique de la queue

L'idée principale de ce deuxième test est basée sur le comportement asymptotique ”queue lourde” des lois α-stables :

    lim_{t→+∞} t^α IP(X > t) = C_α ((1+β)/2) γ.

Alors, cela implique

    d log F̄(x) / d log x ∼ −α, x → +∞    (2.9)

où F̄(x) = IP(X > x) est le complémentaire de la fonction de répartition F. Nous résumons l'algorithme dans le tableau 2.3.

Test graphique de la variance
Step 1. Calcul de la moyenne empirique X̄ = (1/N) Σ_{i=1}^N X_i
Step 2. Calcul de la variance empirique σ̂²_N = (1/N) Σ_{i=1}^N (X_i − X̄)²
Step 3. Visualisation de la courbe (N, σ̂²_N) pour des N assez grands.

Tab. 2.2 – Test graphique de la variance en utilisant la variance empirique.

Test graphique log-log de la queue
Step 1. Calcul du logarithme de la queue q(x) = log((1/N) Σ_{i=1}^N 1I_{|X_i|>x})
Step 2. Visualisation de la courbe (log x, q(x)) pour des x assez grands.

Tab. 2.3 – Test graphique de la queue d'une distribution par la méthode dite ”log-log”.
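Les deux tests graphiques des tableaux 2.2 et 2.3 s'implémentent directement. Esquisse Python illustrative (les noms variance_courbe et pente_queue sont hypothétiques), appliquée à un échantillon de Cauchy (α = 1) dont la pente log-log de la queue doit approcher −α = −1 :

```python
import numpy as np

def variance_courbe(x, points=20):
    """Variance empirique en fonction de la taille N (tableau 2.2)."""
    tailles = np.linspace(len(x) // points, len(x), points, dtype=int)
    return [(int(n), float(np.var(x[:n]))) for n in tailles]

def pente_queue(x, x1, x2):
    """Pente de q(x) = log((1/N) somme des 1{|X_i| > x}) entre log x1 et
    log x2 (tableau 2.3) ; doit approcher -alpha pour une loi alpha-stable."""
    q1 = np.log(np.mean(np.abs(x) > x1))
    q2 = np.log(np.mean(np.abs(x) > x2))
    return (q2 - q1) / (np.log(x2) - np.log(x1))

rng = np.random.default_rng(3)
ech = rng.standard_cauchy(500_000)        # echantillon Cauchy : alpha = 1
print(pente_queue(ech, 10.0, 100.0))      # attendu : proche de -1
```

Sur des données gaussiennes, au contraire, la pente log-log décroche vers −∞ dans la queue et la courbe (N, σ̂²_N) se stabilise au lieu de diverger.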
Si la variance de la loi de distribution des données est finie, la pente de la courbe doit converger vers une valeur finie [Adler et al.(1998)].

2.3.2 Estimation des paramètres des lois α-stables

La plupart des algorithmes de traitement du signal utilisant des lois α-stables exigent l'estimation a priori des paramètres de la distribution α-stable, et en particulier de l'exposant caractéristique α. D'où l'importance d'avoir des techniques efficaces d'estimation des paramètres de la loi. Pour une loi α-stable symétrique SαS, les paramètres de la distribution à estimer sont l'exposant caractéristique α et la dispersion γ. De nombreuses méthodes ont été proposées dans la littérature : maximum de vraisemblance [DuMouchel(1973), Bodenschatz et Nikias(1999)], utilisation des fractiles de la distribution [Fama et Roll(1968)], utilisation de la fonction caractéristique [Koutrouvelis(1980)], utilisation des moments fractionnaires d'ordre inférieur positifs et négatifs [Ma et Nikias(1995b)], utilisation des moments logarithmiques de la loi SαS [Ma et Nikias(1995b)], utilisation de la fonction de répartition dans [Maymon et al.(2000)] et généralisation des méthodes existantes au cas d'une loi α-stable non symétrique [Kuruoglu(2001)]. Dans cette partie, nous allons discuter la méthode du maximum de vraisemblance et détailler la méthode basée sur la fonction caractéristique.

[A]- Méthode du maximum de vraisemblance

Cette approche largement utilisée en statistique souffre d'une difficulté majeure dans le cas des distributions α-stables, à savoir le manque d'expression analytique de la PDF. Malgré cela, [DuMouchel(1973)] a développé une telle approche dans ce contexte.
D'autres chercheurs ont utilisé des techniques de Monte Carlo ou des approximations pour approcher les intégrales de l'expression de la densité [Nolan(2004)]. Cependant, toutes ces méthodes nécessitent une grande complexité de calcul. De plus, il n'existe aucune étude de convergence de cette approche dans la littérature.

[B]- Méthode de régression basée sur la fonction caractéristique

Pour une v.a.r. SαS, l'expression de la fonction caractéristique est donnée par

    ϕ_X(t) = exp{−γ |t|^α}.

Ce qui entraîne

    |ϕ_X(t)|² = exp{−2γ |t|^α}
    log[−log |ϕ_X(t)|²] = log 2γ + α log |t|.

On pose y_k = log[−log |ϕ_X(t_k)|²], λ = log 2γ et ω_k = log |t_k| ; l'égalité précédente implique que y_k = λ + α ω_k. Si on pose

    ŷ_k = log(−log |ϕ̂_X(t_k)|²)

où

    |ϕ̂_X(t_k)|² = (1/n²) [ (Σ_{i=1}^n cos(t_k x_i))² + (Σ_{i=1}^n sin(t_k x_i))² ],

on peut alors proposer le modèle linéaire suivant

    Ŷ = λ + αW + ε.

Or la partie imaginaire de la fonction caractéristique est nulle (loi symétrique) ; on a alors l'estimateur de la fonction caractéristique donné par :

    |ϕ̂_X(t_k)|² = ( (1/n) Σ_{i=1}^n cos(t_k x_i) )².    (2.10)

En ce qui concerne le choix des t_k, ainsi que le choix de K par rapport à n, on suit la démarche décrite dans [Koutrouvelis(1980)], c'est-à-dire : quel que soit k ∈ [1, K], t_k = πk/25 et le paramètre K est choisi suivant le tableau 2.4 ci-dessous.

    α :        0.3   0.5   0.7   0.9   1.1   1.3   1.5   1.7   1.9
    n = 200 :  134   124   118    86    68    56    28    22    18
    n = 800 :   22    16    14    11    11    11     9     9    10
    n = 1600 :  30    24    20    24    18    15    10    10    10

Tab. 2.4 – Valeurs optimales de K en fonction de n et de α

– Estimation du paramètre α : par régression linéaire, en choisissant les ω_k tels que Σ_{k=1}^K ω_k = 0, on obtient

    α̂ = Σ_{k=1}^K ω_k ŷ_k / Σ_{k=1}^K ω_k²    (2.11)

– Estimation de la dispersion γ : de même, par régression linéaire et le choix des ω_k tels que Σ_{k=1}^K ω_k = 0, on obtient

    γ̂ = (1/2) exp( (1/K) Σ_{k=1}^K ŷ_k )    (2.12)
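À titre d'illustration, la régression ci-dessus s'esquisse en Python comme suit (le nom estime_alpha_gamma est hypothétique ; K est fixé à 10 plutôt que choisi via le tableau 2.4), testée sur un échantillon de Cauchy (α = 1, γ = 1) :

```python
import numpy as np

def estime_alpha_gamma(x, K=10):
    """Estime (alpha, gamma) d'une loi SalphaS par regression sur la fonction
    caracteristique empirique (methode de type Koutrouvelis, t_k = pi k / 25)."""
    n = len(x)
    t = np.pi * np.arange(1, K + 1) / 25.0
    # |phi_chapeau(t_k)|^2, partie imaginaire supposee nulle (loi symetrique)
    phi2 = (np.cos(np.outer(t, x)).mean(axis=1)) ** 2
    y = np.log(-np.log(phi2))              # y_k = log(2 gamma) + alpha log t_k
    w = np.log(t)
    w0 = w - w.mean()                      # centrage : somme des omega_k nulle
    alpha_hat = np.sum(w0 * y) / np.sum(w0 ** 2)
    lam = y.mean() - alpha_hat * w.mean()  # ordonnee a l'origine = log(2 gamma)
    gamma_hat = 0.5 * np.exp(lam)
    return alpha_hat, gamma_hat

rng = np.random.default_rng(5)
a, g = estime_alpha_gamma(rng.standard_cauchy(100_000))
print(a, g)   # attendu : proches de 1 et 1
```

Pour une Cauchy standard, ϕ_X(t) = e^{−|t|}, donc ŷ_k ≈ log 2 + log t_k : la pente estimée vaut bien α = 1 et l'ordonnée à l'origine redonne γ = 1.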
2.4 Lois Stables Multivariées

2.4.1 Définition et propriétés

Définition 2.5. Le vecteur aléatoire X = (X_1, · · · , X_d) est dit α-stable dans IR^d si pour toute suite de nombres positifs a_1, · · · , a_k, il existe un nombre positif c_k et un vecteur D^{(k)} ∈ IR^d tels que

    a_1 X^{(1)} + · · · + a_k X^{(k)} =_d c_k X + D^{(k)}    (2.13)

où X^{(1)}, · · · , X^{(k)} sont des copies indépendantes de X et =_d désigne l'égalité en distribution [Samorodnitsky et Taqqu(1994)]. Lorsque D^{(k)} est le vecteur nul, on parle de loi strictement alpha-stable.

Proposition 2.12. Si X est un vecteur α-stable, alors toute combinaison linéaire des composantes de X est une v.a.r. α-stable.

Preuve
Soit Y = Σ_{i=1}^n λ_i X_i une combinaison linéaire des composantes de X. Considérons Y_1, · · · , Y_k des copies de Y. Alors

    Y_1 + · · · + Y_k =_d Σ_{i=1}^n λ_i X_i^{(1)} + · · · + Σ_{i=1}^n λ_i X_i^{(k)}
                     = Σ_{i=1}^n λ_i (X_i^{(1)} + · · · + X_i^{(k)})
                     =_d Σ_{i=1}^n λ_i (c_k X_i + D_i^{(k)})
                     = c_k Σ_{i=1}^n λ_i X_i + Σ_{i=1}^n λ_i D_i^{(k)}
                     = c_k Y + b_k. ¥

Contrairement au cas mono-variable, la fonction caractéristique d'un vecteur stable multi-variable n'a pas d'expression explicite en t.

Définition 2.6. Une fonction caractéristique d'une v.a. de dimension n est dite α-stable si elle s'écrit sous la forme

    Φ(t) = exp(j t^T a − t^T A t), si α = 2 ;
    Φ(t) = exp( j t^T a − ∫_{S^{n−1}} |t^T s|^α µ(ds) + j β_α(t) ), si 0 < α < 2,    (2.14)

où

    β_α(t) = tan(απ/2) ∫_{S^{n−1}} |t^T s|^α sign(t^T s) µ(ds), si α ≠ 1, 0 < α < 2 ;
    β_α(t) = ∫_{S^{n−1}} t^T s log |t^T s| µ(ds), si α = 1,    (2.15)

– S^{n−1} est la sphère unité de dimension n,
– a, t ∈ IR^n,
– µ(.) est la mesure spectrale de la sphère unité²,
– A est une matrice symétrique, semi-définie positive.

Notons que le cas α = 2 correspond à une distribution gaussienne multivariée de moyenne a et de matrice de covariance 2A.
Notons aussi qu'à l'exception de ce dernier cas α = 2, les distributions stables multivariées sont déterminées par le vecteur a ∈ IR^n, un scalaire 0 < α < 2 et une mesure finie µ(dS^{n−1}) sur la sphère unité S^{n−1}.

Définition 2.7. Un vecteur x est dit de distribution α-stable symétrique (SαS) si x est un vecteur α-stable et si les distributions de −x et x sont identiques.

Théorème 2.6. Soit x un vecteur α-stable, on a les résultats suivants.
1. Si toute combinaison linéaire des composantes de x a une loi symétrique α-stable, alors x est un vecteur SαS.
2. Si toute combinaison linéaire des composantes de x a une distribution α-stable, avec un indice de stabilité α ≥ 1, alors x est un vecteur α-stable.

² C'est une mesure sur l'ensemble des boréliens de la sphère unité.

La démonstration est détaillée dans [Samorodnitsky et Taqqu(1994), page 59].

Proposition 2.13. Soit A une matrice de type m × n et x un vecteur SαS de dimension n, alors y = Ax est un vecteur SαS de dimension m.

Preuve
D'après le théorème 2.6, il suffit de montrer que toute combinaison linéaire des composantes de y a une distribution SαS. En effet, soient b_1, · · · , b_m, m réels ; on a

    Σ_{j=1}^m b_j Y_j = b^t y = b^t Ax = Σ_{j=1}^n ( Σ_{i=1}^m b_i a_{ij} ) X_j.

Or le vecteur x a une distribution SαS, donc la dernière combinaison ci-dessus a une distribution SαS. Par conséquent, le vecteur y est un vecteur SαS. ¥

Remarque 2.2. (Blanchiment des SαS) Il est montré dans plusieurs ouvrages de traitement statistique du signal qu'on peut blanchir tout vecteur de distribution gaussienne. Précisément, si x est un vecteur gaussien alors on peut l'écrire sous la forme x = Ay où A est une matrice constante et y est un vecteur gaussien à composantes indépendantes.
Cependant, dans le cas des lois stables, la représentation de deux variables stables de même exposant caractéristique α, 0 < α < 2, comme combinaison linéaire d'un nombre fini de variables stables indépendantes est impossible en général [Schilder(1970)]. Ce résultat remarquable nous impose de faire attention lors de la généralisation de certaines propriétés des lois gaussiennes au cas des lois stables au sens de Lévy.

2.4.2 Moments des lois stables multivariées

Le calcul des moments des lois stables multivariées découle de celui des lois stables univariées.

Théorème 2.7.
1. Si X_1, · · · , X_n sont des v.a.r. α-stables et indépendantes, alors IE(|X_1|^{p_1} · · · |X_n|^{p_n}) < ∞ si et seulement si p_i < α, i = 1, · · · , n.
2. Si X_1, · · · , X_n sont des v.a.r. dépendantes et conjointement α-stables, alors IE(|X_1|^{p_1} · · · |X_n|^{p_n}) < ∞ si et seulement si 0 < p_1 + · · · + p_n < α.

Cette condition est très faible et souvent réalisée dans la pratique. Pour plus de détails voir [Miller(1978)]. Dans sa forme générale, une distribution alpha-stable multi-variée reste difficile à exploiter dans la pratique du traitement du signal. Cependant il existe quelques sous-classes des distributions α-stables multi-variées avec une expression simplifiée de la fonction caractéristique. Une telle classe est celle des distributions sous-gaussiennes dont la description est présentée ci-dessous.

2.4.3 Vecteur aléatoire α-sous-gaussien

Définition 2.8. La fonction caractéristique des distributions α-sous-gaussiennes est donnée par

    Φ(t) =def exp( −(1/2) (t^T R t)^{α/2} )

où R est une matrice définie positive. Cette sous-classe est souvent notée α−SG(R) [Cambanis et Miller(1981)].
Un vecteur aléatoire de distribution α−SG(R) peut se décomposer comme produit (ou mélange) d'un vecteur aléatoire α-stable et d'un vecteur gaussien.

Proposition 2.14. Soit x un vecteur α-stable ; x ∈ α−SG(R), alors

    x = η^{1/2} y

avec η une variable aléatoire positive α/2-stable et y un vecteur gaussien de moyenne nulle et de covariance R. De plus, η et y sont indépendantes. Pour la démonstration voir [Cambanis et Miller(1981)].

Dans la proposition précédente, η ∼ S_{α/2}(a = 0, β = 1, γ = (cos(πα/4))^{2/α}). Alors on peut la voir comme extension du résultat de mélange des SαS. Comme on peut le voir de la formulation de la définition par la fonction caractéristique ci-dessus, les paramètres β et γ ne sont plus indépendants. Leurs valeurs peuvent être déterminées en utilisant la fonction caractéristique et la mesure spectrale. En effet, contrairement aux distributions alpha-stables monovariables qui forment une classe paramétrique, les distributions stables multi-variées forment une classe non-paramétrique. Pour plus de détails sur cette classe, le lecteur intéressé peut consulter [Samorodnitsky et Taqqu(1994)].

2.5 Mesure de Dépendance des v.a.r. α-Stables

Le coefficient de corrélation est la mesure classique de dépendance (à l'ordre 2) entre deux v.a.r. X_1 et X_2 de variances finies. Cependant, pour les lois alpha-stables, les moments d'ordre p avec p ≥ α et 0 < α < 2 sont infinis, et en particulier la variance. Par conséquent, le coefficient de corrélation n'est plus valable en tant que mesure de dépendance. Dans ce cas, d'autres mesures existent dans la littérature utilisant les moments fractionnaires d'ordre inférieur à α, comme la covariation, la codifférence et le coefficient de covariation symétrique, et d'autres basées sur les rangs ou sur les densités de probabilité.
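La décomposition produit de la proposition 2.14 ci-dessus se prête à une vérification numérique dans le cas α = 1 : η est alors 1/2-stable positive de dispersion γ = (cos(π/4))² = 1/2, c'est-à-dire de loi de Lévy, simulable par η = 1/(4g²) avec g gaussien standard. Esquisse Python (la matrice R est un exemple hypothétique ; noter que la convention de signe du paramètre β varie selon les auteurs) :

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400_000

# Matrice R definie positive (exemple hypothetique), avec R_11 = R_22 = 4.
R = np.array([[4.0, 2.0],
              [2.0, 4.0]])
L = np.linalg.cholesky(R)

# Cas alpha = 1 : eta est 1/2-stable positive, c.-a-d. de loi de Levy,
# simulable par eta = 1/(4 g^2) avec g ~ N(0, 1).
g = rng.standard_normal(n)
eta = 1.0 / (4.0 * g ** 2)
y = rng.standard_normal((n, 2)) @ L.T       # y ~ N(0, R)
x = np.sqrt(eta)[:, None] * y               # x ~ 1-SG(R) (vecteur sous-gaussien)

# Chaque marginale est une Cauchy d'echelle sqrt(R_ii)/2 = 1, pour laquelle
# IP(|x_i| > 1) = 1/2 exactement.
print(np.mean(np.abs(x[:, 0]) > 1.0), np.mean(np.abs(x[:, 1]) > 1.0))
```

Les fréquences empiriques confirment la valeur théorique 1/2, ce qui valide à la fois la décomposition et la fonction caractéristique de la définition 2.8 pour α = 1.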
Dans cette section, nous allons présenter les plus connues et les plus utilisées, pour mettre en évidence certaines particularités surprenantes des lois stables concernant la structure de dépendance.

2.5.1 Covariation

Nous supposons dans cette partie que 1 < α ≤ 2.

Définition 2.9. Soit (X_1, X_2) un vecteur SαS avec α strictement supérieur à 1, la covariation de X_1 sur X_2 est définie par la quantité

    [X_1, X_2]_α = ∫_{S^1} x_1 x_2^{<α−1>} dµ_{S^1}(x_1, x_2)    (2.16)

où S^1 est la sphère unité, µ_{S^1} est la mesure spectrale et <.> désigne la notation suivante : x^{<a>} = sign(x) |x|^a.

Nous présentons une autre définition de la covariation, équivalente à la précédente, qui permettra de démontrer facilement plusieurs propriétés.

Proposition 2.15. Soit (X_1, X_2) un vecteur SαS avec α strictement supérieur à 1, la covariation de X_1 sur X_2 peut s'écrire

    [X_1, X_2]_α = (1/α) ∂γ(θ_1, θ_2)/∂θ_1 |_{θ_1=0, θ_2=1}    (2.17)

où γ(θ_1, θ_2) est le paramètre de dispersion de la variable aléatoire θ_1 X_1 + θ_2 X_2.

Preuve
La démonstration se fait aisément en se rappelant que

    γ(θ_1, θ_2) = ∫_{S^1} |θ_1 x_1 + θ_2 x_2|^α dµ_{S^1}(x_1, x_2).

Proposition 2.16.
1. Dans le cas gaussien (α = 2), la covariation est identique à la moitié de la covariance :

    (X, Y) ∼ SαS =⇒ [X, Y]_2 = (1/2) Cov(X, Y)

2. Si X et Y sont deux v.a.r. indépendantes et conjointement SαS, alors [X, Y]_α = 0.
3. La covariation [X, Y]_α est linéaire en X, ou bien linéaire à gauche, c'est-à-dire que si (X_1, X_2, Y) est un vecteur SαS alors

    [a_1 X_1 + a_2 X_2, Y]_α = a_1 [X_1, Y]_α + a_2 [X_2, Y]_α

pour toutes constantes réelles a_1 et a_2.
4.
En général, [X, Y]_α n'est pas linéaire à droite, c'est-à-dire par rapport à Y, mais elle possède la propriété de pseudo-linéarité suivante : si (X, Y_1, Y_2) est un vecteur SαS et que Y_1 et Y_2 sont indépendantes, alors

    [X, b_1 Y_1 + b_2 Y_2]_α = b_1^{<α−1>} [X, Y_1]_α + b_2^{<α−1>} [X, Y_2]_α

pour toutes constantes réelles b_1 et b_2.

Les démonstrations sont détaillées dans [Samorodnitsky et Taqqu(1994)].

2.5.2 Métrique de covariation

Définition 2.10. Soit X une v.a.r. SαS de dispersion γ et de paramètre de location a = 0. La norme de X est définie par

    ||X||_α = γ si 0 < α < 1 ; ||X||_α = γ^{1/α} si 1 ≤ α ≤ 2.    (2.18)

Alors, la norme ||X||_α est une quantité liée directement à la dispersion γ et détermine la distribution de X via la fonction caractéristique.

Définition 2.11. Si X et Y sont deux v.a.r. conjointement α-stables, la distance entre X et Y est définie par

    d_α(X, Y) = ||X − Y||_α    (2.19)

En combinant les deux équations (2.19) et (2.7), on peut facilement remarquer que la distance d_α mesure le p-ème moment de la différence des deux v.a.r. Dans le cas α = 2, cette distance est identique à la moitié de la variance de la différence des deux v.a.r. Notons aussi que la convergence en distance d_α est équivalente à la convergence en probabilité [Cambanis et Miller(1981)].

Il est connu dans la théorie des statistiques d'ordre deux que l'espace des v.a.r. d'un processus aléatoire à variance finie est un espace de Hilbert. Cependant, ce n'est pas le cas pour les v.a.r. α-stables, mais il existe un résultat similaire. En effet, si l'on considère un processus α-stable X(t), t ∈ T, alors l'ensemble des combinaisons linéaires des variables aléatoires X(t) forme un espace linéaire noté l(X(t), t ∈ T). Dans cet espace, toutes les v.a.r. sont conjointement α-stables de même exposant caractéristique α [Cambanis et Miller(1981)]. Le théorème suivant précise la structure de l'espace linéaire des v.a.r. SαS.
Théorème 2.8.
– Pour tout 0 < α ≤ 2, la distance d_α définit une métrique sur l'espace l(X(t), t ∈ T).
– Particulièrement, pour 1 ≤ α ≤ 2, ||.||_α est une norme définie sur l'espace l(X(t), t ∈ T).

Preuve
Il suffit de vérifier les trois axiomes d'une norme.
1. Soit X une v.a.r. SαS, alors

    ||X||_α = 0 ⇐⇒ γ_X = 0 =⇒ ϕ_X(t) = 1 =⇒ X = 0 p.s.

2. Soit λ un scalaire réel et X une v.a.r. SαS de dispersion γ. D'après la proposition 2.8, la dispersion de λX est |λ|^α γ. On a donc

    ||λX||_α = (|λ|^α γ)^{1/α} = |λ| γ^{1/α} = |λ| ||X||_α

3. Si X_1 et X_2 sont deux v.a.r. conjointement SαS de mesure spectrale µ, alors

    ||X_1 + X_2||_α = γ_{X_1+X_2}^{1/α}
                    = ( ∫_S |x_1 + x_2|^α µ(dS) )^{1/α}
                    ≤ ( ∫_S |x_1|^α µ(dS) )^{1/α} + ( ∫_S |x_2|^α µ(dS) )^{1/α}
                    = γ_{X_1}^{1/α} + γ_{X_2}^{1/α} = ||X_1||_α + ||X_2||_α.

Alors ||.||_α définit bien une norme sur l'espace vectoriel des vecteurs SαS. ¥

La difficulté fondamentale en traitement des signaux alpha-stables par les statistiques fractionnaires d'ordre inférieur est que la théorie des espaces de Hilbert n'est pas valide dans ce cas : l'espace linéaire des processus alpha-stables est un espace de Banach pour 1 ≤ α ≤ 2, mais seulement un espace métrique pour 0 < α < 1.

2.5.3 Coefficient de covariation

Dans cette partie, (X, Y) est un vecteur SαS avec α > 1.

Définition 2.12. Le coefficient de covariation de X sur Y est défini par

    λ_{X,Y} =def [X, Y]_α / [Y, Y]_α    (2.20)

où [X, Y]_α est la covariation entre X et Y.

Ces définitions de covariation et de coefficient de covariation ne sont pas très faciles à utiliser en pratique puisqu'elles utilisent la mesure spectrale.
Heureusement, on peut connecter la covariation et le coefficient de covariation avec les moments fractionnaires d'ordre strictement inférieur à α.

Théorème 2.9. Soient X et Y deux v.a.r. conjointement SαS avec 1 < α ≤ 2. Notons la dispersion de Y par γ_Y, alors
– Covariation :

    [X, Y]_α = ( IE(X Y^{<p−1>}) / IE(|Y|^p) ) γ_Y, 1 ≤ p < α    (2.21)

– Coefficient de covariation :

    λ_{X,Y} = IE(X Y^{<p−1>}) / IE(|Y|^p), 1 ≤ p < α    (2.22)

Les moments fractionnaires d'ordre inférieur dépendent de la loi α-stable, qui dépend directement de α. Cela implique que la covariation et le coefficient de covariation dépendent de α.

Proposition 2.17. Soit X une v.a.r. α-stable d'exposant caractéristique α et de dispersion γ. Alors la dispersion de X peut être exprimée sous la forme

    γ_X = ∫_{S^1} |x|^α dµ_{S^1}(x) = [X, X]_α    (2.23)

Preuve
Il suffit de combiner les deux équations (2.16) et (2.21). ¥

Proposition 2.18.
1. Soit (X, Y) un vecteur SαS, alors on a λ_{aX,bY} = (a/b) λ_{X,Y} pour tout couple (a, b) ∈ IR × IR*.
2. Soit (X, Y, Z) un vecteur SαS, alors on a λ_{X+Y,Z} = λ_{X,Z} + λ_{Y,Z}.
3. Le coefficient de covariation entre X et Y n'est pas symétrique et n'est pas borné.

Preuve
1. D'après la définition du coefficient de covariation, on a

    λ_{aX,bY} = [aX, bY]_α / [bY, bY]_α = a b^{<α−1>} [X, Y]_α / (|b|^α [Y, Y]_α) = (a/b) λ_{X,Y}.

2. D'après la linéarité à gauche de la covariation, on peut écrire

    λ_{X+Y,Z} = [X + Y, Z]_α / [Z, Z]_α = ([X, Z]_α + [Y, Z]_α) / [Z, Z]_α = λ_{X,Z} + λ_{Y,Z}.

3. Il suffit de prendre X = cY, avec c ≠ ±1, et de voir que λ_{X,Y} = c et que λ_{Y,X} = 1/c. Alors λ_{X,Y} ≠ λ_{Y,X}. Par le même exemple, on peut conclure que le coefficient de covariation λ_{X,Y} = c n'est pas borné. ¥

2.5.4 Codifférence

Nous supposons dans cette partie que 0 < α ≤ 2.
Comme le coefficient de covariation, la codifférence est une autre quantité qui permet de mesurer la dépendance entre deux v.a.r. SαS.

Définition 2.13. La codifférence entre X et Y est définie par

    τ_{X,Y} = ||X||_α^α + ||Y||_α^α − ||X − Y||_α^α    (2.24)

où ||.||_α est la norme de covariation introduite précédemment.

Proposition 2.19.
1. La codifférence est symétrique : τ_{X,Y} = τ_{Y,X}.
2. Si α = 2, comme le coefficient de covariation, la codifférence est liée à la covariance : τ_{X,Y} = Cov(X, Y).

Preuve
1. Pour montrer la symétrie de la codifférence, il suffit de montrer que ||X − Y||_α^α = ||Y − X||_α^α. Or ||.||_α est une norme et donc, pour toute v.a.r. X SαS, on a ||X||_α^α = ||−X||_α^α, ce qui achève la preuve.
2. On a vu que [X, Y]_2 = (1/2) Cov(X, Y) et que ||X||_α^α = [X, X]_α. Ce qui entraîne que

    ||X||_2² = (1/2) Cov(X, X) = (1/2) Var(X)

et donc

    τ_{X,Y} = (1/2) Var(X) + (1/2) Var(Y) − (1/2) Var(X − Y).

Or Var(X − Y) = Var(X) + Var(Y) − 2 Cov(X, Y), ce qui donne le résultat souhaité, soit τ_{X,Y} = Cov(X, Y). ¥

2.5.5 Coefficient de covariation symétrique

Définition 2.14. (Garel et al., 2004) Soit (X, Y) un couple aléatoire réel SαS. Le coefficient de covariation symétrique entre X et Y est défini par

    Corr_α(X, Y) = λ_{X,Y} λ_{Y,X} = [X, Y]_α [Y, X]_α / ([X, X]_α [Y, Y]_α)

On obtient alors le résultat suivant, qui corrige les inconvénients du coefficient de covariation.

Proposition 2.20. Soit (X, Y) un couple aléatoire réel SαS. Nous avons les propriétés suivantes :
1. Corr_α(X, Y) = Corr_α(Y, X) et |Corr_α(X, Y)| ≤ 1.
2. Si X et Y sont deux v.a.r. SαS indépendantes, alors Corr_α(X, Y) = 0.

2.5.6 Estimation des coefficients de covariation

Proposition 2.21. (Samorodnitsky et Taqqu, 1994 ; d'Estampes, 2003) Soit (X, Y) un couple aléatoire réel SαS où α > 1. Nous avons, pour tout 1 ≤ p < α,

    λ_{X,Y} = [X, Y]_α / [Y, Y]_α = IE(X Y^{<p−1>}) / IE|Y|^p.
Soit (X_1, ..., X_n) (resp. (Y_1, ..., Y_n)) un n-échantillon de même loi que X (resp. Y). En prenant p = 1 dans l'équation précédente, on peut construire un estimateur de λ_{X,Y}, à savoir

    λ̂_{X,Y} = Σ_{i=1}^n X_i sign(Y_i) / Σ_{i=1}^n |Y_i|.

Pour estimer Corr_α(X, Y), nous utilisons alors la quantité suivante

    Corr̂_α(X, Y) = ( Σ_{i=1}^n X_i sign(Y_i) / Σ_{i=1}^n |Y_i| ) ( Σ_{i=1}^n Y_i sign(X_i) / Σ_{i=1}^n |X_i| )

qui est le produit de l'estimateur du coefficient de covariation λ_{X,Y} par l'estimateur du coefficient de covariation λ_{Y,X}.

2.6 Représentation Analytique des PDF α-Stables

2.6.1 Développement en séries entières

À l'exception des trois lois particulières, loi de Gauss, loi de Cauchy et loi de Lévy, la PDF des distributions α-stables n'a pas d'expression analytique exacte. Cependant, il existe un développement en série entière de celle-ci. Par exemple, le développement en série entière de la PDF d'une distribution α-stable standard SαS est donné par [Samorodnitsky et Taqqu(1994)] :

    f_α(x) = (1/π) Σ_{k=1}^∞ ((−1)^{k−1} / k!) sin(kαπ/2) Γ(αk + 1) |x|^{−αk−1}, si 0 < α < 1 ;
    f_α(x) = (1/(πα)) Σ_{k=0}^∞ ((−1)^k / (2k)!) Γ((2k+1)/α) x^{2k}, si 1 ≤ α ≤ 2.    (2.25)

Vu que ces sommes regroupent un nombre infini de termes, il est difficile de les utiliser dans la pratique.

2.6.2 Développement asymptotique

Pour les distributions SαS avec α > 1, il existe un développement asymptotique de la densité de probabilité, proposé dans [Bergstrom(1952)] :

    f_α(x) = (1/(πα)) Σ_{k=0}^n ((−1)^k / (2k)!) Γ((2k+1)/α) x^{2k} + O(|x|^{2n+1}) quand |x| → 0    (2.26)

et

    f_α(x) = −(1/π) Σ_{k=1}^n ((−1)^k / k!) sin(kαπ/2) Γ(αk + 1) |x|^{−αk−1} + O(|x|^{−α(n+1)−1}) quand |x| → ∞.    (2.27)

Le calcul de la série asymptotique pour de larges valeurs de n pose des problèmes de calcul au niveau de la fonction gamma.
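L'évaluation numérique de la somme partielle (2.27) illustre à la fois son intérêt et la difficulté mentionnée. Esquisse Python (la fonction queue_sas est un nom d'illustration), comparée à la densité de Cauchy exacte, pour laquelle la série se resomme exactement en 1/(π(1+x²)) :

```python
import math

def queue_sas(x, alpha, n):
    """Somme partielle du developpement asymptotique (2.27) de la densite
    SalphaS standard (gamma = 1), valable pour |x| grand."""
    s = 0.0
    for k in range(1, n + 1):
        s += ((-1) ** k * math.sin(k * alpha * math.pi / 2)
              * math.gamma(alpha * k + 1) / math.factorial(k)
              * abs(x) ** (-alpha * k - 1))
    return -s / math.pi

# Cas de Cauchy (alpha = 1, gamma = 1) : densite exacte 1/(pi (1 + x^2)).
x = 3.0
print(queue_sas(x, 1.0, 25), 1.0 / (math.pi * (1 + x ** 2)))

# Pour alpha > 1, Gamma(alpha k + 1)/k! croit tres vite avec k : les termes
# finissent par diverger (serie asymptotique), d'ou les problemes numeriques
# au niveau de la fonction gamma evoques ci-dessus.
print(math.gamma(1.9 * 30 + 1) / math.factorial(30))
```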
Ces difficultés peuvent être réduites en suivant la procédure proposée dans [Nikias et Shao(1995), page 17].

2.6.3 Approximation par un mélange fini

[A]- Approximation par un mélange fini de gaussiennes

Dans cette section, nous considérons une v.a.r. gaussienne X, une v.a.r. Y α-stable et la v.a. Z = Y^{1/2} X de loi α-stable selon le corollaire 2.1, et nous présentons la méthode d'approximation des PDF SαS par un mélange de gaussiennes introduite par [Kuruoglu(1998)]. On peut déduire l'expression de la densité de Z par la propriété de marginalisation des PDF :

    f_Z(z) = ∫_{−∞}^{+∞} f_{Z|V}(z|v) f_V(v) J(z, v) dv    (2.28)

où f_Z(.) et f_V(.) représentent les densités de Z et de V = Y^{1/2} respectivement, et J(z, v) représente le jacobien de Z par rapport à V. Or X est une v.a.r. gaussienne, alors pour une réalisation V = v, f_{Z|V} est conditionnellement distribuée selon la loi gaussienne. On peut alors réexprimer l'équation (2.28) sous la forme

    f_Z(z) = (1/√(2π)) ∫_{−∞}^{+∞} exp(−z²/(2γv²)) f_V(v) v^{−1} dv    (2.29)

Cette densité est appelée mélange d'échelles de la loi normale et la fonction h(v) = f_V(v) est dite fonction de mélange. La fonction de mélange est la densité de la v.a.r. V = Y^{1/2}, dont l'expression est obtenue grâce au résultat suivant.

Théorème 2.10. Soit V = T(Y), où T représente une transformation inversible, alors

    f_V(v) = f_Y(T^{−1}(v)) |dT^{−1}(v)/dv|    (2.30)

ou simplement

    f_V(v) = f_Y(y) |dy/dv|    (2.31)

Pour le cas spécial que nous avons considéré ici, cette relation se réduit à

    f_V(v) = 2v f_Y(v²).    (2.32)

Notons que la décomposition en mélange d'échelles gaussiennes est une propriété bien étudiée dans la littérature [Andrews(1974)].
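Le cas α = 1 rend cette construction entièrement explicite : la fonction de mélange est alors une densité de Lévy (stable positive d'indice 1/2), connue en forme fermée, et la discrétisation de l'intégrale de mélange redonne la densité de Cauchy. Esquisse Python (le nom cauchy_par_melange est hypothétique ; on discrétise ici le mélange continu plutôt que le mélange fini à N composantes de [Kuruoglu(1998)]) :

```python
import numpy as np

def cauchy_par_melange(z, n_comp=4000):
    """Approxime la densite de Cauchy standard par un melange d'echelles
    gaussiennes discretise : f(z) = integrale de N(z; 0, v) f_L(v) dv, ou
    f_L est la densite de Levy(1) : (2 pi)^(-1/2) v^(-3/2) exp(-1/(2v))."""
    u = np.linspace(-12.0, 12.0, n_comp)       # grille uniforme en u = log v
    v = np.exp(u)
    du = u[1] - u[0]
    f_levy = (2 * np.pi) ** -0.5 * v ** -1.5 * np.exp(-0.5 / v)
    poids = f_levy * v * du                    # dv = v du (changement de variable)
    z = np.atleast_1d(np.asarray(z, dtype=float))
    gauss = np.exp(-0.5 * z[:, None] ** 2 / v) / np.sqrt(2 * np.pi * v)
    return gauss @ poids

for zz in (0.0, 1.0, 3.0):
    print(zz, cauchy_par_melange(zz)[0], 1 / (np.pi * (1 + zz ** 2)))
```

L'accord avec 1/(π(1+z²)) est excellent, ce qui illustre numériquement le corollaire 2.1 et le principe de l'approximation (2.33) ci-après.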
L'échantillonnage de f_Z(z) de l'équation (2.28) sur un ensemble fini de N points permet d'obtenir une approximation de la PDF SαS par un mélange fini de densités gaussiennes :

    f_{(α,a,0,γ)}(z) ≈ Σ_{j=1}^N (1/v_j) exp(−(z−a)²/(2γv_j²)) f_V(v_j) / ( √(2πγ) Σ_{j=1}^N f_V(v_j) )    (2.33)

Pour une bonne approximation, on doit prendre N assez grand, ce qui va rendre le calcul assez complexe. Pour réduire cette complexité, [Kuruoglu(1998)] propose d'utiliser un certain nombre de composantes, puis de raffiner l'approximation en utilisant l'algorithme EM [Dempster(1977)]. Cette procédure permet alors d'estimer la densité SαS ; nous en résumons les étapes essentielles dans le tableau 2.5 suivant.

[B]- Approximation par un mélange fini de Pearson

Pour le cas d'une densité α-stable de paramètres β = +1 et α < 1, une approximation par un mélange fini de densités de Pearson, qui sont des PDF α-stables d'indice α = 1/2, a été proposée récemment dans [Kuruoglu(2003)]. Notons que l'auteur suit la même démarche que pour le cas du mélange de gaussiennes ci-dessus, en prenant α_x = 1/2 au lieu de α_x = 2 dans le théorème de mélange d'échelles des lois α-stables (théorème 2.5).

2.7 Autres Distributions à Queues Lourdes

Dans cette section, nous introduisons d'autres classes de distributions à queues lourdes. La première est celle des lois gaussiennes généralisées (GG) et la deuxième classe est celle des lois appelées lois normales inverses gaussiennes (NIG).

2.7.1 Loi gaussienne généralisée

Une généralisation des lois de Gauss et de Laplace est donnée par le modèle des lois gaussiennes généralisées.

Estimation de la PDF SαS
Step 1. Initialisation : étant donné les paramètres de la PDF SαS désirée, on génère la fonction caractéristique ϕ_Y(.) d'une v.a.r.
Y stable positive de paramètres (α/2, β = −1, a = 0, γ = (cos(πα/4))^{2/α}).
Step 2. Évaluation de la PDF stable positive f_Y en N points : en appliquant la FFT (transformée de Fourier rapide) à la fonction caractéristique ϕ_Y(.) générée dans l'étape précédente, où N représente le nombre de composantes gaussiennes dans le mélange.
Step 3. Évaluation de la fonction de mélange f_V : c'est la densité de la v.a.r. V = Y^{1/2}, donnée par

    f_V(v) = 2v f_Y(v²)    (2.34)

Step 4. Approximation analytique de la PDF SαS, par substitution de l'équation (2.34) dans l'équation (2.33) :

    f_{(α,a,0,γ)}(z) = Σ_{j=1}^N exp(−(z−a)²/(2γv_j²)) f_Y(v_j²) / ( √(2πγ) Σ_{j=1}^N v_j f_Y(v_j²) )    (2.35)

Step 5. Affinage de l'approximation par l'algorithme EM : nous cherchons à estimer un mélange de gaussiennes de la forme

    f_{(α,a,0,γ)}(z) = Σ_{j=1}^N p_j G(z|j)    (2.36)

où les p_j sont les fréquences de pondération telles que Σ_{j=1}^N p_j = 1 et les G(z|j) sont des PDF gaussiennes. On considère M observations (z_m, m = 1, · · · , M) comme variables cachées et on applique l'algorithme EM, qui consiste à initialiser l'algorithme par une première estimation de G(z_m|j) et p_j puis à alterner les deux étapes ”Expectation” et ”Maximisation” [Dempster(1977)].

Tab. 2.5 – Approximation de la PDF SαS par le modèle de mélange de gaussiennes et affinage de l'approximation par l'algorithme EM.

La distribution de ce modèle est décrite par une densité de type exponentielle de la forme :

    f_α(x) = c exp(−|x/σ|^α)    (2.37)

où c = α/(2σΓ(1/α)) et Γ(.) est la fonction gamma. Le paramètre σ > 0 représente le paramètre d'échelle de la distribution et α > 0 est le paramètre qui caractérise l'impulsivité. Notons que pour α = 2, f_α(x) est gaussienne, alors que α = 1 correspond à la loi de Laplace. Conceptuellement, plus α est petit, plus la distribution est impulsive.
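La constante c de la densité (2.37) assure la normalisation : on vérifie en effet que 2c ∫_0^∞ exp(−(x/σ)^α) dx = (2cσ/α) Γ(1/α) = 1. Esquisse Python de contrôle numérique (le nom gg_pdf est hypothétique) :

```python
import math

def gg_pdf(x, alpha, sigma):
    """Densite gaussienne generalisee (2.37) : c exp(-|x/sigma|^alpha),
    avec c = alpha / (2 sigma Gamma(1/alpha))."""
    c = alpha / (2.0 * sigma * math.gamma(1.0 / alpha))
    return c * math.exp(-abs(x / sigma) ** alpha)

# Verification numerique de la normalisation pour differents alpha
# (integration de Riemann sur [-200, 200], pas 0.01).
for alpha in (0.5, 1.0, 2.0, 8.0):
    s = sum(gg_pdf(-200.0 + k * 0.01, alpha, 1.0) * 0.01 for k in range(40_000))
    print(alpha, s)   # proche de 1 dans chaque cas
```

Pour α = 2 on retrouve la densité de N(0, σ²/2) et pour α = 1 celle de Laplace, conformément au texte ci-dessus.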
This class of PDFs has been in use for a long time; the earliest references known to the author are [Subbotin(1923)] and [Frechet(1924)]. Owing to their mathematical tractability, these laws are widely exploited in signal processing applications [Kay(1998a)] to model processes observed in a variety of domains, including speech, audio and video signals, images, turbulence, and multi-user systems [Zoubir et Brcich(2002)]. Note that the moments of such r.v.'s are finite and analytically computable, in contrast with other heavy-tailed PDFs such as the α-stable laws presented at the beginning of this chapter.

Proposition 2.22. The moment of order k is given by:

IE(X^k) = 0 if k is odd,
IE(X^k) = (2cσ^{k+1}/α) Γ((k+1)/α) if k is even.    (2.38)

Proof
– If k is odd, IE(X^k) = 0, since the function x^k exp(−|x/σ|^α) is odd.
– If k is even, we have:

IE(X^k) = ∫_{−∞}^{+∞} c x^k exp(−|x/σ|^α) dx
        = 2 ∫_{0}^{+∞} c x^k exp(−(x/σ)^α) dx
        = (2cσ^{k+1}/α) ∫_{0}^{+∞} y^{(k+1)/α − 1} exp(−y) dy    [substituting y = (x/σ)^α]
        = (2cσ^{k+1}/α) Γ((k+1)/α).  ∎

In the following proposition, we describe the behavior of the generalized Gaussian law for different values of α.

Proposition 2.23. (Behavior of the GG law for different α)
– If α = 2, the generalized Gaussian law is the standard Gauss law.
– If α > 2, the tail of the generalized Gaussian law is lighter than that of the standard Gauss law, i.e., the PDF tends to 0 faster than the Gaussian PDF.
– As α tends to +∞, the generalized Gaussian law converges to the uniform law.
– If 0 < α < 2, the tail of the generalized Gaussian law is impulsive in nature.
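As a sanity check (our own sketch, not part of the original text), the closed-form even moments of Proposition 2.22 can be compared against direct numerical integration of the density (2.37); the helper names `gg_pdf` and `gg_moment` are ours. Note that (2cσ^{k+1}/α)Γ((k+1)/α) simplifies to σ^k Γ((k+1)/α)/Γ(1/α) once c = α/(2σΓ(1/α)) is substituted.

```python
import numpy as np
from scipy.special import gamma
from scipy.integrate import quad

def gg_pdf(x, alpha, sigma):
    """Generalized Gaussian density, eq. (2.37)."""
    c = alpha / (2 * sigma * gamma(1 / alpha))
    return c * np.exp(-np.abs(x / sigma) ** alpha)

def gg_moment(k, alpha, sigma):
    """Closed-form moment of even order k, eq. (2.38) after substituting c."""
    return sigma ** k * gamma((k + 1) / alpha) / gamma(1 / alpha)

alpha, sigma, k = 1.5, 2.0, 4
# exploit the symmetry of the density: integrate over [0, inf) and double
num, _ = quad(lambda x: x ** k * gg_pdf(x, alpha, sigma), 0, np.inf)
num *= 2
```

For α = 2 and σ = 1, the formula gives IE(X²) = Γ(3/2)/Γ(1/2) = 1/2, consistent with the fact that f_2 is then a N(0, 1/2) density.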
Despite the relative success of the generalized Gaussian family, it has some limitations. Indeed, when α < 1, the "peaky shape" of the distribution is inappropriate for certain practical noise situations. Moreover, the minimization of the cost function based on the Lp norm remains a major problem. One may also note the exponential decay of the tail, in contrast with the algebraic tail behavior of the impulsive processes encountered in many applications [Nikias et Shao(1995)].

2.7.2 Normal inverse Gaussian law

The family of normal inverse Gaussian (NIG) laws is a subclass of the generalized hyperbolic distributions. The pioneering work on the NIG laws was done by Barndorff-Nielsen in 1977 and 1995; more recent references exist in the literature, e.g. [Barndorff(1998)]. In contrast with the SαS laws, the NIG density has an explicit expression.

[A]- Definition

Définition 2.15. A r.v. X follows a NIG law if its probability density has the form

f_X(x) = (αδ/π) exp( δ√(α²−β²) − βµ ) · K₁( α√(δ² + (x−µ)²) ) / √(δ² + (x−µ)²) · exp(βx)    (2.39)

where µ ∈ IR, δ > 0, 0 ≤ |β| ≤ α, and K₁ is the modified Bessel function of the second kind of order 1. A r.v. X with a NIG law is denoted X ∼ NIG(α, β, δ, µ).

A NIG law is thus parameterized by four parameters α, β, δ and µ, which have the same interpretation as those of the α-stable laws: α governs the tail behavior (the smaller α, the heavier the tail); β is a skewness parameter (β = 0 gives a symmetric density, β > 0 a density skewed to the right, β < 0 a density skewed to the left); δ is a scale parameter and µ is a location parameter. To illustrate the shape of the NIG laws, the PDF is plotted for several values of α in Figure 2.4.

Définition 2.16. The characteristic function of a NIG r.v.
is given by [Barndorff-Nielsen(1997)]:

ϕ_X(t) = exp{ δ√(α²−β²) − δ√(α²−(β+jt)²) + jµt }    (2.40)

Fig. 2.4: The probability density of the NIG(α, 0, 1, 0) law for different values of α (α = 2, 1, 0.5, 0.001).

[B]- Properties

– Infinitely divisible: given the exponential form of the NIG characteristic function, it can easily be written as a power of another NIG characteristic function; consequently, the NIG laws are infinitely divisible. This property means that if X₁, …, X_N are independent r.v.'s with X_i ∼ NIG(α, β, δ_i, µ_i), then the sum S = Σ_{i=1}^{N} X_i also follows a NIG law. Moreover, S ∼ NIG(α, β, δ, µ) with δ = Σ_{i=1}^{N} δ_i and µ = Σ_{i=1}^{N} µ_i. This property is similar to the one characterizing the α-stable class. However, the NIG distribution is not stable; one way to see this is that the parameter δ diverges for a normalized infinite sum of NIG r.v.'s.

– Contains the Gauss and Cauchy laws: from the form of the characteristic function, one easily sees that the Cauchy and Gauss laws appear as special cases of the NIG laws. Indeed, the Gauss law is the limiting case β = 0, α → ∞ with σ² = δ/α, and NIG(0, 0, δ, µ) corresponds to the Cauchy law.

– Asymptotic tail behavior: in [Hanssen et Oigard(2001)], the authors showed that the asymptotic behavior of the NIG PDF is
given by

f(x) ∝ |x|^{−3/2} exp(βx − α|x|)  as |x| → ∞, for α ≠ 0,
f(x) ∝ |x|^{−2}  as |x| → ∞, for α → 0.    (2.41)

For α ≠ 0, the asymptotic behavior of the NIG PDFs thus combines an algebraic and an exponential decay, the exponential term being determined by the two parameters α and β. When α → 0, the NIG PDF approaches the Cauchy PDF, and hence its asymptotic tail behavior approaches that of the Cauchy tail.

2.7.3 Student's t law

Définition 2.17. (William Sealy Gosset, 1908) Introduced by Gosset in 1908, the PDF of the Student t distribution is parameterized as

T_α(x) = c ( 1 + x²/α )^{−(α+1)/2}    (2.42)

where

c = Γ((α+1)/2) / ( √(απ) Γ(α/2) )

It is a density symmetric about the vertical axis.

Définition 2.18. (Ronald Aylmer Fisher, 1925) Fisher took an interest in Gosset's work. He wrote to him in 1912 to propose a geometric proof of the Student law, and introduced the notion of degrees of freedom. In particular, he published an article in 1925 in which he defined the Student law as the ratio of two independent r.v.'s U and Y following, respectively, a N(0, 1) law and a χ²(α) law:

T_α := U / √(Y/α) = √α U / √Y    (2.43)

The quotient T_α is said to follow a Student t law (or simply: Student law)³ with α degrees of freedom.

Proposition 2.24. (Properties of the Student law)
– Expectation: from the expression of the PDF above, one can deduce that a Student-t r.v. is centered, with zero mean.
– Variance: for α ≤ 2, the Student law admits no finite variance. If α > 2, computing the variance gives α/(α−2).

³ Student was the pseudonym chosen by the statistician William Sealy Gosset (1876–1937).
He was one of the first statisticians of the corporate world, devoting his career to the food industry, within which he was always recognized both as an industrialist and as a scientist. Closely associated with academia, he contributed widely to the scientific development of his period.

– Algebraic tail: it is easy to see, from the form of the PDF, that the Student law has an algebraic tail of index α. The smaller α becomes, the heavier the tail.
– Extreme cases:
1. As α → ∞, the Student distribution is equivalent to the Gauss distribution.
2. As α → 0, the distribution becomes very impulsive.
– Special case: for α = 1, the model is that of Cauchy.

The family of Student t laws was first introduced into signal processing by Hall in 1966, as an empirical model for atmospheric noise in radio communications [Hall(1966)]. Well before that, however, the model appears in the mathematical statistics literature, indexed by an integer k instead of the real α; Hall may have generalized it by replacing k with α.

2.8 Conclusion

Despite the important role of the α-stable laws in modeling signals with heavy-tailed probability densities, they have limitations; let us rather say that they open several questions about how to overcome the difficulties encountered in statistical inference in the absence of an explicit expression of the density and in the absence of second- and higher-order moments.
To contribute to the resolution of certain questions related to the separation of impulsive sources with α-stable distributions, and to the estimation of a signal buried in impulsive noise following an α-stable model, we propose new approaches in the following chapters. Specifically, we will use lower-order moments, introduce normalized statistics, and approximate the probability density by the family of log-spline functions, so as to be able to handle observations drawn from α-stable laws.

Chapitre 3

Robust Estimation

The robust minimax approach is an alternative to conventional maximum likelihood (ML) that overcomes the sensitivity of the ML estimate and improves efficiency in environments with an unknown heavy-tailed distribution [Huber(1981)]. In this chapter, we provide a brief background on the fundamental concepts of robust estimation that will be used in the subsequent second part of this thesis.

3.1 Robustness

The term "robust" was coined in statistics by G.E.P. Box in 1953. Various definitions of greater or lesser mathematical rigor are possible for the term. However, in general, referring to a statistical estimator, it means "insensitive to small departures from the idealized assumptions for which the estimator is optimized" [Hampel et al.(1986), Huber(1972)]. The word "small" can have two different interpretations, both important: either fractionally small departures for all data points, or else fractionally large departures for a small number of data points. It is the latter interpretation, leading to the notion of outliers, that is generally the most stressful for statistical procedures. Roughly speaking, robustness means insensitivity to gross measurement errors and to errors in the specification of parametric models. For example, consider the estimation of the mean from 100 measurements.
Assume that all measurements but one are distributed between -1 and 1, while the remaining measurement has the value 1000. Using the simple estimator of the mean given by the sample average, the estimator returns a value close to 10. Thus the single, probably erroneous, measurement of 1000 has a very strong influence on the estimate. The problem here is that the average corresponds to minimizing the squared distance of the measurements from the estimate; the square function implies that measurements far away dominate.

To obtain good estimators in the presence of outliers, statisticians have developed various sorts of robust statistical estimators. Many, if not most, can be grouped into one of three categories.

• M-estimates follow from maximum-likelihood arguments and are usually the most relevant class for model fitting, that is, for the estimation of parameters. We therefore consider these estimates in some detail below.

• L-estimates are linear combinations of order statistics. These are most applicable to the estimation of central value and central tendency. Two typical L-estimates give the general idea: (i) the median, and (ii) Tukey's trimean, defined as the weighted average of the first, second, and third quartile points of a distribution, with weights 1/4, 1/2, and 1/4, respectively.

• R-estimates are based on rank tests. For example, the equality or inequality of two distributions can be assessed by the Wilcoxon test, which computes the mean rank of one distribution within a combined sample of both distributions. The Kolmogorov–Smirnov statistic and the Spearman rank-order correlation coefficient are R-estimates in essence, if not always by formal definition [Huber(1981)].

Some other kinds of robust techniques can be found in the field of optimal control and filtering, rather than in the mathematical statistics literature.
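The 100-measurement example above is easy to reproduce numerically (an illustrative sketch, not from the original text): the sample average is dragged toward the outlier, while an L-estimate such as the median is essentially unaffected.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 99)       # 99 well-behaved measurements in [-1, 1]
x = np.append(x, 1000.0)         # one gross outlier

mean_est = x.mean()              # pulled to roughly 10 by the single outlier
median_est = np.median(x)        # stays inside [-1, 1]
```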
3.2 M-Estimation

Huber (1964) proposed a generalization of the least-squares principle for constructing estimators of (principally) location parameters. Suppose, in the basic model, that the sample comes from a distribution with distribution function F(x − θ); it is the location parameter θ that we wish to estimate. We might estimate θ by T_n = T_n(x₁, x₂, …, x_n) chosen to minimize

Σ_{j=1}^{n} ρ(x_j − T_n)    (3.1)

where ρ is some real-valued, non-constant function. As special cases, we note that ρ(t) = t² yields the sample mean and ρ(t) = |t| yields the sample median, whilst ρ(t) = −log f(t) yields the maximum likelihood estimator (where f(x) is the density function under the basic model when θ = 0). If ρ is continuous with derivative ψ, we equivalently estimate θ by the T_n satisfying

Σ_{j=1}^{n} ψ(x_j − T_n) = 0.    (3.2)

Such an estimator is called a maximum-likelihood-type estimator, or M-estimator. If ρ is convex, then (3.1) and (3.2) are equivalent; otherwise, (3.2) is still very useful in searching for the solution of (3.1). Usually we restrict attention to convex ρ, so that ψ is monotone and T_n unique. Under quite general conditions, T_n can be shown to have desirable properties as an estimator: if ρ is convex, T_n is unique, translation invariant, consistent, and asymptotically normal [Huber(1972)].

The choice of ρ that leads to an optimal robust estimator of θ is now discussed. One particular estimator with desirable robustness properties arises from the Huber function

ρ_H(t) = t²/2           if |t| ≤ k,
ρ_H(t) = k|t| − k²/2    if |t| > k    (3.3)

for a suitable choice of k. It turns out that the estimator T_n is equivalent to the sample mean of a sample in which all observations x_j such that |x_j − T_n| > k are replaced by T_n − k or T_n + k, whichever is closer.
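The Huber location estimate defined by (3.2)–(3.3) can be computed by iteratively reweighted averaging, since ψ_H(r) = r·min(1, k/|r|); this is a standard numerical scheme, not one prescribed by the text, and the function name `huber_location` is ours.

```python
import numpy as np

def huber_location(x, k=1.345, tol=1e-8, max_iter=100):
    """Solve sum(psi_H(x_i - theta)) = 0 by iteratively reweighted
    averaging, with psi_H(r) = r * min(1, k/|r|)."""
    theta = np.median(x)                          # robust starting point
    for _ in range(max_iter):
        r = x - theta
        w = np.minimum(1.0, k / np.maximum(np.abs(r), 1e-12))
        theta_new = np.sum(w * x) / np.sum(w)     # weighted mean step
        if abs(theta_new - theta) < tol:
            break
        theta = theta_new
    return theta

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, 200)
x[:10] = 50.0                                     # 5% gross outliers
est = huber_location(x)                           # stays near the true location 0
```

The choice k = 1.345 is the classical tuning giving about 95% efficiency at the nominal Gaussian model.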
Another M-estimator, with

ρ(t) = t²/2    if |t| ≤ η,
ρ(t) = η²/2    if |t| > η    (3.4)

can be similarly interpreted as a trimmed mean: T_n is now the sample mean of those observations x_j satisfying |x_j − T_n| < η. This extends the modified trimming above from the rejection of a single extreme value to the rejection of all sample values whose residuals about T_n are sufficiently large in absolute value. See [Huber(1981)] for details.

Standard cost functions

– Normal criterion: for a Gaussian distribution, ML estimation leads to

ρ(x) = x²/2;    ψ(x) = x    (3.5)

and Σ_{i=1}^{N} (x_i − θ) = 0 yields θ̂ = x̄ = (1/N) Σ_{i=1}^{N} x_i.

– Double-exponential criterion: for a double-exponential (Laplace) distribution, the score function is given by

ρ(x) = |x|;    ψ(x) = −1 if x < 0, +1 if x > 0    (3.6)

and Σ_{i=1}^{N} ψ(x_i − θ) = 0 yields θ̂ = the sample median.

– Maximum likelihood criterion: the choice ρ(x) = −log f(x), where f represents the observation PDF, gives the ordinary maximum likelihood estimate.

When the basic model involves a scale parameter, so that the distribution function is of the form F[(x − θ)/σ], modified forms of the M-estimator have been proposed. The estimator of θ is then a solution T_n of an equation of the type

Σ_{j=1}^{n} ψ[(x_j − T_n)/σ̂] = 0    (3.7)

where the scale estimate σ̂ is robust for σ and is obtained either independently by some suitable scheme, or simultaneously with θ by joint solution of (3.7).

3.2.1 Minimax M-estimate of location

In this section, we consider robust estimation in a minimax sense, based on Huber's minimax M-estimator [Huber(1981)]. Huber considered the robust location estimation problem. Suppose we have one-dimensional (1-D) i.i.d. observations x₁, x₂, …, x_n. The observations belong to some sample space X, which is a subset of the real line IR.
A parametric model consists of a family of probability distributions F_θ (or, equivalently, a family of PDFs f_θ) on the sample space, where the unknown parameter θ belongs to some parameter space Θ. When estimating location in the model X = IR, F_θ(x) = F(x − θ), the M-estimator is determined by a ψ-function of the type ψ(x, θ) = ψ(x − θ), i.e., the M-estimate of the location parameter θ is given by the solution of the equation

Σ_{i=1}^{n} ψ(x_i − θ) = 0.    (3.8)

Assume that the sample distribution belongs to the set of ε-contaminated Gaussian models given by

P_ε = { (1 − ε) N(0, ν²) + εH : H is a symmetric distribution }    (3.9)

where 0 < ε < 1 is fixed and ν² is the variance of the nominal Gaussian distribution. It can be shown that, under mild regularity conditions, the asymptotic variance of an M-estimator of the location θ defined by (3.8) at a distribution F ∈ P_ε is given by [Huber(1981)]

V(ψ; F) = ∫ ψ² dF / ( ∫ ψ′ dF )²    (3.10)

Huber's idea was to minimize the maximal asymptotic variance over P_ε, that is, to find an M-estimator ψ₀ that satisfies

sup_{F∈P_ε} V(ψ₀; F) = inf_ψ sup_{F∈P_ε} V(ψ; F).    (3.11)

This is achieved by finding the least favorable distribution F₀, i.e., the distribution that minimizes the Fisher information

I(F) = ∫ ( F″/F′ )² dF    (3.12)

over all F ∈ P_ε. Then ψ₀ = −F₀″/F₀′ defines the maximum likelihood estimator for this least favorable distribution. Using the above concepts of minimax robustness, Huber showed that the Fisher information is minimized by

f₀(x) = ((1−ε)/√(2πν²)) exp( −x²/(2ν²) )        for |x| ≤ kν²,
f₀(x) = ((1−ε)/√(2πν²)) exp( k²ν²/2 − k|x| )    for |x| > kν²    (3.13)

where k, ε and ν are connected through

φ(kν)/(kν) − Q(kν) = ε/(2(1−ε))    (3.14)

where

φ(x) := (1/√(2π)) e^{−x²/2}

and
Q(t) := (1/√(2π)) ∫_{t}^{∞} e^{−x²/2} dx.

The corresponding minimax M-estimator is then determined by the Huber penalty function and its derivative, given by

ρ_H(x) = x²/2           if |x| ≤ K,        ψ_H(x) = x           if |x| ≤ K,
ρ_H(x) = K|x| − K²/2    if |x| > K;        ψ_H(x) = K sign(x)   if |x| > K    (3.15)

and

Σ_{i=1}^{N} ψ_H(x_i − θ) = 0 is solved by numerical methods.

These are the ρ and ψ functions associated with a density which is "normal" in the middle and has "double exponential" tails. The constant K regulates the degree of robustness; good choices for K are between 1 and 2 times the standard deviation of the observations. The corresponding M-estimator is the minimax solution.

3.2.2 Influence Function

The influence function (IF), introduced in [Hampel et al.(1986)], is an important tool for studying robust estimators. It measures the influence of a vanishingly small contamination of the underlying distribution on the estimator. It is assumed that the estimator can be defined as a functional T operating on the empirical distribution function F_n, T = T(F_n), and that the estimator is consistent as n → +∞, i.e., T(F) = lim_{n→+∞} T(F_n), where F is the underlying distribution. The influence function is defined as

IF(x; T, F) = lim_{t→0} ( T[(1 − t)F + tΔ_x] − T(F) ) / t    (3.16)

where Δ_x is the distribution that puts a unit mass at x. Roughly speaking, the influence function IF(x; T, F) is the first derivative of the statistic T at the underlying distribution F and at the coordinate x. The influence function measures the effect of a deviation from the assumed distribution on a descriptive statistic T; in other words, it measures robustness.
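As a concrete numerical illustration of (3.16) (our own sketch, not from the text), the IF of the sample median at F = N(0, 1) can be approximated with a small contamination weight t and compared with the known closed form IF(x; med, Φ) = sign(x)/(2φ(0)).

```python
import numpy as np
from scipy import stats, optimize

def median_of_contaminated(x0, t):
    """Median of (1 - t) * N(0, 1) + t * delta_{x0}:
    root of the contaminated CDF minus 1/2."""
    g = lambda u: (1 - t) * stats.norm.cdf(u) + t * (u >= x0) - 0.5
    return optimize.brentq(g, -10.0, 10.0)

x0, t = 2.0, 1e-6
if_num = median_of_contaminated(x0, t) / t        # T(F) = 0, so no subtraction needed
if_theory = np.sign(x0) / (2 * stats.norm.pdf(0.0))
```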
The utility of the influence function is that it allows us to calculate the asymptotic covariance of M-estimates using the formula [Huber(1981)]

Cov{T(F_n), T(F)} = ∫ IF(x; T, F) IF(x; T, F)^T dF    (3.17)

One can then proceed to calculate IF(x; T, F) and Cov{T(F_n), T(F)} for the given signal model.

3.2.3 M-Estimation of a deterministic signal parameter

Consider the general signal-in-noise model given in (12.1):

x(t) = s(t, θ) + z(t)

where z(t) is i.i.d. noise and the signal s(t) is parameterized by θ = (θ₁, …, θ_M)^T, with (·)^T denoting transposition. The aim is to estimate θ from N observations x(t), t = 1, …, N. Given the noise density f(z), one obtains the ML solution as

θ̂ = arg min_θ Σ_{t=1}^{N} ρ{ x(t) − s(t, θ) }    (3.18)

where ρ(x) = −log f(x). Alternatively, we can solve the M coupled equations

Σ_{t=1}^{N} ψ{ x(t) − s(t, θ) } ∂s(t, θ)/∂θ = 0    (3.19)

where ψ(x) = −f′(x)/f(x) is the location score function of f(x). It is clear that without a priori knowledge of f(x), the estimation of θ cannot be optimal. Huber considered estimation in the presence of outliers or impulsive noise and proposed the concept of M-estimation [Huber(1981)]. In the M-estimation framework, −log f(x) is replaced by a similarly behaved function ρ(x), chosen to confer robustness on the estimator under deviations from a nominal density. Thus, an M-estimate of θ can be obtained as a solution of the optimization problem given in equation (3.18), or by solving the M coupled equations

Σ_{t=1}^{N} ϕ{ x(t) − s(t, θ) } ∂s(t, θ)/∂θ = 0    (3.20)

where ϕ(x) = ρ′(x). When f(x) is unknown, one is unsure of how close ϕ(x) is to ψ(x).

3.2.4 Theoretical performance

Let F be the distribution of the noise and F_n its empirical counterpart from a sample of size n.
Then an estimate of θ can be defined in terms of a functional T operating on F_n, namely T(F_n), while the true parameters are obtained as T(F). Under some mild conditions, such as IE[ϕ(x)] = 0, M-estimates possess desirable properties such as consistency and asymptotic normality [Hampel et al.(1986), Huber(1981)]. Herein we assume a symmetric noise density and an antisymmetric ϕ to ensure this condition is met.

• Asymptotic covariance: using the influence function concept, it is proved in [Brcich et Zoubir(2002)] that the asymptotic covariance of the estimation errors of θ has the form

Cov{T(F_n), T(F)} = ( E[ϕ²(x)] / E[ϕ′(x)]² ) ( Σ_{n=1}^{N} Λ_n Λ_n^T )^{−1}    (3.21)

where ϕ(x) = ρ′(x) and Λ_n is the gradient of s_n(θ). The only degree of freedom at our disposal for minimizing the asymptotic covariance is therefore the appropriate choice of ϕ(x).

• Asymptotic normality: defining Cov{T(F_n), T(F)} to be the asymptotic covariance, we have

n^{1/2} ( T(F_n) − T(F) ) → N( 0, Cov{T(F_n), T(F)} ) in distribution as n → ∞    (3.22)

• Consistency: let F belong to a family of distributions F; then T(F_n) converges in probability to T(F) as n → ∞, i.e.,

IP{ |T(F_n) − T(F)| > ε } → 0 as n → ∞, F ∈ F    (3.23)

for any ε > 0.

3.2.5 Minimax optimal cost function

Let the noise distribution f be known only incompletely; all that is known is that it belongs to a certain class P. Applying the Cramér–Rao inequality to our M-estimator, under certain regularity assumptions, gives

Cov{T(F_n), T(F)} ≥ A(Λ_n) I(f)^{−1}    (3.24)

where I(f) is the Fisher information and A(Λ_n) is a matrix depending only on Λ_n. The worst distribution is naturally the one for which the right-hand side of (3.24) is maximal, i.e., for which I(f) is minimal. In other words, the robust Huber minimax estimator over P is defined, as in the ML method, by equation (3.18) with the loss function

ρ*(z) = −ln( f*(z) )    (3.25)

where f*(z) is selected such that the information on the parameter contained therein is minimal, i.e.,
as a solution of the problem

f*(z) = arg min_{f∈P} I(f)    (3.26)

where I(f) = ∫ ( f′(z) )² / f(z) dz denotes the Fisher information. We call the M-estimator robust if the loss function ρ is chosen according to (3.26) and (3.25). This approach consists in considering the worst case (among f ∈ P), corresponding to the PDF with the minimum Fisher information value. Solving the worst case ensures robustness (good estimation performance) whenever the considered signal PDF belongs to P. It is emphasized that the robustness property of the estimator depends on how the class P is defined. Thus, in order to obtain the robust minimax estimator, an appropriate class P should first be defined, after which the loss function ρ is given by (3.25) and (3.26).

3.3 Concluding Remarks

M-estimation is an alternative approach for robust estimation that is used to implement sub-optimal estimators which are robust to changes in the underlying distribution. Since impulsive noise is present in communications channels, the M-estimation of signal parameters in the additive noise model is an important issue. The approach to robust estimation taken in the second part of this thesis follows the M-estimation concept of robust statistics, except that the density function is modelled as an α-stable PDF and is estimated from the observations. However, many questions remain open for serious discussion, such as the choice of the so-called score function ϕ in the case of an α-stable noise model. The second part of this thesis investigates these difficulties and proposes some solutions in the context of multicomponent non-stationary FM signals.
Chapitre 4

Time-Frequency Concepts

Time–frequency signal processing (TFSP) represents a set of effective methods, techniques and algorithms used for the analysis and processing of non-stationary signals, as found in a wide range of applications including telecommunications, radar and biomedical engineering. TFSP is a natural extension of both time-domain and frequency-domain processing: it represents signals in a two-dimensional space, and so reveals "complete" information about the signal. Such a representation is intended to provide a distribution of signal energy versus time and frequency simultaneously. More details and advances in TFSP can be found in [Cohen(1995), Flandrin(1998), Hlawatsch(1998), Boashash(2002)]. This chapter, therefore, provides a brief background on the fundamental concepts of TFSP that will be used in the subsequent second part.

4.1 The Need for a Time-Frequency Representation

The two classical representations of a signal s(t) are the time-domain representation and the frequency-domain representation S(f) = FT{s(t)}, where FT stands for the Fourier transform. Each classical representation of the signal s(t) is non-localized with respect to the excluded variable. Consequently, such representations are not suitable for signals with time-varying spectral content (non-stationary signals). For non-stationary signals, an indication of how the frequency content of the signal changes with time is needed. The magnitude spectrum (frequency representation) of a signal gives no indication of how the frequency content of the signal changes with time, which is important information when one deals with FM signals.
Time–frequency signal processing, being a natural extension of both time-domain and frequency-domain processing, preserves and reveals this information about the signal. TFSP is intended to provide a distribution of signal energy versus both time and frequency; for this reason, a TF representation is commonly referred to as a TFD [Boashash(2002)].

In order to see the inherent limitations of the classical representations of a non-stationary signal, consider a linear frequency modulated (LFM) signal of length N = 128 samples and sampling frequency fs = 1 Hz, whose frequency increases linearly from 0.1 to 0.4 Hz. Figure 4.1 shows different representations of this signal. The time representation of the LFM signal gives no indication of the frequency content of the signal; neither does the spectrum of the signal indicate how the spectrum changes with time. This example shows clearly why classical representations are inadequate for non-stationary signals.

Fig. 4.1: (a) Time-domain and (b) frequency-domain representations of an LFM signal. It shows clearly the inherent limitation of classical representations of a non-stationary signal.

To overcome the inadequacies of the classical representations of a non-stationary signal, exposed by the above example, we desire a representation in the two-dimensional (t, f) space. Such a representation is called a TFD. As an illustration, Figure 4.2 shows one particular TF representation of the LFM signal of Figure 4.1, using the Wigner–Ville distribution (WVD).
The representation in Figure 4.2 not only shows the start and stop times and the frequency range of the LFM signal, but also clearly shows the variation of frequency with time. The latter feature, which shows at a glance the frequency at a given time, or the time at which a given frequency is present, is missing from the conventional signal representations in Figure 4.1. The choice of a TFD for a particular signal inevitably depends on the nature of the signal (whether it is mono- or multi-component) and on the properties that the TFD is expected to satisfy. A set of properties a TFD needs to satisfy is reported in [?]. In [Boashash et Sucic(2003)], Boashash et al. give a subset of those properties which are the most important in practical applications.

Fig. 4.2: A TF representation of the LFM signal of Figure 4.1 (Fs = 1 Hz, N = 128).

4.2 Nonstationarity and FM Signals

We now recall some important definitions.

Définition 4.1 (Analytic signal [Boashash(2002)]). Let s(t) be a real FM signal of the general form

s(t) = A(t) · cos[θ(t)],    (4.1)

with the assumption that the spectra of the amplitude A(t) and of the phase θ(t) are separated (non-overlapping) in frequency, i.e., the signal approaches a narrowband condition [Boashash(1992a)].

Let H[·] denote the Hilbert transform of the signal, such that

H[s(t)] := s(t) ⋆ (1/(πt)) = (1/π) p.v.{ ∫_{−∞}^{∞} s(τ)/(t−τ) dτ }

where p.v.{·} is the Cauchy principal value of the improper integral, given in this case by

lim_{δ→0} [ ∫_{−∞}^{t−δ} s(τ)/(t−τ) dτ + ∫_{t+δ}^{∞} s(τ)/(t−τ) dτ ]    (4.2)

A signal z(t) defined as

z(t) := s(t) + jH[s(t)] ≈ A(t) e^{jθ(t)}    (4.3)

is called the analytic signal of the real signal s(t). The approximation is valid under the above narrowband condition.
The definition of the analytic signal is needed in order to define the IF of the signal s(t).

Définition 4.2 (Instantaneous frequency [Boashash(2002)]). Let z(t) be an analytic signal given in the form

z(t) = A_z(t) e^{jθ_z(t)}    (4.4)

The instantaneous frequency of the signal z(t) is then defined as

f_in(t) := (1/(2π)) dθ_z(t)/dt    (4.5)

The IF, f_in(t), provides a measure of the localization in time of "that" frequency at time t. In this sense, a signal is said to be nonstationary if its IF varies in time. We can observe in Figure 4.3 the time-varying behavior of an engineering signal (a linear FM signal, used in radar and military applications) and of real-life signals (whale song, electroencephalogram signal, bat signal). Note that Definition 4.2 is applicable to monocomponent signals only, such as the signal illustrated in Figure 4.3(a). When more than one "ridge" appears in the signal's TF representation, the signal is said to be multicomponent, e.g. the signals in Figure 4.3(b–d). The importance of the IF and its applications are presented by Boashash in [Boashash(1992a), Boashash(1992b), Boashash(1992c)].

Nonstationarity can also be expressed in the usual sense of random processes, as shown in [Boashash et Sucic(2002)]. Let z(t) be a complex signal whose autocorrelation function is defined as

R_z(t, τ) := E{ z(t + τ/2) z*(t − τ/2) }    (4.6)

If R_z(t, τ) depends only on the time lag τ, which is the difference in time between t₁ = t + τ/2 and t₂ = t − τ/2, the signal is said to be wide-sense stationary (we only consider the second-order moment). On the other hand, when this condition is not satisfied, the signal is said to be nonstationary: the autocorrelation function R_z(t, τ) depends on both the time and the time lag.
Fig. 4.3: Examples of nonstationary signals. An engineering application is shown in (a) for a linear FM signal (plotted using the Wigner–Ville distribution). Real-life applications are shown in (b–d) for a whale signal, an electroencephalogram signal, and a bat signal, respectively (all plotted using the B distribution).

Définition 4.3 (Linear FM signal [Boashash(2002)]). Considering the typical FM transmission in communications systems, a narrowband FM signal is commonly defined as [Boashash(1992a)]:

s(t) ≜ A(t) · cos( 2πf_c t + 2π ∫_{−∞}^{t} m(τ) dτ ).    (4.7)

When m(t) is a linear function of t, i.e. m(t) = αt, the signal s(t) is called a linear frequency-modulated (LFM) signal. In addition, if A(t) is a rectangular function, the signal is called a "chirp". A chirp signal with duration T and bandwidth B can be expressed as [Rihaczek(1985)]:

s_chirp(t) = rect_T(t) cos[ 2π( f_c t + (α/2) t² ) ].

The analytic signal associated with s_chirp(t) is then given by

z_LFM(t) = rect_T(t) e^{jθ(t)} = rect_T(t) e^{j2π( f_c t + (α/2) t² )}    (4.8)

and its IF is

f_i^chirp(t) = (1/2π) dθ(t)/dt = f_c + αt.    (4.9)
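The LFM model of Définition 4.3 can be sketched numerically (NumPy assumed; the sampling rate and chirp parameters are arbitrary illustrative choices): the phase-difference IF estimate recovers the linear law f_c + αt of Eq. (4.9) up to a small half-sample bias.

```python
import numpy as np

fs = 1000.0                            # sampling rate (illustrative)
t = np.arange(0, 1, 1 / fs)
fc, alpha = 50.0, 100.0                # start frequency and sweep rate (illustrative)

# Analytic chirp of Eq. (4.8), rectangular amplitude implicit:
theta = 2 * np.pi * (fc * t + 0.5 * alpha * t ** 2)
z = np.exp(1j * theta)

# Discrete IF (Eq. 4.9): should follow fc + alpha*t
f_inst = np.diff(np.unwrap(np.angle(z))) / (2 * np.pi) * fs
expected = fc + alpha * t[:-1]

# the forward difference is biased by alpha/(2*fs), negligible here
assert np.max(np.abs(f_inst - expected)) < alpha / fs + 1e-6
```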
The chirp signal defined in (4.8) is of practical importance. It is the basic signal used in radar applications and can be generated easily [Rihaczek(1985)]. It is also used in military communication applications, where the chirp is sent out as a hostile signal to jam other communications [Proakis(1995), Milstein(1988), Amin(1997)]. In this thesis, we will refer to the chirp signal as an LFM signal (i.e. the rectangular amplitude is implicit). Based on the above concepts of analytic signal and instantaneous frequency for nonstationary signals, we now see how they lead to the fundamentals of TFSP.

4.3 The STFT, SPEC, WVD, and Quadratic TFD

To study the spectral properties of a signal at time t, an intuitive approach is to first take a slice of the signal by applying a moving window centered at time t, and then calculate the magnitude spectrum of the windowed signal. Consider a signal s(τ) and a real, even window h(τ), whose FTs are S(f) and H(f) respectively. To obtain a localized spectrum of s(τ) at time τ = t, multiply the signal by the window h(τ) centred at time τ = t, obtaining

s_h(t, τ) = s(τ) h(τ − t),    (4.10)

and then take the FT w.r.t. τ, obtaining

S_h(t, f) = FT_{τ→f} { s(τ) h(τ − t) }.    (4.11)

S_h(t, f) is called the short-time Fourier transform (STFT). The squared magnitude of the STFT, denoted by ρ^spec(t, f), is called the spectrogram (SPEC) [Boashash(1992c), Cohen(1995)]. It is mathematically expressed as

ρ^spec(t, f) = |S_h(t, f)|² = | ∫_{−∞}^{∞} s(τ) h(τ − t) e^{−j2πfτ} dτ |².    (4.12)

By varying t, one can obtain the spectral density as a function of t. The SPEC is a simple, popular and robust method for the analysis of nonstationary signals. It is a proper energy distribution in the sense that it is positive.
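A hedged sketch of Eqs. (4.10)–(4.12) in plain NumPy (window length, hop size and tone frequency are illustrative assumptions): slide the window, FFT each slice, and square the magnitude. For a pure tone, every frame of the spectrogram peaks at the bin of the tone frequency.

```python
import numpy as np

def spectrogram(s, win, hop):
    """Squared-magnitude STFT of Eq. (4.12); one row per window position."""
    frames = [s[i:i + len(win)] * win
              for i in range(0, len(s) - len(win) + 1, hop)]
    return np.abs(np.fft.fft(frames, axis=1)) ** 2

fs = 128.0
t = np.arange(0, 2, 1 / fs)
s = np.cos(2 * np.pi * 20.0 * t)             # a 20 Hz tone
win = np.hanning(64)
S = spectrogram(s, win, hop=16)

# each frame peaks at the FFT bin nearest 20 Hz: bin = f/fs * Nfft = 10
peak_bins = S[:, :32].argmax(axis=1)         # positive frequencies only
assert np.all(peak_bins == round(20.0 / fs * 64))
```

The fixed window length is exactly the time/frequency resolution trade-off discussed next.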
On the other hand, the SPEC has an inherent limitation: its frequency resolution depends on the length (and the type) of the analysis window; too short a window causes a decrease in frequency resolution, while too long a window causes a decrease in time resolution, hence an inherent trade-off between time and frequency resolution in the SPEC for a given window.

It has been argued that, since a signal has a spectral structure at any given time, there should exist a notion of "instantaneous spectrum" which has the physical attributes of an energy density. Based on this argument, the WVD was derived; it is defined for an analytic signal z(t) as [Boashash(1992c)]

W_z(t, f) ≜ ∫_{−∞}^{∞} z(t + τ/2) z*(t − τ/2) e^{−j2πfτ} dτ.    (4.13)

It can be observed from (4.13) that the WVD is the Fourier transform (FT)¹ of K_z(t, τ) from τ to f, where

K_z(t, τ) ≜ z(t + τ/2) z*(t − τ/2)    (4.14)

is called the time-lag signal kernel. The WVD is the most widely studied TFD. It achieves maximum energy concentration in the TF plane about the IF for LFM signals [Cohen(1995)]. However, it is in general non-positive, and it introduces cross-terms when multiple frequency laws (e.g. two LFM components) exist in the signal. A general class of quadratic TFDs can be obtained by smoothing/filtering the WVD in t and f, and is expressed as [Cohen(1966)]

ρ_z(t, f) ≜ ∭_{−∞}^{∞} e^{j2πν(u−t)} Γ(τ, ν) z(u + τ/2) z*(u − τ/2) e^{−j2πfτ} dν du dτ    (4.15)

where Γ(τ, ν) is a two-dimensional function in the Doppler-lag domain (τ, ν), called the TFD Doppler-lag kernel. The kernel determines the TFD and its properties; a TFD with certain desired properties can be obtained by properly constraining the function Γ(τ, ν). Table 4.1 lists some common TFDs and their corresponding Doppler-lag kernels.
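The WVD of Eq. (4.13) can be discretized directly from the time-lag kernel (4.14); the sketch below assumes NumPy and an arbitrary analytic tone. Note the factor-2 frequency scaling inherent to the discrete kernel z[n+τ]z*[n−τ]: a tone at normalized frequency f lands at FFT bin 2fN.

```python
import numpy as np

def wvd(z):
    """Discrete WVD of an analytic signal (Eq. 4.13): FFT over the lag
    variable of the time-lag kernel K_z(t, tau) = z(t+tau) z*(t-tau).
    Returns W of shape (N, N); row t is the spectrum at time t."""
    N = len(z)
    K = np.zeros((N, N), dtype=complex)
    for n in range(N):
        lag_max = min(n, N - 1 - n)          # lags that stay inside the signal
        tau = np.arange(-lag_max, lag_max + 1)
        K[n, tau % N] = z[n + tau] * np.conj(z[n - tau])
    return np.fft.fft(K, axis=1).real

N = 128
t = np.arange(N)
z = np.exp(1j * 2 * np.pi * 0.25 * t)        # tone at normalized f = 0.25

W = wvd(z)
# energy concentrates at bin 2*f*N = 64, as expected for a pure tone
assert W[N // 2].argmax() == int(0.25 * 2 * N)
```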
Equation (4.15) can be simplified as [Boashash(2002)]:

ρ_z(t, f) = γ(t, f) ∗∗_{t,f} W_z(t, f).    (4.16)

The notation ∗∗_{t,f} in (4.16) represents a convolution in both the t and f directions, and γ(t, f) is the time-frequency kernel obtained through a dFT operation on Γ(τ, ν) as:

γ(t, f) = ∬_{−∞}^{∞} Γ(τ, ν) e^{−j2πfτ} e^{+j2πtν} dτ dν.

Remark 4.1. Convention of dFT and dIFT operations: a dFT operation, transforming a function of the two variables (t, f) into a function of (τ, ν), consists of one FT operation from t to ν and one IFT operation from f to τ, these FT and IFT operations being interchangeable; conversely, a dIFT operation, transforming a function of the two variables (τ, ν) back to (t, f), consists of one IFT operation from ν to t and one FT operation from τ to f, these IFT and FT operations also being interchangeable.

4.4 Reduced Interference Distributions

The problem of the cross-terms introduced by the WVD when it is applied to a multicomponent signal can be dealt with by selecting a suitable kernel Γ(τ, ν) which minimizes the cross-terms effectively. The TFDs corresponding to such kernels are known as reduced interference distributions (RID). Examples of time-frequency RIDs include the CWD [Choi et Williams(1989)], the BJD [Cohen(1966)], the cone-shaped ZAMD [Zhao et al.(1990)] and the MBD [Hussain(2002)], defined in Table 4.1. The RIDs may be applied in situations where a number of signals of interest are present simultaneously and need to be separated.

¹ Convention of FT and IFT operations: an FT operation transforms a function either from the t to the ν domain, or from the τ to the f domain; inversely, an IFT operation goes either from ν back to t, or from f back to τ.

Tab. 4.1: Some common TFD and their kernels.
– WVD: kernel g(ν, τ) = 1; TFD ρ_z(t, f) = ∫ z(t + τ/2) z*(t − τ/2) e^{−j2πfτ} dτ.

– SPEC: g(ν, τ) = ∫ h(u + τ/2) h*(u − τ/2) e^{−j2πνu} du; ρ_z(t, f) = | ∫ z(τ) h(τ − t) e^{−j2πfτ} dτ |².

– CWD: g(ν, τ) = e^{−ν²τ²/σ}; ρ_z(t, f) = ∬ √(πσ/τ²) e^{−π²σ(u−t)²/τ²} z(u + τ/2) z*(u − τ/2) e^{−j2πfτ} du dτ.

– BJD: g(ν, τ) = sin(πντ)/(πντ); ρ_z(t, f) = ∫_{−∞}^{∞} (1/|τ|) ∫_{t−|τ|/2}^{t+|τ|/2} z(u + τ/2) z*(u − τ/2) du e^{−j2πfτ} dτ.

– ZAMD: g(ν, τ) = h(τ)|τ| sin(πντ)/(πντ); ρ_z(t, f) = ∫_{−∞}^{∞} h(τ) ∫_{t−|τ|/2}^{t+|τ|/2} z(u + τ/2) z*(u − τ/2) du e^{−j2πfτ} dτ.

– MBD: g(ν, τ) = |Γ(α + jπν)|² / Γ²(α), α ∈ ℝ⁺; ρ_z(t, f) = ∫ [ Γ(2α) / (2^{2α−1} Γ²(α) cosh^{2α}(t)) ] ∗_t z(t + τ/2) z*(t − τ/2) e^{−j2πfτ} dτ.

4.5 The WVD and Ambiguity Function

By taking the dFT of the WVD, we obtain the symmetrical AF, also called the Sussman AF:

A_z(τ, ν) = FT_{t→ν} IFT_{f→τ} { W_z(t, f) }
          = ∬_{−∞}^{∞} W_z(t, f) e^{j2πfτ} e^{−j2πνt} dt df
          = ∫_{−∞}^{∞} K_z(t, τ) e^{−j2πνt} dt.    (4.17)

Slightly different definitions of the AF have been used by different authors; however, they are all related to the symmetrical form A_z(τ, ν) [Matz et Hlawatsch(1998b)]. A nonstationary signal can therefore be analyzed either in the time-frequency domain (t, f) or in the ambiguity domain (τ, ν), also called the Doppler-lag domain. There also exists a relationship between the WVD and the AF via the Radon transform [Jain(1989)]: the FT of the Radon-transformed WVD yields the AF in polar coordinates [Ristic(1995)]. The concept of the AF has been used as a very effective tool in the design of radar signals [Boashash(1992c), Cook et Bernfeld(1993)]; this function is a cornerstone of modern radar technology.

4.6 Relationships Among Dual Domains

The relationships between the dual domain pairs, time-frequency and Doppler-lag, and time-lag and Doppler-frequency, can be represented as in Figure 4.4 [Boashash(2002)] through FT and IFT operations with respect to the variables.
Each arrow in Figure 4.4 represents an FT from one variable to the other; the inverse direction represents an IFT operation.

Fig. 4.4: Quadratic representations corresponding to the WVD. W_z(t, f), A_z(τ, ν), K_z(t, τ) and D_z(ν, f) are respectively the WVD, the AF, the time-lag signal kernel and the Doppler-frequency signal kernel of the analytic signal z(t).

Moreover, for the general quadratic class of TFDs in (4.15), the above relationship is illustrated in Figure 4.5 [Boashash(1992c), Boashash(2002)], where A_z(τ, ν) is the GAF.

Fig. 4.5: Dual domains of general quadratic signal representations. γ(t, f), Γ(τ, ν), G(t, τ) and G(ν, f) are the TFD time-frequency, Doppler-lag, time-lag and Doppler-frequency kernels, respectively. ρ_z(t, f) and A_z(τ, ν) are the general quadratic TFD and the GAF of the analytic signal z(t).

Note that there is a strong connection between quadratic TF signal representations and LTV systems [Matz et Hlawatsch(1998b), Hlawatsch et Matz(2000)]. A quadratic time-frequency analysis of an LTV system can be based on the linear relation between the WVD [Mecklenbräuker et Hlawatsch(1997)] or the modified WVD [Hlawatsch et Matz(2000)] of the input and the output. This input-output relationship can, in general, be described in terms of TFDs as [Gaarder(1968), Altes(1980), Flandrin(1988a), Nguyen et al.(2001b)]

E{ ρ_x(t, f) } = ρ_s(t, f) ∗∗_{t,f} Ψ_h(t, f)    (4.18)

where ρ_s(t, f) is a TFD of the input s(t), Ψ_h(t, f) is the scattering function, related to the random LTV channel impulse response h(t, ν), and E{ρ_x(t, f)} is the expected value of a TFD of the output x(t).
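The ambiguity-domain representation of Section 4.5 can be checked numerically: the last line of Eq. (4.17) says the AF is the FT of the time-lag kernel over t. The sketch below assumes NumPy and uses circular indexing for brevity; a stationary tone then has no Doppler spread, so all energy sits on the ν = 0 axis.

```python
import numpy as np

N = 64
n = np.arange(N)
z = np.exp(1j * 2 * np.pi * (8 / N) * n)     # tone on an exact FFT bin

# Time-lag kernel K_z(t, tau) of Eq. (4.14), circular in both indices:
K = np.array([z[(n + tau) % N] * np.conj(z[(n - tau) % N])
              for tau in range(N)])
# AF of Eq. (4.17): FFT over t; rows index tau, columns index Doppler nu
A = np.fft.fft(K, axis=1)

# a stationary tone concentrates entirely at nu = 0, for every lag tau
assert np.allclose(np.abs(A[:, 1:]), 0.0, atol=1e-8)
assert np.allclose(np.abs(A[:, 0]), N)
```

This zero-Doppler concentration is exactly why slowly varying signals occupy a small region around the origin of the (τ, ν) plane.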
4.7 Time–Frequency Signal Synthesis

In contrast to TF signal analysis, whereby analysis algorithms are used to analyze the TV frequency behavior of signals, TF signal synthesis algorithms are used to synthesize, or estimate, signals from their TFD. Mathematically, assuming that z(t) is a signal of interest with ρ_z(t, f) its TFD in the general quadratic class, the synthesis problem can be formulated as follows: find the analytic signal ẑ(t) whose TFD estimate, ρ_ẑ(t, f), best approximates ρ_z(t, f). Consequently, ẑ(t) gives the best estimate of z(t). Seminal to the problem of TF signal synthesis is the algorithm of [Boudreaux-Bartels et Marks(1986)] using the WVD. The basis for the solution is the inversion property of the WVD [Boashash(1992c)]

z(t) = (1/z*(0)) ∫_{−∞}^{∞} W_z(t/2, f) e^{j2πft} df    (4.19)

implying that the signal may be reconstructed to within a complex exponential constant e^{jα} = z*(0)/|z(0)|, given |z(0)| ≠ 0. Other time-frequency synthesis algorithms can be found in [Boashash(1991), McHale et Boudreaux-Bartels(1993), Wood et Barry(1994), Hlawatsch et Krattenthaler(1997), Francos et Porat(1999)].

4.8 IF Estimation

There are two major existing approaches to IF estimation using TFDs. The first is built on the first-order moment of the TFD [Boashash(1991)]. The first-order moment of the WVD yields the IF [White et Boashash(1988), Boashash(1991)], while those of other TFDs yield approximations of the IF [Boashash(1992c)]. However, this approach fails for multicomponent signals because of the presence of cross-terms. The second approach exploits the fact that all TFDs have peaks around the IF laws of the signal components. The peaks of the WVD were used for IF estimation and applied to many problems [Boashash(1992c)]. For better performance at lower SNR, the XWVD was proposed [Boashash et O'Shea(1993)]. Other TFD-based peak estimation algorithms can be found, for example, in [Boashash(1992c),
Stankovic et Katkovnik(1998), Katkovnik et Stankovic(1998)]. Like the first approach, this approach also suffers from the presence of cross-terms in multicomponent signals, which results in poor estimation. Motivated by the desire to design high-resolution RIDs, the B-Distribution (BD) was proposed in [Barkat(2000)] and the MBD was developed in [Hussain(2002)], both with adaptive algorithms for the IF estimation of multicomponent signals.

4.9 Engineering Applications of Time–Frequency Methods

This section looks at existing applications of time-frequency signal processing and describes a representative selection of them, encompassing telecommunications, radar, sonar, power generation, image quality, and biomedical engineering.

– Time-Frequency Methods in Communications: Telecommunications is one of the key industries where time-frequency methods are already playing an important role. [Barbarossa et Scaglione(1999b)] investigate the problem of optimal precoding and channel capacity for transmission over a linear time-varying (LTV) channel in wireless communications, where the multipath channels are underspread with finite Doppler and delay spread. In modern communication systems, a number of users can share the same communication channel via multiple access (MA); common examples are FDMA, TDMA, and CDMA [Rappaport(1996)]. The potential demand for wireless communications, combined with the restricted availability of the radio frequency spectrum, has motivated intense research into bandwidth-efficient multiple-access schemes. Among these schemes, CDMA has received particular attention.
Issues such as designing/assigning the spreading codes in CDMA and multiple-access interference have become major research concerns, and a number of the proposed approaches are based on time-frequency concepts [Crespo et al.(1995), Haas et Belfiore(1997), Joshi et Morris(1998)]. One of the main objectives of third-generation mobile and personal telecommunication systems is to provide a wide range of services with different bit rates [Swarts et al.(1999)]. To this end, a new approach called time-frequency-slicing (TFS) was proposed for multirate access in [Karol et al.(1997)].

– Time-Frequency Methods in Radar: Time-frequency methodologies have already made significant inroads in this field. A baseband Doppler radar return from a helicopter target is an example of a persistent nonstationary signal. A linear time-frequency representation provides a high resolution suitable for preserving the full dynamic range of such complicated signals [Marple S.L.(2001)].

– Time-Frequency Methods in Biomedical Engineering: An example of time-frequency methodology used for the detection of seizures in recorded EEG signals is proposed in [Celka et al.(2001)]. The techniques used are adapted to the case of newborn EEGs, which exhibit some well-defined features in the time-frequency domain that allow an efficient discrimination between abnormal EEGs and background. Another TF approach to newborn EEG seizure detection is described in [H. Hassanpour et Boashash(2003)].

– Other Applications: There are a number of applications that could not be included in the chapters for obvious space reasons.

4.10 Concluding Remarks

Time-frequency signal analysis (TFSA) is a collection of theory and algorithms used for the analysis and processing of nonstationary signals, as found in a wide range of applications.
In this chapter, the main elements of TFSA have been summarized. This concise tutorial introduction to TFSA is accessible to anyone who has taken a first course in signal processing; expert readers can find more detailed references and real-life applications in the signal processing literature, for instance [Boashash(2002), Cohen(1995)].

Deuxième partie : Séparation de Sources Impulsives à Variance Infinie

In the first chapter of this part (Chapter 5), we recall the main principles and existing methods of source separation. Then, in the four following chapters (Chapters 6–9), we present our novel approaches to the blind separation of a linear instantaneous mixture of impulsive alpha-stable sources.

Chapitre 5 : State of the Art of BSS

Blind source separation (BSS), or independent component analysis (ICA), is a method for finding underlying factors or components in multivariate statistical data. What distinguishes BSS from other methods is that it looks for components that are both statistically independent and non-Gaussian. In the second part of this thesis, we focus on the BSS of linear instantaneous mixtures. This BSS problem has the advantages of simplicity and generality, since the statistical principles used in this context can also be applied to solve the convolutive mixing problem. In this chapter, we briefly introduce the basic concepts and estimation principles of BSS. Following the different types of source statistical information, our contributions in this part are divided into four chapters. The first is devoted to BSS methods using fractional lower-order statistics (FLOS). We pay particular attention to this chapter¹ and give a general framework for separation methods using FLOS.
In a second contribution to BSS, we give a theoretical procedure for constructing contrast functions using sub- or super-additive functionals. The third contribution is devoted to BSS methods based on certain normalized HOS, while the fourth is devoted to a semi-parametric maximum likelihood approach coupling a stochastic version of the EM algorithm with the use of log-spline functions to approximate the source PDFs.

¹ To the best of our knowledge, there exists no BSS procedure based on FLOS, whilst many are based on HOS and SOS.

5.1 Introduction

5.1.1 What is blind source separation (BSS)?

Blind source separation (BSS) is a fundamental problem in signal processing that is sometimes known under different names: blind array processing, independent component analysis, waveform-preserving estimation, etc. In all these instances, the underlying model is that of m statistically independent signals s(t) = (s₁(t), · · · , s_m(t))^T whose n mixtures y(t) = (y₁(t), · · · , y_n(t))^T are observed, possibly in a noisy environment w(t), as shown in Figure 5.1. BSS addresses the problem of separating, or ideally reconstructing, the unknown source signals from an observable mixture of them. The term blind refers to the fact that neither the source signals nor the way they are mixed is known. The mixtures of the source signals are termed the observable signals, and the model of the mixing of the source signals is referred to as the mixing system A. The separated signals are obtained from the observable signals by means of a separation system B; in Figure 5.1 the signal model is depicted as a block diagram.

Fig. 5.1: Signal model for the blind source separation problem.

BSS has many applications in areas involving the processing of multi-sensor signals. Examples include: source localization and tracking by radar and sonar devices; speaker separation (the cocktail party problem); multiuser detection in communication systems; medical signal processing, e.g. separation of EEG or ECG signals; industrial problems such as fault detection; extraction of meaningful features from data; etc. This area has been very active over the last two decades. Surprisingly, this seemingly impossible problem has elegant solutions that depend on the nature of the mixtures and on the nature of the sources' statistical information [Hyvarinen et al.(2001)].

5.1.2 Brief history of BSS

The problem of blind source separation (BSS) was first introduced by J. Hérault, C. Jutten, and B. Ans [Hérault et Ans(1984)], [Hérault et al.(1985)] for linear instantaneous mixtures. Many researchers were then attracted by the subject, and many other works appeared. More precisely, all through the 1980s, BSS was mostly known among French researchers, with limited international influence. The few BSS papers of that period are presentations at international neural network conferences in the mid-1980s. At that time, another related and attractive field was higher-order spectral analysis, on which the first international workshop was organized in 1989. At this workshop, early papers on ICA by J.-F. Cardoso [Cardoso(1989a)] and P. Comon [Comon(1989)] were given. Cardoso used algebraic methods, especially higher-order cumulant tensors, which eventually led to the JADE algorithm [Cardoso et Souloumiac(1993)]. The use of fourth-order cumulants had been proposed earlier by [Lacoume et Ruiz(1988)]. The work of the scientists of the 1980s was extended by, among others, A. Cichocki and R.
Unbehauen, who were the first to propose one of the presently most popular ICA algorithms [Cichocki et al.(1994)], [Cichocki et Unbehauen(1996)]. Some other papers on ICA and BSS from the early 1990s are surveyed in [Jutten(2000)]. However, until the mid-1990s, BSS remained a rather small and narrow research effort. Several algorithms were proposed that worked, usually on somewhat restricted problems, but it was not until later that their rigorous connections to statistical optimization criteria were exposed. BSS attracted wider attention and growing interest after the publication of the infomax-principle-based approach [Bell et Sejnowski(1995)] in the mid-1990s. This algorithm was further refined by S.-I. Amari and his co-workers using the natural gradient [Amari et al.(1996)], and its fundamental connections to maximum likelihood estimation, as well as to the Cichocki–Unbehauen algorithm, were established. A couple of years later, A. Hyvarinen and E. Oja presented the fixed-point or FastICA algorithm [Hyvarinen(1999)], which has contributed to the application of ICA to large-scale problems thanks to its computational efficiency. Since the mid-1990s, there has been a growing wave of papers, workshops, and special sessions devoted to ICA. Indeed, many approaches for performing BSS have been proposed from different viewpoints. Using second-order statistics, the popular SOBI algorithm was introduced in [Belouchrani et al.(1997a)] for spatially correlated sources. The same methodology was generalized to cyclostationary sources in [Abed-Meraim et al.(2001)]. A useful generalization of the fourth-order-cumulant-based methods was proposed in [Pesquet et Moreau(2001)], [Moreau(2001)]. The convolutive model has been addressed in [Castella et Pesquet(2004)]. Motivated by the useful incorporation of prior information about the data into the BSS framework, the Bayesian approach was introduced in [Djafari(1999)].
Other researchers, considering some prior information about the propagation system in a semi-blind model, have also contributed to the BSS field. For example, in [Davy et al.(2002)] the Bayesian approach was coupled with MCMC techniques to estimate chirp signals. Before that, a polynomial approach was proposed in [Benidir(1997)]. The nonlinear-mixture BSS model was investigated early on in [Abed-Meraim et al.(1996)] and [Krob et Benidir(1993)]. Since 1999, an international workshop on ICA and BSS has gathered, every year, more than 150 researchers working on blind signal separation, and has contributed to the transformation of BSS into an established and mature field of research.

As an extension of the instantaneous mixtures, other models of signal mixtures have been considered in the signal processing literature. More precisely, we can distinguish three classes of mixtures:

[A]- Linear instantaneous mixtures
This model is commonplace in the field of narrowband array processing, where the transfer function between sources and sensors is given by a constant matrix A (i.e., it involves no delays or frequency distortion) called the 'array matrix' or the 'mixing matrix'. Many array processing techniques rely on the modelling of A [Krim et Viberg(1996)]: each column of A is assumed to depend on a small number of parameters. This information may be provided either by physical modelling (for example, when the array geometry is known and the sources are in the far field of the array) or, more likely, by direct array calibration. In many circumstances, however, this information is not available or is not reliable.
Blind source separation techniques address the issue of identifying A and/or retrieving the source signals without resorting to any a priori information about the mixing matrix A: they exploit only the information carried by the received signals themselves, hence the term blind. The performance of such a blind technique is, by its very nature, essentially unaffected by potential errors in the propagation model or in the array calibration (which is obviously not the case for parametric array processing techniques). Of course, the lack of information on the structure of A must be compensated by some additional assumptions on the source signals, as will be shown next.

[B]- Non-linear mixtures
In the basic signal model of BSS, an unknown linear mixing process is often assumed. However, this model fails as soon as the linear approximation of the physical phenomenon is no longer valid. This is the case, for example, when the signal is received by an array of sensors with non-linear characteristics. Some particular non-linear models have been studied thoroughly in the literature, such as the post-non-linear model, where the mixing process is a cascade of a linear mixture and a componentwise non-linear transform [Taleb(1999)], and the linear-quadratic model, where the observations are quadratic functions of the sources [Krob et Benidir(1993)], [Abed-Meraim et al.(1996)] and [Taleb(1999)]. The general non-linear problem is still largely unsolved, except for some tentative solutions based on neural networks using self-organizing feature maps [Taleb(1999)] or information-preserving nonlinear maps [Taleb(1999)], [Hyvarinen et al.(2001)].

[C]- Linear convolutive mixtures
Many real-world communication systems involve source signals that are delayed and attenuated by different amounts on their way to the different sensors (receivers), as well as multipath propagation.
Moreover, the multipath can be diffuse, with a long delay spread causing intersymbol interference and resulting in a situation termed 'linear convolutive mixing'. Mathematically, the mixing is described by a matrix of linear filters operating on the sources. Although not completely solved, this problem is much better understood than the non-linear mixing one. The first research works focused on the case where the mixing is square (i.e., the number of inputs equals the number of outputs), for which a multitude of solutions have been given using neural networks, independent component analysis (ICA), or information-theoretic approaches [Hyvarinen et al.(2001)]. Interestingly, by stacking successive observations into a single vector, a convolutive mixture can be expressed as an instantaneous mixture (with a full column rank mixing matrix if there are more outputs than inputs). Thus, BSS solutions for instantaneous mixtures can be adapted to solve the convolutive mixture problem, e.g., [Mansour et al.(2000a)], [Babie-Zadeh(2002)], [Castella et al.(2004)], [Castella et Pesquet(2004)].

A good source with historical accounts and a more complete list of references is [Jutten(2000)], a good overview of the statistical principles of BSS is [Cardoso(1998)], and an elegant overview paper is [Mansour et al.(2000b)]. There is still much work left to do. For example, we still do not have an adequate explanation for why ICA converges for so many problems, almost always to the same solutions, even when the signals were not derived from independent sources! Other serious problems, such as the under-determined mixture case, non-stationary sources, heavy-tailed sources, non-linear mixtures, dependent sources, etc., remain open to researchers' efforts.
5.1.3 Statistical information for BSS

Statistical moments of signals provide rich sources of the desired information. The whole spectrum of statistical moments runs from order 0 to order ∞ (see Figure 5.2). The oldest traditional signal separation methods use only second-order moments, such as PCA-type methods [Belouchrani et al.(1997a)]. All through the 1990s, BSS methods were extended to make wide use of higher-order statistical moments [Nikias et Petropulu(1994)], [Comon(1994)], [Cardoso et Souloumiac(1993)]. More recently, fractional lower-order statistical signal processing techniques extract useful information from pth-order statistics with −1 < p < 2. In this thesis, we will show that blind source separation based on stable models can be adequately solved using fractional lower-order moments, i.e., moments of order less than 2.

Fig. 5.2: Order of statistics in blind source separation: fractional lower-order moment theory (orders below 2), second-order moment theory, and higher-order moment theory (orders above 2).

Thus, the sources' statistical information can be of three types:

[A]- Higher-order statistical information
For non-Gaussian independent sources, higher-order statistics (HOS) can be used to achieve BSS. The first HOS-BSS approach traces back to the pioneering adaptive algorithm of [Hérault et Ans(1984)], [Hérault et al.(1985)]. This method does not use HOS explicitly, but tries to equalize the channel by minimizing a cost function that implicitly contains the information of the higher-order moments of the output. Alternative batch algorithms that explicitly use higher-order cumulants were developed later; see, for instance, [Cardoso(1991)], [Comon(1994)].
Other HOS-based solutions include separation by maximum likelihood (ML), separation by neural networks, separation by contrast functions, separation by information-theoretic criteria, etc.

[B]- Second-order statistical information. Second-order statistics (SOS) were used early on in blind equalization [Delmas et al.(2000)], in DOA estimation [Delmas(2004)] and in many other classical signal processing methods related to estimation and detection. When the data show some kind of temporal dependency, alternative BSS methods can be developed based on second-order statistics [Abed-Meraim et al.(1997b)]. SOS-based methods are expected to be more robust to poor signal-to-noise ratios and short data sizes [Gazzah et Abed-Meraim(2003)]. BSS is feasible based on spatial correlation matrices [Belouchrani et al.(1997a)], [Mansour et al.(2000a)]. These matrices exhibit a simple structure which allows straightforward blind identification procedures based on eigendecomposition. For example, a new algorithm using second-order cyclostationary statistics is introduced in [Abed-Meraim et al.(2001)].

[C]- Fractional lower-order statistical information. It is known that, for a non-Gaussian stable distribution with characteristic exponent α, only moments of order less than α are finite. In particular, the second-order moment of a stable distribution with α < 2 does not exist, making the use of covariance as a measure of correlation meaningless. Similarly, many standard signal processing tools (e.g., spectral analysis and all higher-order techniques) that are based on the assumption of finite variance will be considerably weakened and may, in fact, give misleading results. Recall that the stable distribution is best used to model signals and noise of an impulsive nature. This type of signal tends to produce outliers. Although SOS- and HOS-based BSS methods usually lead to analytically tractable results, they are no longer appropriate for impulsive non-Gaussian signals.
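A small numerical illustration of this point, using Cauchy samples (an α-stable law with α = 1; synthetic data, not from the thesis): the sample variance is dominated by a handful of extreme values, while a fractional lower-order moment of order p < α remains stable:

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.standard_cauchy(100_000)     # impulsive, alpha-stable with alpha = 1

# Second-order statistics are meaningless here: the sample variance is
# driven by a few outliers and does not converge as samples accumulate.
assert np.var(x) > 50.0

# A fractional lower-order moment E|x|^p with p < alpha stays finite and
# modest (theoretical value 1/cos(p*pi/2) ~ 1.41 for p = 0.5).
p = 0.5
flom = np.mean(np.abs(x) ** p)
assert 0.5 < flom < 5.0
```

This is precisely the behavior that motivates replacing covariance-based criteria with FLOS-based ones for impulsive sources.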
It has been demonstrated many times in the literature that second- and higher-order estimates can deteriorate dramatically when only a small proportion of extreme observations is present in the data. The absence of finite SOS and HOS does not mean, however, that there are no other adequate measures of independence of stable random variables. As will be shown later in this thesis, the dispersion of a stable random variable plays a role analogous to the SOS. Despite the aforementioned difficulties, significant progress has been made in developing a linear estimation theory for stable processes over the past thirty years. In this thesis, we introduce a new class of source separation methods based on the use of fractional lower-order statistics (FLOS), i.e., statistics of order less than 2.

5.2 Linear Instantaneous Mixtures

Consider m mutually independent signals whose n ≥ m linear combinations are observed in noise:

x(t) = y(t) + w(t) = As(t) + w(t)    (5.1)

where s(t) = [s1(t), ..., sm(t)]^T is the real source vector, w(t) = [w1(t), ..., wn(t)]^T is the real noise vector, and A is the n × m full-rank mixing matrix. The purpose of blind source separation is to find a separating matrix, i.e., an m × n matrix B such that z(t) = Bx(t) is an estimate of the source signals.

5.2.1 Separability and indeterminacies

When the sources are white stationary processes, their separation can be achieved under the following conditions.

Theorem 5.1. If there is at most one Gaussian source, then the independence of the components of z = By implies BA = PΛ, where P and Λ represent a permutation and a diagonal matrix, respectively.

In other words, linear instantaneous mixtures are separable, up to permutation and scaling indeterminacies, provided that there is at most one Gaussian source.
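To fix notation, the noiseless model and the role of the separating matrix can be sketched as follows. This is not blind separation: the mixing matrix is assumed known here, and its pseudo-inverse is used only to show that a separating matrix exists (all values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, T = 2, 3, 500
S = rng.uniform(-1, 1, size=(m, T))   # m independent sources
A = rng.standard_normal((n, m))       # full-rank n x m mixing matrix (arbitrary)
X = A @ S                             # noiseless observations x(t) = A s(t)

B = np.linalg.pinv(A)                 # an m x n separating matrix (A known here!)
Z = B @ X                             # z(t) = B x(t) recovers the sources exactly
assert np.allclose(Z, S)
```

The blind problem, discussed next, is to find such a B from X alone, which is only possible up to the permutation and scaling ambiguities of Theorem 5.1.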
We will not prove the identifiability of the BSS model here, since the proof is quite complicated; see Comon's paper [Comon(1994)]. Next, we develop a constructive (non-rigorous) discussion of identifiability.

[A]- Separability of the instantaneous linear mixture model. To make sure that the basic BSS model given in (5.1) can be estimated, we have to make certain assumptions:

1. The sources s(t) are at each time instant mutually independent: This is the principle on which ICA rests. Surprisingly, not much more than this assumption is needed to ascertain that the model can be estimated. This is why BSS is such a powerful technique with applications in many different areas. Basically, two r.v.s Y1 and Y2 are said to be independent if information on the value of Y1 (resp. Y2) does not give any information on the value of Y2 (resp. Y1). Technically, independence can be defined via the PDFs. Let us denote by p(y1, y2) the joint PDF of Y1 and Y2, and by pi(yi) the marginal PDF of Yi for i = 1, 2. Then Y1 and Y2 are independent if the joint PDF factorizes in the following way:

p(y1, y2) = p1(y1) p2(y2)    (5.2)

2. At most one source has a Gaussian distribution: Whitening also helps us understand why Gaussian variables are forbidden in BSS. Assume that the joint distribution of two sources s1 and s2 is Gaussian. This means that their joint PDF is given by

p(s1, s2) = (1/2π) exp(−(s1² + s2²)/2) = (1/2π) exp(−‖s‖²/2)    (5.3)

Now, assume that the mixing matrix A is orthogonal. For example, we could assume that this is so because the data has been whitened.
Using the classic formula for transforming PDFs, and noting that for an orthogonal matrix A−1 = A^T holds, we get the joint density of the mixtures x1 and x2 as

p(x1, x2) = |det(A^T)| (1/2π) exp(−‖A^T x‖²/2)    (5.4)

Due to the orthogonality of A, we have ‖A^T x‖² = ‖x‖², |det(A)| = 1 and A^T is also orthogonal. Thus we have

p(x1, x2) = (1/2π) exp(−‖x‖²/2) = p(s1, s2)    (5.5)

and we see that the orthogonal mixing matrix does not change the PDF, since it does not appear in this PDF at all. The original and mixed distributions are identical. Therefore, there is no way we could infer the mixing matrix from the mixtures. The phenomenon that the orthogonal mixing matrix cannot be estimated for Gaussian variables is related to the property that uncorrelated jointly Gaussian variables are necessarily independent. Thus, the information on the independence of the components does not get us any further than whitening: in the case of Gaussian independent components, we can only estimate the BSS model up to an orthogonal transformation. In other words, the matrix A is not identifiable for Gaussian independent components. With Gaussian variables, all we can do is whiten the data. What happens if we try to estimate the BSS model and some of the components are Gaussian, some non-Gaussian? In this case, we can estimate all the non-Gaussian components, but the Gaussian components cannot be separated from each other. In other words, some of the estimated components will be arbitrary linear combinations of the Gaussian components. Actually, this means that in the case of just one Gaussian source, we can still estimate the model, because the single Gaussian component does not have any other Gaussian component it could be mixed with.

3. The number of sensors is greater than or equal to the number of sources (n ≥ m): This assumption is needed to make the mixing matrix A a full-rank matrix. Then, after estimating the matrix A, we can compute its inverse, say B, and in the noiseless case obtain the independent components simply by s = Bx.

[B]- Indeterminacies in the instantaneous linear mixture model. In the BSS model (5.1), it is easy to see that the following two ambiguities or indeterminacies will necessarily hold. First, there is no way of knowing the original labelling of the sources, hence any permutation of the outputs is also a satisfactory solution, i.e., if z(t) is a solution then Pz(t) is also a solution for any permutation matrix P. Choosing a labelling of the outputs can only be done with some extra knowledge of the system. The second ambiguity is that exchanging a fixed scalar factor between a source signal and the corresponding column of A does not affect the observations, as shown by the following relation:

x(t) = As(t) + w(t) = Σ_{p=1}^{m} (a_p/λ_p) λ_p s_p(t) + w(t)    (5.6)

where λp is an arbitrary non-zero real factor and ap denotes the p-th column of A. It follows that the best one can do is to determine B (or equivalently the matrix A) up to a permutation and a scaling of its columns [Mansour et al.(2000b)]. Therefore, B is said to be a separating matrix if By(t) = PΛs(t), where P is a permutation matrix and Λ a non-singular diagonal matrix. Similarly, blind identification of A is understood as the determination of a matrix equal to A up to a permutation matrix and a non-singular diagonal matrix. Many authors take advantage of the scaling indeterminacy by assuming, without any loss of generality, that the source signals have unit variance, so that the dynamic range of the sources is accounted for by the magnitude of the corresponding columns of A. Other normalization strategies exist, such as normalizing the diagonal entries of A (respectively B) to unity.
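Both indeterminacies are easy to verify numerically. The sketch below (arbitrary matrices, noiseless case) shows that rescaling the sources while compensating in the columns of A, or permuting sources and columns jointly, leaves the observations unchanged:

```python
import numpy as np

rng = np.random.default_rng(2)
m, T = 2, 100
S = rng.standard_normal((m, T))
A = rng.standard_normal((3, m))
X1 = A @ S

# Scaling ambiguity: absorb an arbitrary non-singular diagonal D into A.
D = np.diag([2.0, -0.5])
X2 = (A @ np.linalg.inv(D)) @ (D @ S)    # same observations, different (A, s)
assert np.allclose(X1, X2)

# Permutation ambiguity: relabel the sources and A's columns jointly.
P = np.array([[0.0, 1.0], [1.0, 0.0]])
X3 = (A @ P.T) @ (P @ S)
assert np.allclose(X1, X3)
```

Hence only the products AΛ⁻¹ and A P^T, never A itself, are constrained by the data, which is exactly the PΛ ambiguity stated above.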
5.2.2 How to find the independent components

It may be very surprising that the independent components can be estimated from linear mixtures with no more assumptions than their independence. In this chapter, we will briefly explain why and how this is possible.

[A]- Uncorrelatedness is not enough. The first thing to note is that independence is a much stronger property than uncorrelatedness. Considering the BSS problem, we could actually find many different uncorrelated representations of the signals that would not be independent and would not separate the sources. Uncorrelatedness in itself is not enough to separate the components. This is also the reason why principal component analysis (PCA) or factor analysis cannot separate the signals: they give components that are uncorrelated, but little more. In fact, by using the well-known decorrelation methods, we can transform any linear mixture of the independent components into uncorrelated components, in which case the mixing is orthogonal. Thus, the trick in BSS is to estimate the orthogonal transformation that is left after decorrelation. This is something that classic methods cannot estimate because they are based on essentially the same covariance information as decorrelation. In the following, we consider a couple of more sophisticated and popular procedures for estimating ICA.

[B]- Nonlinear decorrelation is the basic ICA method. One way of stating how independence is stronger than uncorrelatedness is to say that independence implies nonlinear uncorrelatedness: if s1 and s2 are independent, then any nonlinear transformations g(s1) and h(s2) are uncorrelated². In contrast, for two r.v.s that are merely uncorrelated, such nonlinear transformations do not have zero covariance in general.
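A tiny numerical illustration of this last point (synthetic data, not from the thesis): a symmetric variable x and its square are uncorrelated yet strongly dependent, and a nonlinear transform exposes the dependence:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(200_000)
y = x**2                              # uncorrelated with x, but fully dependent on it

# Plain covariance misses the dependence: Cov(x, x^2) = E[x^3] = 0 by symmetry.
assert abs(np.mean(x * y) - np.mean(x) * np.mean(y)) < 0.05

# A nonlinear transform g(x) = x^2 reveals it: Cov(x^2, y) = Var(x^2) = 2 > 0.
assert np.cov(x**2, y)[0, 1] > 1.5
```

This is why decorrelation alone (PCA) cannot separate sources, while nonlinear decorrelation can.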
Thus, we could attempt to perform BSS by a stronger form of decorrelation, by finding a representation where the yi are uncorrelated even after some nonlinear transformations. This gives a simple principle for estimating the separating matrix B:

BSS approach 1: Nonlinear decorrelation. Find the matrix B so that for any i ≠ j, the components yi and yj are uncorrelated, and the transformed components g(yi) and h(yj) are uncorrelated, where g and h are some suitable nonlinear functions.

This is a valid approach to estimating ICA: if the nonlinearities are properly chosen, the method does find the independent components. Although this principle is very intuitive, it leaves open an important question: how should the nonlinearities g and h be chosen? An answer to this question can be found by using principles from estimation theory and information theory. Estimation theory provides the most classic method of estimating any statistical model: the maximum likelihood method. Information theory provides exact measures of independence, such as mutual information. Using either one of these theories, we can determine the nonlinear functions g and h in a satisfactory way.

[C]- Independent components are the maximally non-gaussian components. Another very intuitive and important principle of ICA estimation is maximum non-gaussianity. The idea is that, according to the central limit theorem, sums of non-gaussian r.v.s are closer to gaussian than the original ones. Therefore, if we take a linear combination y = Σ_i bi xi of the observed mixture variables, it will be maximally non-gaussian if it equals one of the independent components. This is because if it were a real mixture of two or more components, it would be closer to a gaussian distribution, due to the central limit theorem. Thus, the principle can be stated as follows:

² In the sense that their correlation is zero, i.e., E[g(s1)h(s2)] = 0.
BSS approach 2: Maximum non-gaussianity. Find the local maxima of non-gaussianity of a linear combination y = Σ_i bi xi under the constraint that the variance of y is constant. Each local maximum gives one independent component.

To measure non-gaussianity in practice, we could use, for example, the kurtosis. Recall that the kurtosis is a normalized higher-order cumulant; cumulants are a kind of generalization of the variance using higher-order polynomials. Cumulants have interesting algebraic and statistical properties, which is why they play an important part in the theory of BSS. An interesting point is that this principle of maximum non-gaussianity shows the very close connection between BSS and an independently developed technique of robust statistics called projection pursuit. In projection pursuit, we are actually looking for maximally non-gaussian linear combinations, which are used for visualization and other purposes. Thus, the independent components can be interpreted as projection pursuit directions.

[D]- Important role of numerical techniques. In addition to the estimation principle, one has to find efficient algorithms for implementing the computations needed. Thus, numerical algorithms are an integral part of BSS methods. The numerical methods are typically based on the optimization of some objective function. The basic optimization method is the gradient method. For example, a well-known fixed-point algorithm called FastICA has been tailored to exploit the particular structure of the ICA problem.

5.3 Basic BSS Methods

5.3.1 BSS by minimization of mutual information

An important approach for blind source separation, inspired by information theory, is the minimization of mutual information. The motivation of this approach is that it may not be very realistic in many cases to assume that the data follow the BSS model.
Therefore, we would like to present here an approach that does not assume anything about the data. What we want is a general measure of the dependence of the components of a random vector. Using such a measure, we could define BSS as a linear decomposition that minimizes that dependence measure. We recall here very briefly the basic definitions of information theory. The differential entropy H of a random vector y = (y1, ..., yn)^T with density p(y) is defined as

H(y) := −E{log p(y)}    (5.7)

A normalized version of entropy is given by the negentropy J, defined as

J(y) := H(yGauss) − H(y)    (5.8)

where yGauss is a Gaussian random vector with the same covariance (or correlation) matrix as y. Negentropy is always non-negative, and is equal to zero only for Gaussian random vectors. The mutual information I between n r.v.s yi, i = 1, ..., n, is defined as

I(y1, ..., yn) = Σ_{i=1}^{n} H(yi) − H(y)    (5.9)

Mutual information can also be expressed as the Kullback-Leibler divergence between py(y) and Π_i pyi(yi):

I(y) := ∫ py(y) log [ py(y) / Π_i pyi(yi) ] dy    (5.10)

[A]- Mutual information as a measure of dependence. From the well-known properties of the Kullback-Leibler divergence, I(y) is always non-negative, and is zero if and only if py(y) = Π_i pyi(yi), that is, if y1, ..., yn are independent [Cover et Thomas(1991)]. Consequently, I(y) is a measure of dependence, or contrast function, and source separation algorithms can be designed based on its minimization.

[B]- Mutual information as maximum likelihood estimation. Mutual information (MI) and likelihood are intimately connected. Indeed, it was shown that the minimization of the mutual information is asymptotically a Maximum Likelihood (ML) estimation of the sources [Taleb(1999)].
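In practice, the mutual information (5.9) must be estimated from data. As a rough illustration only (a naive histogram estimator, not one of the approximation techniques cited in this chapter), the following sketch shows that MI vanishes (up to binning bias) for independent signals and is clearly positive for a mixture:

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Naive histogram estimate of I(a; b) in nats (illustrative only)."""
    pxy, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)          # marginal of a
    py = pxy.sum(axis=0, keepdims=True)          # marginal of b
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(4)
s1, s2 = rng.standard_normal(50_000), rng.standard_normal(50_000)
mixed = 0.6 * s1 + 0.8 * s2                      # a mixture is dependent on s1
assert mutual_information(s1, s2) < mutual_information(s1, mixed)
```

Minimizing such a dependence measure over separating matrices B is exactly the criterion described above.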
Further connections between the MI and ML approaches arise in practice because we do not know the distributions of the sources; for example, the approximation of MI can rely on ML estimation of the source densities. Consequently, many recent works are based on this criterion [Pham(1999)].

[C]- Mutual information as maximization of non-gaussianity. To state the idea, suppose that the whitening z = Wx has been done, and hence a unitary matrix U must be estimated to achieve independent outputs. Now, from y = Uz we have py(y) = (1/|det(U)|) pz(z). Consequently, H(y) = H(z) and I(y) = Σ_i H(yi) − H(z). Since H(z) does not depend on U, minimizing I(y) with respect to U is equivalent to minimizing the sum of the marginal entropies. Moreover, −H(yi) can be seen as the Kullback-Leibler divergence between the density of yi and a zero-mean unit-variance Gaussian density (up to a constant term). This leads to the conclusion that U must be estimated so as to produce outputs that are as non-Gaussian as possible. This fact has a nice intuitive interpretation: from the central limit theorem, we know that mixing tends to gaussianize the observations, and hence the separating system should go in the opposite direction. A well-known algorithm based on the non-gaussianity of the outputs is FastICA [Hyvarinen(1999)], [FastICA(1998)], which uses negentropy as a measure of non-gaussianity.

[D]- Algorithms for minimization of mutual information. To use MI in practice, we need some method of estimating or approximating it from real data. We recall that there are many mutual information approximation techniques. The cumulant-based approximation was proposed in [Jones et Sibson(1987)], and it is almost identical to that proposed in [Comon(1994)].
The approximation of entropy using nonpolynomial functions was introduced in [Hyvarinen(1998)], and it is closely related to the measures of non-gaussianity that have been proposed in the projection pursuit literature; see, e.g., [Cook et al.(1993)].

5.3.2 BSS by maximization of non-gaussianity

Non-gaussianity is actually of paramount importance in blind source separation. Without non-gaussianity the separation is not possible at all, as shown above. An important class of source separation algorithms is based on the non-gaussianity of the outputs [Hyvarinen(1999)], [FastICA(1998)]. As a first practical measure of non-gaussianity, the fourth-order cumulant, or kurtosis, was introduced. Practical algorithms were derived using the gradient and fixed-point methods. However, kurtosis has some drawbacks in practice, when its value has to be estimated from a measured sample. The main problem is that kurtosis can be very sensitive to outliers; in other words, kurtosis is not a robust measure of non-gaussianity. To mitigate this problem, the negentropy was proposed as an alternative measure of non-gaussianity. Its properties are in many ways opposite to those of kurtosis: it is robust but computationally complicated. Furthermore, computationally simple approximations of negentropy that more or less combine the good properties of both measures have been introduced in various papers in the BSS literature (for more detail, refer to [Hyvarinen et al.(2001), chapter 8]).

5.3.3 BSS by maximum likelihood estimation

A very popular approach for estimating the independent component analysis model is maximum likelihood (ML) estimation. Maximum likelihood estimation is a fundamental method of statistical estimation; a short introduction will be provided in chapter 8. One interpretation of ML estimation is that we take as estimates those parameter values that give the highest probability to the observations.
To perform maximum likelihood estimation in practice, we need an algorithm to carry out the numerical maximization of the likelihood. For that, we distinguish two cases:

• Source PDFs are known: If the densities of the independent components are known in advance, a very simple gradient algorithm can be derived. To speed up convergence, the natural gradient version and especially the FastICA fixed-point algorithm can be used, which maximize the likelihood faster and more reliably.

• Source PDFs are unknown: If the densities of the independent components are not known, the situation is somewhat more complicated. Fortunately, however, it is enough to use a very rough density approximation, as we will do in chapter 8 using the family of log-spline functions. The choice of the density can then be based on information about whether the independent components are sub- or super-gaussian. Such an estimate can simply be added to the gradient methods, and it is automatically done in FastICA. This is also the approach we have used throughout this thesis in the noisy case (see chapter 8), as a semi-parametric maximum likelihood approach.

5.3.4 BSS by algebraic tensorial methods

One approach for the estimation of independent component analysis consists of using higher-order cumulant tensors. Tensors can be considered as generalizations of matrices, or linear operators. Cumulant tensors are then generalizations of the covariance matrix. The covariance matrix is the second-order cumulant tensor, and the fourth-order tensor is defined by the fourth-order cumulants Cum(xi, xj, xk, xl). We can use the eigenvalue decomposition of the covariance matrix to whiten the data [Abed-Meraim et Hua(1997)]. This means that we transform the data so that second-order correlations are zero.
As a generalization of this principle, we can use the fourth-order cross-cumulant tensor to make the fourth-order cumulants zero, or at least as small as possible. This kind of higher-order decorrelation gives one of the most popular methods for blind source separation [Cardoso et Comon(1996)]. Joint approximate diagonalization based on the eigenvalue decomposition is one method in this category that has been successfully used in low-dimensional problems [Cardoso et Souloumiac(1993)]. In the special case of distinct kurtoses, a computationally very simple method (FOBI) can be devised. An accessible and fundamental paper is [Cardoso(1999)], which also introduces sophisticated modifications of the previously proposed tensorial methods. A more interesting generalization is given in [Moreau(2001)]. The tensor-based methods, however, have become less popular recently [Belouchrani et al.(2001)]. This is because methods that use the whole EVD, like JADE, are restricted, for computational reasons, to small dimensions. Moreover, they have statistical properties inferior to those of methods using non-polynomial cumulants or maximum likelihood. We shall consider this approach in more detail in chapter 7. Indeed, we propose in this thesis a normalized version of this class of methods, using some normalized second-order and fourth-order cumulant tensors to separate heavy-tailed signals [Sahmoudi et al.(2004a)].

5.3.5 BSS by non-linear decorrelation

This approach represents the earliest research effort in BSS, successfully used by Jutten, Hérault, and Ans to solve the first ICA problems. A good review of this class of techniques can be found in [Jutten(2000)]. Today, this work is mainly of historical interest, because there exist several more efficient algorithms for BSS. Nonlinear decorrelation can be seen as an extension of second-order methods. Independent sources can in some cases be found as nonlinearly uncorrelated linear combinations.
The nonlinear functions used in this approach introduce higher-order statistics into the solution method, making blind source separation possible. In [Cichocki et Unbehauen(1996)], a very popular learning algorithm was introduced as an extension of the first source separation algorithm [Hérault et Ans(1984)]. Another well-known algorithm is the equivariant adaptive separation via independence (EASI) algorithm, based on nonlinear decorrelation [Cardoso et Lahed(1996)]. In [Amari et Cardoso(1997)], a different framework based on estimating functions was introduced. Other somewhat related methods have been proposed in the blind source separation literature [Cichocki et Amari(2002)]. We will recall this approach and give more detail in chapter 7, as well as a normalized version of this category of methods, based on some normalized statistics, to separate heavy-tailed sources [Sahmoudi et Abed-Meraim(2004b)].

5.3.6 BSS using geometrical concepts

Another method for BSS is the geometric approach [Mansour et al.(2001), Mansour et al.(2002a), Babaie-Zadeh et al.(2004)]. This approach, which holds essentially for two sources and two sensors, is based on a geometrical interpretation of the independence of two random variables. To state the idea more clearly, suppose that the marginal PDFs of the sources s1 and s2 are non-zero only within the intervals M1 ≤ s1 ≤ M2 and N1 ≤ s2 ≤ N2. Then, from the independence of s1 and s2, we have ps1s2(s1, s2) = ps1(s1)ps2(s2), and hence the support of ps1s2(s1, s2) will be the rectangular region {(s1, s2) | M1 ≤ s1 ≤ M2, N1 ≤ s2 ≤ N2}. In other words, the scatter plot of the source samples forms a rectangular region in the (s1, s2) plane. The linear mapping x = As transforms this region into a parallelogram.
Without loss of generality, one can write

A = [ 1  a ; b  1 ]

and it can then be seen that the slopes of the borders of the scatter plot of the observations will be b and 1/a. Hence, estimating the mixing matrix A is equivalent to estimating the slopes of the borders of this parallelogram.

5.3.7 Source separation using a Bayesian framework

Throughout our work so far, we have assumed that there is no information available about the true parameter beyond that provided by the data. However, there are situations in which most statisticians would agree that more can be said. Technically, there is a substantial number of statisticians in the Bayesian school who feel that it is always reasonable, and indeed necessary, to think of the true value of the parameter θ as being the realization of a random variable with a known distribution. This distribution does not always correspond to an experiment that is physically realizable, but rather is thought of as a measure of the beliefs of the experimenter concerning the true value of θ before he or she takes any data. To describe the Bayesian procedure for source separation, let us write Bayes' theorem in the case of a source separation problem [Knuth(1999)]:

P(A, s(t) | x(t), I) = P(x(t) | A, s(t), I) P(A, s(t) | I) / P(x(t) | I)    (5.11)

where I represents any prior information. We can rewrite the equation as a proportionality, absorbing the inverse of the prior probability of the data P(x(t) | I) into the implicit proportionality constant:

P(A, s(t) | x(t), I) ∝ P(x(t) | A, s(t), I) P(A, s(t) | I)    (5.12)

The probability on the left-hand side of Equation (5.12) is referred to as the posterior probability. It represents the probability that the given model accurately describes the physical situation. The first term on the right-hand side is the likelihood of the data given the model.
It describes the degree of accuracy with which we believe the model can predict the data. The final term on the right is the prior probability of the model, also called the prior. The prior represents the degree to which we believe the model to be correct based only on our prior information about the problem. It is through the assignment of the likelihood and the priors that we express all of our knowledge about the particular source separation problem. If the linear mixture is relatively noise-free, the aim now becomes to estimate a separating matrix B that optimizes the posterior probability of the model, and to estimate the source signals by applying the separating matrix to the recorded data. The Bayesian methodology has several advantages. The most important is the fact that all of the prior knowledge about a specific problem is expressed in terms of prior probabilities that must be evaluated. This provides one with the means to incorporate any additional relevant information into a problem [Djafari(1999)], [Snoussi et M.-Djafari(2000)]. Finally, I want to refer the French-speaking reader to Snoussi's thesis [Snoussi(2003)] as one of the best references, to my knowledge, on this class of methods.

5.3.8 BSS using time structure

In many applications, the source signals represent temporally correlated (colored) random processes, referred to as colored time signals or time series. In that case, they may contain much more structure than white random processes. This additional information can actually make the estimation of the BSS model possible in cases where the basic BSS methods cannot estimate it. For that, we should make some assumptions on the time structure of the sources that allow for their separation. These assumptions are alternatives to the assumption of non-gaussianity.

[A]- Separation by autocovariances. The simplest form of time structure is given by autocovariances.
This means covariances between the values of the signal at different time instants: cov[xi(t), xi(t − τ)], where τ is some lag constant. If the data have time dependencies, the autocovariances are often different from zero. In addition to the autocovariances of one signal, we also need the covariances between two signals: cov[xi(t), xj(t − τ)] with i ≠ j. All these statistics for a given time lag can be grouped together in the time-lagged covariance matrix

C_x^τ := E{x(t) x(t − τ)^T}    (5.13)

The key point here is that the information in a time-lagged covariance matrix (also called a cross-correlation matrix) C_x^τ can be used instead of the higher-order information [Molgedey et Schuster(1994)]. What we do is find a matrix B so that, in addition to making the instantaneous cross-correlations of y(t) = Bx(t) go to zero, the lagged cross-covariances are made zero as well:

E{yi(t) yj(t − τ)} = 0, for all i ≠ j and all τ    (5.14)

The motivation for this is that for the sources si(t), the lagged cross-covariances are all zero due to independence. Using these lagged covariances, we get enough extra information to estimate the sources, under certain conditions, such as the assumption that any two sources have different spectral shapes [Belouchrani et al.(1997a)]. No higher-order information is then needed. Using this approach, we have a simple algorithm, called AMUSE [Tong et al.(1991)], for estimating the separating matrix B from whitened data:

1. Whiten the data x to obtain z.
2. Compute the eigenvalue decomposition of C̄_z^τ := (1/2)[C_τ + C_τ^T], where C_τ = E{z(t) z(t − τ)^T} is the time-lagged covariance matrix, for some lag τ.
3. The rows of the separating matrix B are given by the eigenvectors.

An essentially similar algorithm was proposed in [Tong et al.(1991)].
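The three AMUSE steps above can be sketched compactly in numpy. Two synthetic sinusoids of different frequencies stand in for colored sources with different spectral shapes; the mixing matrix and all parameter values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(7)
T, tau = 5000, 1
t = np.arange(T)
S = np.vstack([np.sin(2 * np.pi * 0.011 * t),    # two colored sources with
               np.sin(2 * np.pi * 0.047 * t)])   # different spectral shapes
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = A @ S

# Step 1: whiten x to obtain z.
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
W = E @ np.diag(d ** -0.5) @ E.T
Z = W @ X

# Step 2: EVD of the symmetrized time-lagged covariance matrix.
C_tau = Z[:, tau:] @ Z[:, :-tau].T / (T - tau)
C_bar = (C_tau + C_tau.T) / 2
_, V = np.linalg.eigh(C_bar)

# Step 3: the rows of the separating matrix are the eigenvectors.
B = V.T @ W
Y = B @ X

# Each output should match one source up to sign and scale.
for i in range(2):
    corrs = [abs(np.corrcoef(Y[i], S[j])[0, 1]) for j in range(2)]
    assert max(corrs) > 0.95
```

The lagged covariances of the two sinusoids differ (cos(2πfτ) depends on f), which is exactly the "different spectral shapes" condition the method requires.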
An extension of the AMUSE method that improves its performance is to consider several time lags τ instead of a single one. It is then enough that the covariances differ for just one of these lags, so the choice of τ becomes a less serious problem. The principle consists in simultaneously diagonalizing all the corresponding lagged covariance matrices. The SOBI (second-order blind identification) algorithm [Belouchrani et al.(1997a)] is based on these principles, and so is TDSEP [Ziehe et Müller(1998)].

[B]- Separation by non-stationarity of variances

If the sources are assumed to be non-stationary, we can divide the signals into short windows and consider the covariances within each one:

E_{t∈T_k}{y_i(t) y_j(t)}    (5.15)

where T_k = (kT, (k + 1)T]. Then, by jointly diagonalizing the covariance matrices of the different segments, we can separate the non-stationary sources [Pham et Cardoso(2001)].

5.4 BSS of Impulsive Heavy-Tailed Sources

5.4.1 Why heavy-tailed α-stable distributions?

The emphasis in this thesis is on a class of signals that have heavy tails. Heavy-tailed signals are likely to exhibit large observations and often have an impulsive nature, and it turns out that a broad class of real-life signals is heavy-tailed. The term heavy tail refers to the fact that the probability density function of the signal has relatively large mass in its tails. This section provides a motivation for this part of the thesis and a brief discussion of the fundamental problems in blind signal separation. This part of my work is partly motivated by the lack of a strong theoretical basis for blind signal separation of heavy-tailed signals in the signal processing literature.
Furthermore, the motivation for considering heavy-tailed distributions in this thesis is that many real-world signals turn out to follow heavy-tailed laws [Adler et al.(1998)]. The main difference between the non-Gaussian stable distributions and the Gaussian distribution is that the tails of a stable density are heavier than those of the Gaussian density. In addition, the stable distribution is very flexible as a modeling tool in that it has a parameter α (0 < α ≤ 2), called the characteristic exponent, that controls the heaviness of its tails. A small positive value of α indicates severe impulsiveness, while a value of α close to 2 indicates a more Gaussian type of behavior. Stable distributions obey the Generalized Central Limit Theorem (GCLT), which states that if the sum of i.i.d. random variables, with or without finite variance, converges to a distribution as the number of variables increases, the limit distribution must be stable [Samorodnitsky et Taqqu(1994)]. Thus, non-Gaussian stable distributions arise as sums of random variables in the same way as the Gaussian distribution. Another defining feature of the stable distribution is the so-called stability property, which says that the sum of two independent stable random variables with the same characteristic exponent is again stable with that same characteristic exponent. For these reasons, statisticians [Samorodnitsky et Taqqu(1994)], economists [Rachev(2003)], signal processing and communications engineers [Nikias et Shao(1995)], and other scientists in a variety of disciplines have embraced alpha-stable processes as the model of choice for heavy-tailed data.

5.4.2 Existing BSS methods for heavy-tailed signals

A common characteristic of many heavy-tailed distributions, such as the α-stable family, is the nonexistence of finite second- or higher-order moments.
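To make this tail behavior concrete, the following sketch draws standard SαS samples with the Chambers-Mallows-Stuck method and compares tail probabilities with a Gaussian sample (the helper `sas_samples` is my own, not from the thesis):

```python
import numpy as np

def sas_samples(alpha, n, rng):
    """Standard SaS samples (gamma = 1, mu = 0) via the
    Chambers-Mallows-Stuck construction, for 0 < alpha <= 2, alpha != 1."""
    V = rng.uniform(-np.pi / 2, np.pi / 2, n)
    W = rng.exponential(1.0, n)
    return (np.sin(alpha * V) / np.cos(V) ** (1 / alpha)
            * (np.cos((1 - alpha) * V) / W) ** ((1 - alpha) / alpha))

rng = np.random.default_rng(1)
n = 100000
gauss = rng.standard_normal(n)
sas = sas_samples(1.5, n, rng)
# Heavy tails: large excursions are orders of magnitude more frequent
# for alpha = 1.5 than for the Gaussian.
tail_sas = np.mean(np.abs(sas) > 5)      # a few percent of the samples
tail_gauss = np.mean(np.abs(gauss) > 5)  # essentially zero
```

Lowering α toward zero makes the excursions even more violent, matching the impulsiveness interpretation of the characteristic exponent.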
There are several well-known methods for source separation [Hyvarinen et al.(2001)], generally based on second- or higher-order statistics of the observations, which are therefore inadequate for handling heavy-tailed sources. In that case, fractional lower-order theory can be used for stable signal separation. Only a limited literature has been dedicated to BSS of impulsive signals. In [Shereshevski et al.(2001)], the authors proposed the RQML algorithm, based on the idea of setting the signals to zero whenever they are larger (in absolute value) than some threshold K. Recall that RQML is the restricted quasi-maximum likelihood approach, introduced as an extension of Pham's popular quasi-maximum likelihood approach to the α-stable source case. Other solutions exist in the literature, based on the spectral measure [Kidmose(2001)], order statistics [Shereshevski et al.(2001)] and the characteristic function [Eriksson et Koivanen(2003)]. Recently, a new method based on a consistent prewhitening step was proposed in [Chen et Bickel(2004)]. The authors use the characteristic-function-based contrast function proposed in [Kagan et al.(1973)] to achieve source separation, and show that this approach can be consistent even when some hidden sources do not have finite second moments. However, this approach guarantees consistent performance only in the following two cases: first, when at most one source component has an infinite second moment, and second, when there are only two alpha-stable sources. In this thesis, we introduce new methods for α-stable source separation from observed linear mixtures using the minimum dispersion criterion [Sahmoudi et al.(2003a)], contrast functions [Sahmoudi(2005)], normalized statistics [Sahmoudi et al.(2004a), Sahmoudi et Abed-Meraim(2004b)] and the maximum likelihood [M.
Sahmoudi et al.(2005)].

5.5 Conclusion & Future Research

In this chapter the fundamental methods of BSS have been presented. Although several limitations and assumptions impede the use of BSS methods, it seems appropriate to conjecture that these algorithms are useful tools with many potential applications where second-order statistical methods reach their limits. Several researchers believe that these techniques will have a huge impact on engineering methods and industrial applications. It is interesting to note that many issues remain open to further investigation.
– Underdetermined BSS : having more sources than sensors is of theoretical and practical interest.
– Noisy BSS : much more work needs to be done to determine the effect of noise on performance. Sparse representation and independent factor analysis are very promising ideas.
– Non-stationarity : time-frequency analysis and unsupervised classification are two promising approaches in this context.
– BSS for data mining and data warehousing : data mining, the extraction of hidden predictive information from large databases, is a powerful technology with great potential to help companies focus on the most important information in their data warehouses. The goal is to find a subset of a collection of documents relevant to a user's information request. We believe that the BSS model can be extended to such unsupervised classification problems.
– Blind separation of heavy-tailed signals : some standard BSS methods cannot work in this case, and others are not mathematically justified, because heavy-tailed distributions have neither finite second-order statistics (SOS) nor higher-order statistics (HOS). The goal of half of this thesis is to investigate this problem.
Chapter 6
Minimum Dispersion Approach

6.1 Introduction

This chapter introduces a new Blind Source Separation (BSS) approach for extracting impulsive source signals from their observed mixtures. The impulsive, or heavy-tailed, signals are modeled as real-valued symmetric alpha-stable (SαS) processes characterized by infinite second- and higher-order moments. A new whitening procedure based on a normalized covariance matrix is introduced. The proposed approach uses the minimum dispersion (MD) criterion as a measure of sparseness and independence of the data. We show that the proposed method is robust, in the sense of being insensitive to possible variations in the underlying form of the sampling distribution. Algorithm derivation, discussion and simulation results are provided to illustrate the good performance of the proposed approach. In particular, the new method is compared with three of the most popular BSS algorithms : JADE [Cardoso et Souloumiac(1993)], EASI [Cardoso et Lahed(1996)] and RQML [Pham et Garrat(1997)].

6.1.1 The failure of second- and higher-order methods

From a signal processing point of view, the adoption of a stable model for signal or noise has important consequences. Second-order stationary processes have historically been the main subject of study in statistical signal processing. Second-order estimation techniques are commonly recognized as the natural tools in the presence of Gaussian noise. Research efforts on higher-order statistics (HOS) have led to improved estimation algorithms for non-Gaussian environments, but this work has been based on the assumption that the second-order and higher-order statistics of the processes exist and are finite [Nikias et Petropulu(1994)].
Important non-Gaussian impulsive processes can be efficiently modeled by heavy-tailed processes with infinite variance, for which neither classical second-order theory nor the theory of HOS is useful [Nikias et Shao(1995)]. It has been shown repeatedly in the literature that infinite-variance processes appearing in practice are well modeled by probability distributions with algebraic tails, i.e., random variables for which

P(|X| > x) ∼ c x^{−α}    (6.1)

for some fixed c, α > 0. Algebraic-tailed random variables have finite absolute moments of order less than α:

E|X|^p < ∞, for p < α    (6.2)

Conversely, if p ≥ α, the absolute moments become infinite, and thus unsuitable for statistical analysis. When α < 2, the processes have infinite variance, and standard second- or higher-order statistics cannot be successfully applied.

6.1.2 Fractional lower-order statistics (FLOS) theory

Alternative attempts to characterize the behavior of impulsive signals have relied on fractional lower-order statistics (FLOS) in the context of non-Gaussian α-stable distributions (α < 2). It has been shown that FLOS give robust measures of the characteristics of impulsive processes [Ma et Nikias(1995a)], [Nikias et Shao(1995)]. For a zero-location alpha-stable random variable X with dispersion γ, the norm of X is defined as

‖X‖_α = γ if 0 < α < 1, and ‖X‖_α = γ^{1/α} if 1 ≤ α ≤ 2    (6.3)

Hence, the norm ‖X‖_α is a scaled version of the dispersion γ. If X and Y are jointly alpha-stable, the distance between X and Y is defined as

d_α(X, Y) = ‖X − Y‖_α    (6.4)

Combining these equations with the fact that E|X|^p ∝ γ^{p/α} for 0 < p < α, it is easy to see that the p-th order moment of the difference between two alpha-stable random variables is a measure of the distance d_α between them. In addition, all fractional lower-order moments of an alpha-stable random variable are equivalent, i.e., the p-th and q-th order moments differ by a constant factor independent of the random variable, as long as p, q < α.
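A quick numerical check of (6.1)-(6.2) and of this moment equivalence, assuming a standard Cauchy source (α = 1, γ = 1 scaled to γ = 2), for which E|X|^p = γ^p / cos(πp/2) when p < 1. The helper `flom_dispersion` and the FLOM normalizing constant C(p, α) below are standard fractional lower-order tools sketched from memory, not code from the thesis:

```python
import numpy as np
from math import gamma as G, pi, sqrt, cos

def flom_dispersion(x, p, alpha):
    """Estimate the dispersion of SaS data from its p-th absolute sample
    moment, using E|X|^p = C(p, alpha) * gamma^(p/alpha) for 0 < p < alpha
    (C is the usual FLOM normalizing constant)."""
    C = (2 ** (p + 1) * G((p + 1) / 2) * G(-p / alpha)
         / (alpha * sqrt(pi) * G(-p / 2)))
    return (np.mean(np.abs(x) ** p) / C) ** (alpha / p)

rng = np.random.default_rng(2)
x = 2.0 * rng.standard_cauchy(200000)  # Cauchy with dispersion gamma = 2

# Moments of order p < alpha = 1 exist and match the closed form ...
p = 0.3
flom = np.mean(np.abs(x / 2.0) ** p)   # standard Cauchy moment
theory = 1.0 / cos(pi * p / 2)

# ... and different p < alpha give equivalent dispersion estimates,
g_a = flom_dispersion(x, 0.3, 1.0)
g_b = flom_dispersion(x, 0.6, 1.0)

# while for p = 2 >= alpha the moment estimate never settles: a single
# extreme sample carries a large share of the whole sum of squares.
share = np.max(x ** 2) / np.sum(x ** 2)
```

Both `g_a` and `g_b` hover near the true dispersion 2, illustrating that the choice of p below α is largely a matter of convenience.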
Furthermore, it was shown in [Schilder(1970)] that for 1 ≤ α ≤ 2, ‖.‖_α is a norm on the linear space of alpha-stable processes. The blind source separation methodology proposed in this chapter uses the notion of fractional lower-order moments to achieve robust signal reconstruction.

6.2 Source Separation Procedure

6.2.1 Whitening by a normalized covariance matrix

The first step consists of whitening the observations (orthogonalizing the mixture matrix A). For finite-variance signals, the whitening matrix W is computed as the inverse square root of the signal covariance matrix. At first glance, this cannot be applied to α-stable sources. However, we prove in the following that a properly normalized covariance matrix converges to a finite matrix with the appropriate structure when the sample size N tends to infinity. More specifically, we propose and prove the following results.

Theorem 6.1. Let X1 and X2 be two SαS variables with dispersions γ1 and γ2 and PDFs f1(.) and f2(.), respectively. Then

lim_{N→∞} Ê|X1|² / Ê|X2|² = γ1/γ2

where Ê denotes the time-averaging operator Ê[g(X)] = (1/N) Σ_{t=1}^{N} g[X(t)].

Proof. Let T be an arbitrary positive constant and 1I_{|X|≤T} the indicator function, which equals 1 if |X| ≤ T and 0 otherwise.
Then, due to the ergodicity of X1 and X2, we have

Ê[X1² 1I_{|X1|≤T}] / Ê[X2² 1I_{|X2|≤T}] = [(1/N) Σ_{t=1}^{N} x1(t)² 1I_{|X1|≤T}] / [(1/N) Σ_{t=1}^{N} x2(t)² 1I_{|X2|≤T}] → E[X1² 1I_{|X1|≤T}] / E[X2² 1I_{|X2|≤T}] as N → ∞    (6.5)

Due to the symmetry of the α-stable PDF, the right-hand side can be expressed as

E[X1² 1I_{|X1|≤T}] / E[X2² 1I_{|X2|≤T}] = ∫_{−T}^{T} |x|² f1(x) dx / ∫_{−T}^{T} |u|² f2(u) du = ∫_0^T x² f1(x) dx / ∫_0^T u² f2(u) du    (6.6)

Using integration by parts and the fact that (1 − Φ(x)) ∼ (C_α/2) γ x^{−α} as x → ∞ for any SαS distribution function Φ, we obtain that, as T → ∞, the above ratio is equivalent to

C_α γ1 ([x^{2−α}]_0^T − 2 ∫_0^T x^{1−α} dx) / (C_α γ2 ([u^{2−α}]_0^T − 2 ∫_0^T u^{1−α} du)) → γ1/γ2    (6.7)

Thus, from equations (6.5), (6.6) and (6.7), the ratio Ê[X1²]/Ê[X2²] converges asymptotically to γ1/γ2. ¥

Theorem 6.2. Let x = As be a data vector from a mixture of α-stable processes and R̂ := (1/N) Σ_{t=1}^{N} x(t) x(t)^T its sample covariance matrix. Then the normalized covariance matrix of x, defined by

R̄ := R̂ / Trace(R̂)    (6.8)

converges asymptotically to the finite matrix A D A^T, where D is the positive diagonal matrix D = diag(d1, ..., dm) with d_i = γ_i / Σ_{j=1}^{m} γ_j ‖a_j‖², γ_i being the dispersion of the i-th source signal and ‖.‖ denoting the Euclidean norm.

Proof. Clearly,

R̂ / Trace(R̂) = Σ_{i=1}^{m} Ê[s_i(t)²] a_i a_i^T / Σ_{j=1}^{m} Ê[s_j(t)²] ‖a_j‖²    (6.9)

Using Theorem 6.1, we see that

Ê[s_i(t)²] / Σ_{j=1}^{m} Ê[s_j(t)²] ‖a_j‖² → d_i = γ_i / Σ_{j=1}^{m} γ_j ‖a_j‖² as N → ∞    (6.10)

Then, from equations (6.9) and (6.10), R̄ → Σ_{i=1}^{m} d_i a_i a_i^T = A D A^T as N → ∞. ¥

Proposition 6.1. Let R̄ be the normalized covariance matrix defined in (6.8) for the considered α-stable mixture. Then the inverse square root of R̄ is a data whitening matrix.
Proof. Theorem 6.2 shows that the normalized covariance matrix R̄ has the appropriate structure for computing a whitening matrix. Indeed, the whitening matrix can be obtained from the eigendecomposition R̄ = U Σ U^T as W = Σ_s^{−1/2} U_s^T, where Σ_s (resp. U_s) is the diagonal (resp. orthogonal) matrix of the m largest eigenvalues (resp. corresponding eigenvectors) of R̄. Then we can write I = W R̄ W^T = W A D A^T W^T = (W A D^{1/2})(W A D^{1/2})^T. Recall that, without loss of generality, A can be replaced by A D^{1/2} (D being a positive diagonal matrix) because of the scaling indeterminacy. We see that W transforms A D^{1/2} (i.e., the mixing matrix) into an orthogonal matrix. ¥

6.2.2 Minimum dispersion criterion

[A]- Minimum dispersion criterion in signal processing

The minimum dispersion (MD) criterion is a common tool in the linear theory of stable processes, as the dispersion of a stable random variable plays a role analogous to the variance. For example, the larger the dispersion of a stable distribution, the more it spreads around its median. Hence, the minimum dispersion criterion is a natural and mathematically meaningful measure of optimality in signal processing problems based on stable models. By minimizing the error dispersion, we minimize the average magnitude of the estimation errors. Furthermore, it has been shown that minimizing the dispersion is also equivalent to minimizing the probability of large estimation errors. Hence, the minimum dispersion criterion is well justified under the stable model assumption. It is a direct generalization of the minimum mean-squared error criterion and relatively simple to compute [Nikias et Shao(1995)].
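Returning briefly to the whitening step of Section 6.2.1, the following sketch checks Theorem 6.2 and Proposition 6.1 numerically (the SαS generator uses the Chambers-Mallows-Stuck method; all names here are my own, not the thesis software):

```python
import numpy as np

def sas_samples(alpha, n, rng):
    # Chambers-Mallows-Stuck generator for standard SaS samples.
    V = rng.uniform(-np.pi / 2, np.pi / 2, n)
    W = rng.exponential(1.0, n)
    return (np.sin(alpha * V) / np.cos(V) ** (1 / alpha)
            * (np.cos((1 - alpha) * V) / W) ** ((1 - alpha) / alpha))

rng = np.random.default_rng(3)
m, N, alpha = 3, 50000, 1.5
s = np.vstack([sas_samples(alpha, N, rng) for _ in range(m)])
A = rng.standard_normal((m, m))
x = A @ s

# Normalized covariance matrix of eq. (6.8).
R = x @ x.T / N
R_bar = R / np.trace(R)

# Whitening matrix of Proposition 6.1: W = Sigma^{-1/2} U^T.
d, U = np.linalg.eigh(R_bar)
Wmat = np.diag(d ** -0.5) @ U.T

# W R_bar W^T = I by construction, and W A has nearly orthogonal
# columns: W orthogonalizes the mixing matrix up to column scaling.
G = Wmat @ A
gram = G.T @ G
```

Although the raw covariance entries diverge for α < 2, the trace normalization cancels the divergence, which is exactly the content of Theorem 6.2.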
Minimizing the dispersion is also equivalent to minimizing the fractional lower-order moments of the estimation errors, which measure the Lp distance between an estimate and its true value, for 0 < p < α ≤ 2. This result is not surprising, since the Lp norms for p < 2 are well known for their robustness against outliers such as those described by stable laws. It is also known that all the lower-order moments of a stable random variable are equivalent, i.e., any two of them differ by a fixed constant independent of the random variable itself. A common choice is the L1 norm, which is sometimes very convenient. Stable signal processing based on fractional lower-order moments inevitably introduces nonlinearity, even into linear problems. The basic reason is that we have to solve linear estimation problems in Banach or metric spaces instead of Hilbert spaces. It is well known that, while the linear space generated by a Gaussian process is a Hilbert space, the linear space of a stable process is a Banach space when 1 ≤ α < 2 and only a metric space when 0 < α < 1 [Cambanis et Miller(1981)]. Banach and metric spaces do not have as nice properties and structures as Hilbert spaces for linear estimation problems.

[B]- Minimum dispersion criterion for BSS

Let z(t) := B x̄(t), where B is a unitary separating matrix to be estimated and x̄ denotes the whitened data, i.e., x̄ = Wx. Let us consider the global MD criterion given by the sum of the dispersions of all entries of z:

J(B) := Σ_{i=1}^{m} γ_{z_i}    (6.11)

where γ_{z_i} denotes the dispersion of z_i(t), the i-th entry of z(t). In this chapter we prove that the MD criterion defines a contrast function, in the sense that the global minimization of the objective function (6.11) leads to a separating solution. The p-th order moment of an α-stable random variable and its dispersion are related through a constant only (see Property 2.7).
Therefore, the MD criterion is equivalent to least lp-norm estimation with 0 < p < α. Although the most widely used contrast functions for BSS are based on second- and fourth-order cumulants [Cichocki et Amari(2002)], we believe there are good reasons to extend the class of contrast functions from cumulants to fractional moments, as we argue next. Mutual information (MI) is usually chosen to measure the degree of independence. Because the direct estimation of MI is very difficult, one can derive approximate contrast functions, often based on cumulant expansions of the densities. However, one can also approximate the Shannon entropy (which is closely related to the MI) using the lp-norm concept [Karvanen et Cichocki(2003)], and hence use it to approximate the MI. For example, in [Hyvarinen(1999)] the author uses the lp-norm concept to approximate the MI and then to find the optimal contrast function for the exponential power family of densities f_p(x) = k1 exp(k2 |x|^p). Thus we propose the MD criterion, or equivalently the FLOM-based criterion (2.7), for measuring the independence of alpha-stable distributed data. We should also note that the lp-norm is commonly used as a measure of the sparseness of signals [Karvanen et Cichocki(2003)]. This leads to the use of the MD criterion as a measure of sparseness, a concept that has been demonstrated to be powerful in BSS [Cichocki et Amari(2002)], [Karvanen et Cichocki(2003)]. Consequently, the MD criterion can be used as a cost function to achieve BSS, as shown by the following result.

Theorem 6.3. The minimum dispersion criterion

J(B) := Σ_{i=1}^{m} γ_{z_i}    (6.12)

is a contrast function under the orthogonality constraint for separating an instantaneous mixture of alpha-stable sources.
Proof. Note that z(t) is an orthogonal mixture of the sources and can be written z(t) = C s(t), with C := BWA orthogonal. Here we prove that the MD criterion J(B) reaches its minimum value over the set of orthogonal matrices if and only if BW (W being the whitening matrix) is a separating matrix, or equivalently if and only if C is a generalized permutation matrix (i.e., a permutation matrix times a non-singular diagonal matrix). Indeed, using properties 1 and 2 of SαS processes presented in Section I, one can write:

J(B) = Σ_{i=1}^{m} Σ_{j=1}^{m} |C_{ij}|^α γ_{s_j}    (6.13)
     = Σ_{j=1}^{m} (Σ_{i=1}^{m} |C_{ij}|^α) γ_{s_j}    (6.14)
     = Σ_{j=1}^{m} a_j γ_{s_j}    (6.15)

with a_j := Σ_{i=1}^{m} |C_{ij}|^α and C_{ij} the (i, j)-th entry of C. Now, since the a_j and γ_{s_j} are positive, minimizing J(B) is equivalent to minimizing all the coefficients a_j. Let us prove that the coefficients a_j satisfy a_j ≥ 1 for all j, with a_j = 1 if and only if C is a generalized permutation matrix. Since C is unitary (which implies |C_{ij}| ≤ 1) and α < 2, we have |C_{ij}|^α ≥ |C_{ij}|². Therefore a_j = Σ_{i=1}^{m} |C_{ij}|^α ≥ Σ_{i=1}^{m} |C_{ij}|² = 1. Equality holds if and only if |C_{ij}|^α = |C_{ij}|² for all i, or equivalently if C_{ij} = 0 or |C_{ij}| = 1. C being unitary, the latter is satisfied if and only if for every j there exists i_j such that |C_{i_j j}| = 1 and C_{ij} = 0 for all i ≠ i_j. ¥

The proposed method requires little or no a priori knowledge of the input signals. The dispersion as well as the characteristic exponent α are estimated according to [Tsihrintzis et Nikias(1996)], where the proposed estimator is proved to be consistent and asymptotically normal. This estimator is based on the theory of the fractional lower-order moments of SαS distributions.

6.2.3 Separation algorithm : Jacobi implementation

Theorem 6.3 shows that, under an orthogonal transform, the signal has minimum dispersion when its entries are mutually independent.
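The key inequality in the proof, a_j = Σ_i |C_ij|^α ≥ 1 with equality only for signed permutations, is easy to check numerically; this is a sketch of the inequality alone, not of the full proof:

```python
import numpy as np

alpha = 1.5
rng = np.random.default_rng(4)

# A random orthogonal matrix (via QR of a Gaussian matrix): every
# column mixes several coordinates, so a_j = sum_i |C_ij|^alpha > 1.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
a_random = np.sum(np.abs(Q) ** alpha, axis=0)

# A signed permutation matrix: each a_j equals exactly 1, the minimum.
P = np.array([[0., 1., 0., 0.],
              [0., 0., 0., -1.],
              [1., 0., 0., 0.],
              [0., 0., 1., 0.]])
a_perm = np.sum(np.abs(P) ** alpha, axis=0)
```

The gap between `a_random` and 1 is what the Jacobi sweep of the next subsection drives to zero, one rotation at a time.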
The problem is now to minimize a cost function under an orthogonality constraint. Different approaches exist to solve this constrained optimization problem. We chose to estimate B as a product of Givens rotations:

B = Π_{#sweeps} Π_{1≤p<q≤m} Ω_{pq}(θ)    (6.16)

where Ω_{pq}(θ) is the elementary Givens rotation, defined as the orthogonal matrix whose diagonal elements are all 1 except for the two elements c = cos(θ) in rows (and columns) p and q, and whose off-diagonal elements are all 0 except for the two elements s = sin(θ) and −s at positions (p, q) and (q, p), respectively. The minimization of J(Ω_{pq}(θ)) is done numerically by searching θ over a fine grid in [0, π/2). (We consider [0, π/2) instead of [0, π] because Ω_{pq}(θ + π/2) is equal to Ω_{pq}(θ) up to a generalized permutation matrix.) The resulting MD algorithm can be summarized as in Table 6.1:

Step 1. Whitening transform.
Step 2. Sweep. For all pairs 1 ≤ p < q ≤ m, do:
– Compute the Givens angle 0 ≤ θ̂_{pq} < π/2 that maximizes the pairwise independence of z_p and z_q by minimizing the global dispersion J(Ω_{pq}(θ)).
– If θ̂_{pq} > θ̂_min, rotate the pair accordingly. (The constant θ̂_min is a threshold defining the minimum rotation angle considered significant in estimating B; in our simulations we used an angle grid resolution of π/100, and the same value for the threshold.)
– If no pair has been rotated in the previous sweep, stop. Otherwise, perform another sweep.

Tab. 6.1 – The principal steps of the proposed minimum dispersion (MD) algorithm.

6.3 Performance Evaluation & Comparison

This section examines the statistical performance of our MD-based separation procedure. The numerical results presented below have been obtained in the following setting.
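Before turning to the experiments, here is a compact re-implementation sketch of the sweep in Table 6.1 (hypothetical code: the grid search minimizes empirical l_p sample moments with p < α as a stand-in for the dispersions, and the "mixing" is taken orthogonal, as it is after whitening):

```python
import numpy as np

def sas_samples(alpha, n, rng):
    # Chambers-Mallows-Stuck generator for standard SaS samples.
    V = rng.uniform(-np.pi / 2, np.pi / 2, n)
    W = rng.exponential(1.0, n)
    return (np.sin(alpha * V) / np.cos(V) ** (1 / alpha)
            * (np.cos((1 - alpha) * V) / W) ** ((1 - alpha) / alpha))

def md_sweep(z, p=0.5, grid=100):
    """One Jacobi sweep: for each pair (i, j), pick the Givens angle on
    a grid over [0, pi/2) minimizing the summed l_p sample moments."""
    m = z.shape[0]
    B = np.eye(m)
    thetas = np.linspace(0.0, np.pi / 2, grid, endpoint=False)
    for i in range(m - 1):
        for j in range(i + 1, m):
            costs = []
            for th in thetas:
                c, s = np.cos(th), np.sin(th)
                zi = c * z[i] + s * z[j]
                zj = -s * z[i] + c * z[j]
                costs.append(np.mean(np.abs(zi) ** p)
                             + np.mean(np.abs(zj) ** p))
            th = thetas[int(np.argmin(costs))]
            c, s = np.cos(th), np.sin(th)
            G = np.eye(m)
            G[i, i] = G[j, j] = c
            G[i, j], G[j, i] = s, -s
            z = G @ z
            B = G @ B
    return B, z

rng = np.random.default_rng(5)
s = np.vstack([sas_samples(1.5, 20000, rng) for _ in range(2)])
Q, _ = np.linalg.qr(rng.standard_normal((2, 2)))  # orthogonal "mixing"
B, y = md_sweep(Q @ s)
G = B @ Q  # close to a signed permutation when separation succeeds
```

In the thesis procedure the dispersions are estimated with the consistent estimator of [Tsihrintzis et Nikias(1996)] rather than raw l_p moments, and several sweeps are run until no significant rotation remains.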
The source signals are i.i.d. impulsive standard SαS (µ = 0 and γ = 1). The number of sources is m = 3 and the number of sensors is n = 3. The statistics are evaluated over 100 Monte Carlo runs, and the mixing matrix as well as the sources are generated randomly at each run. The performance of our MD method is compared with three widely used BSS algorithms : JADE [Cardoso et Souloumiac(1993)], EASI [Cardoso et Lahed(1996)] and RQML [Shereshevski et al.(2001)]. To measure the quality of the source separation, we use the generalized rejection level criterion defined below.

6.3.1 Generalized rejection level index

To evaluate the performance of the separation method, we define the rejection level Iperf as the mean value of the interference-signal dispersion over the desired-signal dispersion. This criterion generalizes the existing one [Cichocki et Amari(2002)] based on signal powers (for SαS processes the variance, or power, is replaced by the dispersion), which represents the mean value of the interference-to-signal ratio. If source k is the desired signal, the related generalized rejection level is

I_k := γ(Σ_{l≠k} C_{kl} s_l) / γ(C_{kk} s_k) = Σ_{l≠k} |C_{kl}|^α γ_l / (|C_{kk}|^α γ_k)    (6.17)

where γ(x) denotes the dispersion of an SαS random variable x. Therefore, the averaged rejection level is given by

Iperf = (1/m) Σ_{i=1}^{m} I_i = (1/m) Σ_{i=1}^{m} Σ_{j≠i} |C_{ij}|^α γ_j / (|C_{ii}|^α γ_i)    (6.18)

6.3.2 Experimental results

• First experiment. Figure 6.1 presents an example of separation of highly impulsive sources (α = 0.5) mixed by a random 3 × 3 matrix A. The proposed algorithm achieves very good separation quality.

• Second experiment. Figure 6.2 plots the mean rejection level of the MD algorithm versus the characteristic exponent. The sample size is set to N = 1000. The parameter α is of crucial importance, as it has a major influence on the separation performance. Two important features are observed.
Fig. 6.1: Extraction of 3 α-stable sources from 3 observations, where α = 0.5 and N = 10000 (the figure shows the three sources, the three mixtures and the three estimated signals).

The mean rejection level increases when the sources are very impulsive (α close to zero) or when they are close to the Gaussian case (α close to two). In the latter case (i.e., α = 2), source separation is not possible.

Fig. 6.2: Generalized mean rejection level versus α, where N = 1000 (curves for EASI, JADE, RQML and MD).

• Third experiment. In Figure 6.3, the simulation study shows that estimation errors on the characteristic exponent α of the source distribution have little influence on the performance of the algorithm.

Fig. 6.3: Generalized mean rejection level versus the estimation error ∆α.

• Fourth experiment. In Figure 6.4, for our proposed MD algorithm, two different scenarios lead to similar performance.
In the first scenario, we consider a mixture of three α-stable sources with the same characteristic exponent α = 1.5. In the second, we wrongly assume three SαS sources with α = 1.5 while, in reality, the sources are SαS with different characteristic exponents α1 = 1.5, α2 = 1 (Cauchy pdf) and α3 = 2 (Gaussian pdf). The algorithm can separate the sources from their mixtures even though we deviate from the assumptions under which it was derived. Consequently, the MD algorithm is robust to possible source modeling errors.

Fig. 6.4: Generalized mean rejection level versus sample size N (MD with sources of the same characteristic exponent, and MD with sources of different characteristic exponents).

• Fifth experiment. Figure 6.5 shows the performance obtained by each of the four BSS algorithms as a function of the sample size N for α = 1.5. Good performance is reached by the MD algorithm for relatively small to medium sample sizes. The figure also demonstrates that EASI fails to separate α-stable signals and that JADE is sub-optimal in this context; this is because EASI and JADE are not specifically designed for heavy-tailed signals. Comparing MD with RQML, we observe a certain performance gain in favor of the MD algorithm. This is because truncating the observations created by large source signal values, as the RQML procedure does, is not optimal: those observations can be very informative.
Fig. 6.5: Generalized mean rejection level versus sample size for α = 1.5 (curves for EASI, JADE, RQML and MD).

• Sixth experiment. We consider the case where the observation is corrupted by additive white Gaussian noise. The mean rejection level versus noise power is depicted in Figure 6.6 for α = 1.5 and N = 1000. In this experiment, the noise level σ² is varied between 0 dB and −30 dB. As can be seen, the performance degrades significantly when the noise power is high. This can be explained by the fact that the theory does not take additive noise into consideration. Improving robustness against noise is still an open problem under investigation. Figure 6.6 shows, however, that the proposed MD method has reliable performance and outperforms the RQML algorithm at low and moderate noise powers.

Fig. 6.6: Generalized mean rejection level versus the additive noise power for α = 1.5.

6.4 Concluding Remarks

We have introduced a two-step procedure for α-stable source separation. A first, generalized whitening step orthogonalizes the mixing matrix using a normalized covariance matrix of the observations. In the second step, the remaining orthogonal matrix is estimated by minimizing a global dispersion criterion. The proposed method is robust to modeling errors on the source pdf.
Numerical examples are presented to illustrate the effectiveness of the proposed method, which is shown to perform better than the RQML method. Moreover, they confirm that existing BSS methods that are not specifically designed to handle impulsive signals fail to provide good separation quality.

Chapter 7

Sub- and Super-Additivity based Contrast Functions

In this chapter, we introduce a generalization of our previous contribution. Indeed, we provide a systematic method to construct contrast functions through the use of sub- and super-additive functionals¹. Some practical examples of useful contrast functions are introduced and discussed.

¹ These arguments follow the same procedure as in [Sahmoudi et al.(2005)]. Furthermore, inspired by the proof that the minimum dispersion is a contrast function, this chapter is a direct generalization of the previous one.

7.1 BSS Using Contrast Functions

In this chapter, we consider the mixture model x = As, where A is an unknown n × m mixing matrix, x denotes the observation vector and s represents the source vector. The separation problem consists of finding a separating matrix B such that the components of y = Bx are independent. Note that throughout this chapter we consider BSS under an orthogonality constraint (assuming implicitly that a whitening step has already been performed). Thus, in the rest of this chapter we suppose that B is an orthogonal matrix.

7.2 On contrast functions

The concept of contrast function for source separation was first presented in [Comon(1994)]. A contrast function for source separation is a real-valued function of the distribution of a random vector which is minimized (or maximized) when the source separation is achieved. To characterize a contrast function mathematically, we use the following definition.

Definition 7.1. A functional F is a contrast function if and only if it satisfies the two requirements:
R1. F(Cs) ≥ F(s) for any independent random vector s and any invertible matrix C.
R2. The equality F(Cs) = F(s) holds if and only if C = PD, where P and D are a permutation and a diagonal matrix, respectively.

Thus, we define the contrast function as in [Comon(1994)] as a functional of the distribution of Bx, or equivalently of B, which attains its minimum (or maximum) when separation is achieved. Intuitively, any measure of dependence between the components of Bx would be a contrast, but there may be others. It should be noted that the construction of a contrast function is only a first step toward a separation procedure. The contrasts proposed here are theoretical functionals because they depend on the distribution of the reconstructed sources, which is unknown. To obtain a usable contrast, this distribution, or in fact certain functionals of it, must be estimated from the data. We will not consider this problem here, nor that of constructing a good algorithm for minimizing the resulting empirical contrast. It should however be pointed out that the ease of the estimation and of the minimization algorithm, both in terms of implementation and computational cost, should be taken into account, besides performance considerations, when assessing the final separation method. As existing examples of contrast functions, it has been shown in [Comon(1994)] and [Cardoso(1998)] that the sum of the 4-th order cross-cumulants of the components is a contrast function.
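As a toy illustration of such a cumulant-based contrast (a sketch under our own assumptions: two unit-variance uniform sources, an orthogonal mixture, and a scan over rotation angles), the sum of squared fourth-order auto-cumulants of the outputs peaks at the separating rotation:

```python
import numpy as np

def kurtosis_contrast(y):
    """Sum of squared 4th-order auto-cumulants of the rows of y
    (for zero-mean, unit-variance rows: kurt = E[y^4] - 3)."""
    m4 = np.mean(y ** 4, axis=1)
    return np.sum((m4 - 3.0) ** 2)

rng = np.random.default_rng(3)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), (2, 20_000))  # unit-variance sources
theta0 = 0.6                                           # mixing rotation angle
c, d = np.cos(theta0), np.sin(theta0)
x = np.array([[c, -d], [d, c]]) @ s                    # orthogonal (whitened) mixture

# scan candidate unmixing rotations; the contrast is maximal near theta0
angles = np.linspace(0, np.pi / 2, 181)
vals = [kurtosis_contrast(np.array([[np.cos(t), np.sin(t)],
                                    [-np.sin(t), np.cos(t)]]) @ x)
        for t in angles]
best = angles[int(np.argmax(vals))]
```

For uniform (sub-Gaussian) sources the auto-kurtoses are maximal in magnitude at separation, so the scanned contrast attains its maximum at the mixing angle (modulo π/2, reflecting the permutation and sign ambiguities).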
Other contrast functions can be found in [Moreau et Macchi(1996)], [Moreau et Pesquet(1997)], [Moreau et Stoll(1999)], [Cardoso(1999)], [Pham(2000)], [Adib et al.(2002)]. Note that the ideas of this chapter are inspired by the projection pursuit methodology described in [Huber(1985)]. In that paper, Huber used sub- and super-additive functionals (under additional assumptions) to define test statistics for normality. Similarly, we use these classes of functionals to define indices of non-gaussianity. Thus, minimizing the proposed criteria may be viewed as maximizing the non-gaussianity of the observations.

Remark 7.1. Some heuristic arguments supporting the connection between sub-additive functionals and non-gaussianity measures are as follows:
– The cumulants, which are widely used as measures of non-gaussianity, are additive (both sub- and super-additive) functionals.
– The exponential Shannon entropy defined by
$H(x) = \exp\left\{ -\int \log(f)\, f \, dx \right\}$   (7.1)
where f is the PDF of x, which is commonly used as a non-gaussianity measure, is super-additive. For a proof, see [Blachman(1965)].

Definition 7.2. A functional F of the distribution of a random variable X, denoted by F(X), is said to be scale equivariant if
$F(aX) = |a|\, F(X)$   (7.2)
for any real number a. Note that if F is scale equivariant, then |F| is also scale equivariant. Hence, we can assume without loss of generality in this work that F ≥ 0.

7.3 Orthogonality constraint

Principal Component Analysis (PCA), or whitening, consists of transforming the observation vector into decorrelated outputs. However, it is well known that PCA alone is not sufficient for separating the sources. To see this, consider a square BSS model (i.e., n = m). For estimating the n × n matrix A, taking into account the n scale ambiguities, we must determine n(n − 1) unknown coefficients.
The second-order decorrelation constraints give n(n − 1)/2 equations, which is not sufficient for determining A. This also proves that Gaussian sources cannot be separated: they are fully characterized by their first- and second-order statistics. It is interesting to note that second-order independence (whitening) solves the BSS problem up to an orthogonal transformation. To see this, consider the factorization B = UW of the separating matrix, where W is the spatial whitening matrix of the observations and U is an orthogonal transformation. In other words, for z = Wx, we suppose that IE{zz^T} = I without loss of generality. Now, since the outputs must be independent, from
$IE\{yy^T\} = U\, IE\{zz^T\}\, U^T = I$,   (7.3)
we deduce only that $UU^T = I$: determining the orthogonal factor U is the second half of the BSS problem. Thus one can say that whitening solves half of the BSS problem. Because whitening is a very simple and standard procedure, much simpler than any BSS algorithm, it is a good idea to reduce the complexity of the problem this way. The remaining half of the parameters has to be estimated by some other method. This shows that, to obtain the other required equations, other information must be used, such as HOS or FLOS. Even in cases where whitening is not explicitly required, it is recommended, since it reduces the number of free parameters and considerably increases the performance of the methods, especially with high-dimensional data.

7.4 Sub-Additivity based Contrast Functions

Definition 7.3. A functional F of the distribution of a random variable X, denoted by F(X), is said to be sub-additive if
$F(X + Y) \le F(X) + F(Y)$   (7.4)
for any two independent random variables X and Y.

Definition 7.4.
A functional F of the distribution of a random variable X, denoted by F(X), is said to be σ-sub-additive if
$F^\sigma(X + Y) \le F^\sigma(X) + F^\sigma(Y)$   (7.5)
for any two independent random variables X and Y.

Theorem 7.1. Suppose that F is a σ-sub-additive and scale equivariant functional, that the mixing matrix A is orthogonal, and that σ is a real number such that σ ≥ 2. Then, the objective function
$C(B) = -\sum_{i=1}^{n} F^\sigma(y_i)$ where $y = Bx$   (7.6)
is a contrast function for the blind separation of linear instantaneous mixtures under the orthogonality constraint on the matrix B.

Proof. Let us write $C \stackrel{def}{=} BA$. Then, letting $C_{ij}$ be the general element of C and $s_j$ the components of s, one has
$y_i = \sum_{j=1}^{m} C_{ij} s_j$.   (7.7)
Hence, using the scale equivariance and the sub-additivity of F, we have
$F^\sigma\Big(\sum_{j=1}^{m} C_{ij} s_j\Big) \le \Big(\sum_{j=1}^{m} |C_{ij}|\, F(s_j)\Big)^{\sigma}$.   (7.8)
Write $\sum_{j=1}^{m} |C_{ij}|\, F(s_j) = \Big(\sum_{j=1}^{m} F(s_j)\Big) \sum_{j=1}^{m} |C_{ij}|\, \frac{F(s_j)}{\sum_{k=1}^{m} F(s_k)}$ and use the convexity of the function $x \mapsto x^\sigma$ for a real number σ ≥ 2; then we have:
$F^\sigma(y_i) = F^\sigma\Big(\sum_{j} C_{ij} s_j\Big)$   (7.9)
$\le \Big(\sum_{j=1}^{m} F(s_j)\Big)^{\sigma} \sum_{j=1}^{m} \frac{F(s_j)}{\sum_{k=1}^{m} F(s_k)}\, |C_{ij}|^{\sigma}$   (7.10)
$= \Big(\sum_{j=1}^{m} F(s_j)\Big)^{\sigma-1} \sum_{j=1}^{m} |C_{ij}|^{\sigma} F(s_j)$   (7.11)
$\le \Upsilon \sum_{j=1}^{m} |C_{ij}|^{2} F(s_j)$,   (7.12)
where $\Upsilon$ is the constant quantity $\big(\sum_{j=1}^{m} F(s_j)\big)^{\sigma-1}$; the last inequality uses $|C_{ij}|^{\sigma} \le |C_{ij}|^{2}$, valid since σ ≥ 2 and $|C_{ij}| \le 1$ for an orthogonal C. Summing the above inequalities and using the orthogonality constraint $|C_{ij}|^{2} \le \sum_{i=1}^{n} |C_{ij}|^{2} = 1$, one gets
$\sum_{i=1}^{n} F^\sigma(y_i) \le \Upsilon \sum_{j=1}^{m} \Big(\sum_{i=1}^{n} |C_{ij}|^{2}\Big) F(s_j) = \Upsilon \sum_{j=1}^{m} F(s_j)$.   (7.13)
Clearly the equality is attained if C is a generalized² permutation matrix. This proves that $C(B) = -\sum_{i=1}^{n} F^\sigma(y_i)$ is a contrast function. □

Thus one only needs to find a sub-additive, scale equivariant functional F.
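The two properties required by theorem 7.1 can be checked numerically for a candidate functional. The sketch below is our own illustration (Student-t samples stand in for heavy-tailed sources): it verifies sub-additivity and scale equivariance of the empirical Lp-norm, the candidate studied in section 7.4.1. On the empirical measure, sub-additivity is exactly the Minkowski inequality, so it holds pathwise, which is stronger than the distributional statement needed by the theorem:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 1.2                                   # any p >= 1 works
x = rng.standard_t(df=2, size=100_000)    # heavy-tailed independent samples
y = rng.standard_t(df=2, size=100_000)

def lp(v, p):
    """Empirical Lp norm ||v||_p = (E|v|^p)^(1/p) on the sample."""
    return np.mean(np.abs(v) ** p) ** (1.0 / p)

# sub-additivity F(X + Y) <= F(X) + F(Y): Minkowski on the empirical measure
sub_additive = lp(x + y, p) <= lp(x, p) + lp(y, p)

# scale equivariance F(aX) = |a| F(X)
a = -3.7
equivariant = np.isclose(lp(a * x, p), abs(a) * lp(x, p))
```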
7.4.1 Lp-norm contrast functions; p ≥ 1

Let $F(Y) \stackrel{def}{=} \|Y\|_p = (IE|Y|^p)^{1/p} = \big(\int |y|^p f_y(y)\,dy\big)^{1/p}$, the Lp-norm of the random variable Y, where $f_y(\cdot)$ denotes the density function of Y. Note that, by the second and third axioms of a norm, $F(Y) = \|Y\|_p$ is sub-additive and scale equivariant. Thus, the Lp-norm criterion
$C_p(B) = \sum_{i=1}^{n} \|y_i\|_p^{\sigma}$ with σ ≥ 2   (7.14)
is a contrast function that can separate sub- and super-Gaussian sources. Indeed, it is worth emphasizing that the existence of fractional lower-order moments implies that the Lp-norm contrast function can separate heavy-tailed α-stable signals by choosing 1 ≤ p < α. For example, one can choose σ = 2p.

7.4.2 Alpha-stable scale contrast function

Let us consider a mixture of α-stable sources with the same characteristic exponent α and dispersion γ. The scale parameter of an alpha-stable distribution is defined by $S = \gamma^{1/\alpha}$. We recall that $S(aY) = \gamma(aY)^{1/\alpha} = |a|\,\gamma(Y)^{1/\alpha} = |a|\, S(Y)$. Then, the α-stable scale functional $F(Y) \stackrel{def}{=} \gamma^{1/\alpha}$ is scale equivariant. The scale functional is also sub-additive. To prove this, let us consider two independent r.v.'s X and Y; then we have
$S(X + Y) = (\gamma_{X+Y})^{1/\alpha}$   (7.15)
$= (\gamma_X + \gamma_Y)^{1/\alpha}$   (7.16)
$\le (\gamma_X)^{1/\alpha} + (\gamma_Y)^{1/\alpha}$.   (7.17)
The last inequality, valid for α ≥ 1, follows from the fact that for non-negative numbers u, v and r ≤ 1 one has $(u+v)^r \le u^r + v^r$, because
$(u+v)^r - u^r = \int_u^{u+v} r\, t^{r-1}\, dt = \int_0^v r\,(t+u)^{r-1}\, dt \le \int_0^v r\, t^{r-1}\, dt = v^r$.

From theorem 7.1, the sum of the scales of all the outputs of the BSS model,
$C(B) = \sum_{i=1}^{n} S_{y_i}^{\sigma}$ with σ ≥ 2,   (7.18)
defines a contrast function. Thus, this is another contrast function that can separate linear alpha-stable mixtures.

² By generalized permutation matrix we mean any matrix DP, where P is a permutation matrix and D is a diagonal matrix.
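The role of the condition 1 ≤ p < α can be illustrated numerically: the empirical Lp-norm is well defined and finite for p < α, while the empirical second moment (p = 2 > α) is dominated by a handful of extreme samples. This is an illustrative sketch with our own helper names, not the thesis code:

```python
import numpy as np

rng = np.random.default_rng(7)

def sas(alpha, n):
    """Standard SaS draws (Chambers-Mallows-Stuck construction)."""
    V = rng.uniform(-np.pi / 2, np.pi / 2, n)
    W = rng.exponential(1.0, n)
    return (np.sin(alpha * V) / np.cos(V) ** (1 / alpha)
            * (np.cos((1 - alpha) * V) / W) ** ((1 - alpha) / alpha))

alpha, p = 1.5, 1.0          # pick 1 <= p < alpha so that E|Y|^p is finite
y = sas(alpha, 200_000)

# empirical Lp norm ||y||_p = (E|y|^p)^(1/p): usable since p < alpha
lp_norm = np.mean(np.abs(y) ** p) ** (1.0 / p)

# the empirical second moment (p = 2 > alpha) does not stabilize: it is
# dominated by the few largest samples
m2 = np.mean(y ** 2)
top_share = np.sort(y ** 2)[-10:].sum() / (y ** 2).sum()
```

Here `top_share` measures the fraction of the empirical second moment carried by only the ten largest squared samples; for α < 2 it stays substantial no matter how large the sample, which is the practical reason second-order criteria fail on such data.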
In the algorithm derivation step, one can estimate $S_{y_i}$ using one of the existing methods for estimating the dispersion $\gamma_{y_i}$.

Remark 7.2. The Lp-norm contrast function can separate a mixture of heavy-tailed and non-heavy-tailed sources. This robustness property follows from the fact that the fractional lower-order statistics needed for its empirical computation are always defined for any r.v., whose distribution need not be alpha-stable; the alpha-stable scale contrast function, in contrast, is restricted to sources with alpha-stable distributions.

7.5 Super-Additivity based Contrast Functions

Definition 7.5. A functional G of the distribution of a random variable X, denoted by G(X), is said to be super-additive if
$G(X + Y) \ge G(X) + G(Y)$   (7.19)
for any two independent random variables X and Y.

Definition 7.6. A functional G of the distribution of a random variable X, denoted by G(X), is said to be σ-super-additive if
$G^\sigma(X + Y) \ge G^\sigma(X) + G^\sigma(Y)$   (7.20)
for any two independent random variables X and Y.

Theorem 7.2. Suppose that G is a σ-super-additive and scale equivariant functional, that the mixing matrix A is orthogonal, and that σ is a real number such that σ < 2. Then, the objective function
$C(B) = -\sum_{i=1}^{n} G^\sigma(y_i)$ where $y = Bx$   (7.21)
is a contrast function for the blind separation of linear instantaneous mixtures under the orthogonality constraint on the matrix B.

Proof. – To prove the first contrast function requirement, recall that, with the same notations as in the proof of theorem 7.1, one has $y_i = \sum_{j=1}^{m} C_{ij} s_j$.
Using the σ-super-additivity and the scale equivariance of the functional G, we have
$-G^\sigma(y_i) \le -\sum_{j=1}^{m} G^\sigma(C_{ij} s_j)$   (7.22)
$= -\sum_{j=1}^{m} |C_{ij}|^{\sigma} G^\sigma(s_j)$.   (7.23)
Summing this quantity over all output components, and using the fact that $|C_{ij}|^{\sigma} \ge |C_{ij}|^{2}$, since σ < 2 and $|C_{ij}|^{2} \le \sum_{i=1}^{n} |C_{ij}|^{2} = 1$ due to the orthogonality constraint, one gets
$-\sum_{i=1}^{n} G^\sigma(y_i) \le -\sum_{i=1}^{n} \sum_{j=1}^{m} |C_{ij}|^{\sigma} G^\sigma(s_j)$   (7.25)
$\le -\sum_{j=1}^{m} \Big(\sum_{i=1}^{n} |C_{ij}|^{2}\Big) G^\sigma(s_j)$   (7.26)
$= -\sum_{j=1}^{m} G^\sigma(s_j)$.   (7.27)
So
$C(y) = C(Cs) \le C(s)$.   (7.28)
Thus, the requirement R1 is fulfilled.
– Finally, C(Cs) = C(s), or equivalently,
$C(y) = -\sum_{i=1}^{n} G^\sigma\Big(\sum_{j=1}^{m} C_{ij} s_j\Big) = -\sum_{j=1}^{m} G^\sigma(s_j)$,   (7.29)
requires that, for each i, the sum $\sum_{j=1}^{m} C_{ij} s_j$ reduces to a single term; that is, each row of C has exactly one nonzero component $C_{i\,j(i)} = \pm 1$. Since C is orthogonal, it means that C = DP, where D denotes a diagonal matrix with entries ±1 and P the permutation matrix associated to the permutation $i(1), \cdots, i(n)$. Clearly the equality in (7.28) is attained if C is such a generalized permutation matrix. This proves that $C(B) = -\sum_{i=1}^{n} G^\sigma(y_i)$ is a contrast function. □

7.5.1 Dispersion contrast function

Here, we consider a linear mixture of α-stable signals with the same characteristic exponent α, dispersion γ and scale functional $S = \gamma^{1/\alpha}$ considered above. We verified above that S is scale equivariant. It is also easy to see that S is α-additive, i.e., both α-sub- and α-super-additive:
$S^\alpha(X + Y) = \big(\gamma(X+Y)^{1/\alpha}\big)^{\alpha} = \gamma(X) + \gamma(Y) = S^\alpha(X) + S^\alpha(Y)$.   (7.30)
Then, from theorem 7.2, the objective function
$C(B) = \sum_{i=1}^{n} S^\alpha(y_i) = \sum_{i=1}^{n} \gamma_{y_i}$   (7.31)
is a contrast function for alpha-stable source separation. Thus we obtain another proof that the global minimum dispersion criterion, which we introduced in [Sahmoudi et al.(2003a)], is a contrast function.
7.6 Jacobi-Gradient Algorithm for Prewhitened BSS

As presented in the previous chapter, every orthogonal matrix can be parameterized in terms of Givens rotation angles, each of which defines a rotation in a single plane of the high-dimensional vector space. These individual rotations can then be cascaded to span the whole set of rotation matrices. Every rotation matrix has a unique set of Givens rotation angles that characterize it. In n dimensions, the Givens rotation matrix in the plane formed by the i-th and j-th axes is denoted by $\Omega_{ij}$ and is given as presented in [Sahmoudi et al.(2005)]. A rotation matrix is then formed from these sparse matrices according to
$B = \prod_{p=1}^{m-1} \prod_{q=p+1}^{m} \Omega_{pq}$.   (7.32)
The multiplication order can be either always from the left or always from the right; it is not crucial to the generality of this formula, as long as we maintain the same order when taking the derivative of the matrix with respect to a rotation angle.

[A]- Optimization & algorithm

Our aim is to solve the previously mentioned constrained optimization problem, which becomes unconstrained if Givens angles are used. Let $\theta_{kl}$, $k = 1, \cdots, m-1$, $l = k+1, \cdots, m$, be the Givens rotation angles that form our parameter vector Θ. To derive a simple and fast algorithm, we propose to combine the Jacobi-like decomposition into Givens rotations with a gradient algorithm using a numerical search for θ. The so-called Jacobi-Gradient algorithm can be summarized as follows:

Jacobi-Gradient Algorithm
Step 1. Initialize the Givens angles randomly.
Step 2. Estimate robustly from the data, especially if the sources are in a noisy environment, the source statistics used in the contrast function.
Step 3. Calculate the gradient of the cost function with respect to the Givens angles: $\partial C(B)/\partial \theta_{kl}$.
Step 4.
Update the Givens angles by gradient ascent: $\theta(k+1) = \theta(k) + \eta\, \partial C(B)/\partial \theta$.
Step 5. Go back to step 3 and continue until convergence.

[B]- Complexity

A key concern in many adaptive algorithms is the computational complexity. If the multiplications in (7.32) are performed from the left, the first output is only affected by the Givens angles with indices $\theta_{1q}$, $q = 2, \cdots, m$; the second is affected by the angles $\theta_{1q}$, $q = 2, \cdots, m$, and $\theta_{2q}$, $q = 3, \cdots, m$; and so on. Thus, if we wish to extract all m source components, we only need to adapt the angles $\theta_{ij}$, $i = 1, \cdots, m$, $j = i+1, \cdots, m$, which makes a total of $m^2 - m(m+1)/2 = m(m-1)/2$ parameters, that is, fewer than the $m^2$ parameters required in many Jacobi-like algorithms. We then have to evaluate the sin and cos of all these parameters once, and the necessary matrix-vector multiplications of the algorithm are performed at each iteration, which amounts to $O(m^2)$ operations.

7.7 Concluding Remarks

In this chapter some robust contrast functions have been introduced. A practical contrast function was derived for application to heavy-tailed sources. Coupling the Jacobi and gradient optimization techniques, a convenient implementation was proposed for prewhitened BSS methods. This work was developed recently and, due to time limitations, we cannot present experimental results yet. We plan to extend this work using some results from the robust statistics of sub- and super-additive functionals of heavy-tailed random variables [Huber(1981)]. A performance analysis will also be investigated.
Chapter 8

Normalized HOS-based Approaches

This chapter introduces a new approach for the blind separation (BS) of heavy-tailed signals that can be modeled by real-valued symmetric α-stable (SαS) processes. As the second and higher order moments of the latter are infinite, we propose to use normalized statistics of the observation to achieve the BS of the sources. More precisely, we show that the considered normalized statistics are convergent (i.e., take finite values) and have the appropriate structure that allows for the use of standard tensorial BS as well as non-linear decorrelation techniques based on second and higher order cumulants.

8.1 Introduction

By the generalized central limit theorem, the α-stable laws are the only class of distributions that can be the limiting distribution of sums of i.i.d. random variables [Samorodnitsky et Taqqu(1994)]. Many signals are impulsive in nature, or become so after certain pre-processing (e.g., a wavelet transform), and can therefore be modeled as stable processes [Nikias et Shao(1995)], [Cappé et al.(2002)]. Unlike most statistical models, the α-stable distributions, except the Gaussian one, have infinite second and higher order moments. Consequently, standard blind source separation (BSS) methods would be inadequate in this case, as most of them are based on second or higher order statistics [Cichocki et Amari(2002)]. In this chapter, we propose a new approach for the BS of heavy-tailed sources using normalized statistics (NS). It is first shown that suitably normalized second- and fourth-order cumulants exist and have the appropriate structure for BSS. This result is similar to those of [Swami et Sadler(1998)] in the stable ARMA context.
Then, for extracting α-stable source signals from their observed mixtures, one can use any standard procedure based on second- or fourth-order cumulants. This BSS method has several advantages over the existing ones, which are discussed in the sequel. Simulation-based comparisons with the minimum dispersion (MD) criterion based method of [Sahmoudi et al.(2005)] are also provided.

8.2 Normalized Statistics of Heavy-Tailed Mixtures

8.2.1 Normalized moments

Thanks to the algebraic tail behavior, we demonstrate here that the ratio of the k-th moments of two SαS random variables with α ≠ 2 converges to a finite value (even though the moments themselves are infinite). More precisely, we have the following theorem:

Theorem 8.1. Let $X_1$ and $X_2$ be two SαS variables of dispersions $\gamma_1$ and $\gamma_2$ and PDFs $f_1(\cdot)$ and $f_2(\cdot)$, respectively. Then, for k ≥ α, we have
$\frac{IE(|X_1|^k)}{IE(|X_2|^k)} \stackrel{def}{=} \lim_{T\to\infty} \frac{\int_{-T}^{T} |x|^k f_1(x)\,dx}{\int_{-T}^{T} |u|^k f_2(u)\,du} = \frac{\gamma_1}{\gamma_2}$.   (8.1)

Proof. Let $R_k$ represent the above ratio. Due to the symmetric PDFs of $X_1$ and $X_2$, we have
$R_k \stackrel{def}{=} \frac{\int_{-T}^{T} |x|^k f_1(x)\,dx}{\int_{-T}^{T} |u|^k f_2(u)\,du} = \frac{\int_{0}^{T} x^k f_1(x)\,dx}{\int_{0}^{T} u^k f_2(u)\,du}$.   (8.2)
Using integration by parts, we get
$R_k = \frac{\big[-x^k\,(1-\Phi_1(x))\big]_0^T + k\int_0^T x^{k-1}\,(1-\Phi_1(x))\,dx}{\big[-u^k\,(1-\Phi_2(u))\big]_0^T + k\int_0^T u^{k-1}\,(1-\Phi_2(u))\,du}$,   (8.3)
where $\Phi(\cdot)$ denotes the cumulative function of the considered PDF. From the heavy-tail property (see chapter 2), for any SαS cumulative function Φ we have $1 - \Phi(x) \sim \frac{C_\alpha}{2}\,\gamma\, x^{-\alpha}$ as $x \to \infty$. Then, as $T \to \infty$, $R_k$ is equivalent to:
$R_k \sim \frac{\frac{C_\alpha}{2}\gamma_1 \big( [-x^{k-\alpha}]_0^T + k\int_0^T x^{k-1-\alpha}\,dx \big)}{\frac{C_\alpha}{2}\gamma_2 \big( [-u^{k-\alpha}]_0^T + k\int_0^T u^{k-1-\alpha}\,du \big)} \to \frac{\gamma_1}{\gamma_2}$. □

Using a similar proof, one can demonstrate that the ratio of the square of the k-th moment to the 2k-th moment of an SαS random variable (α ≠ 2) converges to zero for k > α. More precisely, we have the following theorem:
Theorem 8.2. Let X be an SαS variable of dispersion γ and PDF $f(\cdot)$. Then, for k > α, we have:
$\frac{(IE|X|^k)^2}{IE|X|^{2k}} \stackrel{def}{=} \lim_{T\to\infty} \frac{\big(\int_{-T}^{T} |x|^k f(x)\,dx\big)^2}{\int_{-T}^{T} |x|^{2k} f(x)\,dx} = 0$.   (8.4)

8.2.2 Normalized second and fourth order cumulants

Using the above results, we can now establish that the normalized covariance matrix of the mixture signal converges to a finite-valued matrix with the desired algebraic structure. We have the following result:

Theorem 8.3. Let x be an SαS vector given by x = As (s being a vector of independent SαS random variables). Then the normalized covariance matrix of x satisfies:
$R(i,j) \stackrel{def}{=} \frac{Cum[x(i), x(j)]}{\sum_{k=1}^{n} Cum[x(k), x(k)]} = \sum_{k=1}^{m} d_k\, a_k(i)\, a_k(j)$,
or equivalently, $R = ADA^T$, where $D = \mathrm{diag}(d_1, \cdots, d_m)$ and
$d_i = \frac{\gamma_i}{\sum_{j=1}^{m} \gamma_j \|a_j\|^2}$,
$a_j$ being the j-th column vector of A.

Similarly, the normalized quadri-covariance tensor [Cardoso(1991)] of the mixture signal converges to a finite-valued tensor with the desired algebraic structure. We have the following result:

Theorem 8.4. Let x be an SαS vector given by x = As (s being a vector of independent SαS random variables). Then the normalized quadri-covariance tensor of x satisfies:
$Q(i,j,k,l) \stackrel{def}{=} \frac{Cum[x(i), x(j), x(k), x(l)]}{\sum_{r=1}^{n} Cum[x(r), x(r), x(r), x(r)]} = \sum_{r=1}^{m} \kappa_r\, a_r(i)\, a_r(j)\, a_r(k)\, a_r(l)$,
where
$\kappa_i = \frac{\gamma_i}{\sum_{j=1}^{m} \gamma_j \|a_j\|^4}$.

8.3 Normalized Tensorial BSS Methods

8.3.1 Separation algorithms

Thanks to theorems 8.3 and 8.4, we can now use existing BSS methods based on 2nd and 4th order cumulants, e.g. [Comon(1994)] for the ICA algorithm and [Cardoso et Souloumiac(1993)] for the JADE algorithm.
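The algebraic structure claimed by theorem 8.3 can be checked numerically: after trace normalization, undoing the mixing with the pseudo-inverse of A should leave an (approximately) diagonal matrix. This is our own illustrative sketch, with an illustrative Chambers-Mallows-Stuck sampler and an arbitrary choice of dimensions and seed:

```python
import numpy as np

rng = np.random.default_rng(5)

def sas(alpha, n):
    """Standard SaS draws (Chambers-Mallows-Stuck construction, sketch)."""
    V = rng.uniform(-np.pi / 2, np.pi / 2, n)
    W = rng.exponential(1.0, n)
    return (np.sin(alpha * V) / np.cos(V) ** (1 / alpha)
            * (np.cos((1 - alpha) * V) / W) ** ((1 - alpha) / alpha))

m, n_obs, T, alpha = 3, 4, 100_000, 1.5
S = np.vstack([sas(alpha, T) for _ in range(m)])   # independent SaS sources
A = rng.standard_normal((n_obs, m))                # mixing matrix (n = 4, m = 3)
X = A @ S

# normalized sample covariance: divide by its trace (cf. theorem 8.3)
R = X @ X.T / T
Rn = R / np.trace(R)

# undo the mixing: pinv(A) Rn pinv(A)^T should be close to a diagonal D
M = np.linalg.pinv(A) @ Rn @ np.linalg.pinv(A).T
off = M - np.diag(np.diag(M))
ratio = np.abs(off).max() / np.abs(np.diag(M)).max()
```

A small `ratio` indicates that the normalized covariance is close to the claimed $ADA^T$ form, even though the raw covariance of the SαS mixture diverges; the convergence is in probability, so the residual shrinks slowly as T grows.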
In this work, we have applied JADE to the normalized 2nd and 4th order cumulants of the observations. The so-called Robust-JADE¹ algorithm is summarized in Table 8.1.

Robust-JADE Algorithm
Step 1. Compute a whitening matrix Ŵ from the normalized sample covariance R̂x (estimated as the standard sample covariance matrix divided by its trace).
Step 2. Compute the most significant eigenpairs {λ̂r, M̂r; 1 ≤ r ≤ m} from the normalized sample 4th-order cumulants of the whitened process z(t) = Ŵx(t) (see [Cardoso et Souloumiac(1993)] for more details about the JADE algorithm).
Step 3. Jointly diagonalize the set {λ̂r M̂r; 1 ≤ r ≤ m} by a unitary matrix Û.

Tab. 8.1 – The principal steps of the proposed Robust-JADE algorithm.

We provide here some remarks about the above separation method and discuss certain advantages of the use of normalized statistics.
– Based on theorem 8.2, the normalized 4-th order cumulants are equal to the normalized 4-th order moments of the SαS source mixture (recall that for a real-valued zero-mean random variable x, we have $cum(x,x,x,x) = IE(x^4) - 3(IE(x^2))^2$). In other words, for SαS sources, one can replace the 4-th order cumulants by the 4-th order moments of the mixture signal.
– One major advantage of the proposed method compared to the FLOM-based methods is that no a priori knowledge or pre-estimation of the source PDF parameters (in particular, the characteristic exponent α) is required. Consequently, the normalized-statistics based method is robust to modelization errors with respect to the source PDF.
– In the case where the sources are non-impulsive, the proposed method coincides with the standard one (in our case, with the JADE method). Indeed, because of the scaling indeterminacy, the normalization has no effect in this case.
– Another advantage of the NS-based method is that it can easily be extended to the case where the sources are of different types, i.e., sources with different characteristic exponents, or non-impulsive sources in the presence of impulsive ones. That can be done, for example, by using the above NS-based method in conjunction with a deflation technique [Adib et al.(2002)]. Indeed, in that case, one can prove that the normalized statistics coincide with those of the mixture of the 'most impulsive' sources only (i.e., the ones with the smallest characteristic exponent), which can be estimated first and then removed (by deflation) to allow the estimation and separation of the other sources. This point is still under investigation and will be presented in detail in future work.
– In this chapter, we have established only the convergence of the 'exact' normalized statistics (expressed via the mathematical expectation). In fact, one can prove, along the same lines as [Swami et Sadler(1998)], that the sample estimates of the second and fourth order cumulants converge in probability to the exact normalized statistics given by theorems 8.3 and 8.4.

¹ In fact, Robust-JADE is the same algorithm as JADE up to some multiplicative constants, which have no effect on the BSS. We refer to the resulting algorithm as Robust-JADE to express its validity for heavy-tailed sources.

8.3.2 Performance evaluation & comparison

This section examines the statistical performance of the separation procedure. The numerical results presented below have been obtained in the following setting. The source signals are i.i.d. impulsive symmetric standard α-stable (β = 0, µ = 0 and γ = 1). The number of sources is m = 3 and the number of observations is n = 4.
The statistics are evaluated over 100 Monte-Carlo runs, and the mixing matrix is generated randomly at each run.

• Performance index

To measure the quality of the source separation, we use the generalized rejection level criterion defined as follows. If source k is the desired signal, the related generalized rejection level is
$I_k \stackrel{def}{=} \frac{\gamma\big(\sum_{l\ne k} C_{kl}\, s_l\big)}{\gamma(C_{kk}\, s_k)} = \frac{\sum_{l\ne k} |C_{kl}|^{\alpha}\, \gamma_l}{|C_{kk}|^{\alpha}\, \gamma_k}$,   (8.5)
where $\gamma(x)$ (resp. $\gamma_l$) denotes the dispersion of an SαS random variable x (resp. of source $s_l$) and $C \stackrel{def}{=} \hat{A}^{\#} A$. Therefore, the averaged rejection level is given by
$I_{perf} = \frac{1}{m}\sum_{i=1}^{m} I_i = \frac{1}{m}\sum_{i=1}^{m}\sum_{j\ne i} \frac{|C_{ij}|^{\alpha}\, \gamma_j}{|C_{ii}|^{\alpha}\, \gamma_i}$.
The performance of the NS-based method (referred to as Robust-JADE) is compared with that of the MD method introduced in [Sahmoudi et al.(2005)].

• First experiment

Figure 8.1 presents the generalized mean rejection level versus the additive Gaussian noise power (N = 1000 and α = 1.5). We can observe a certain performance gain in favor of the minimum dispersion (MD) algorithm.

Fig. 8.1: Generalized mean rejection level versus the noise power (curves: Robust-JADE and MD).

• Second experiment

Figure 8.2 presents the generalized mean rejection level versus the sample size (α = 1.5 and the mixture is noise-free). We can observe a certain performance gain in favor of the Robust-JADE algorithm.

Fig. 8.2: Generalized mean rejection level versus the sample size (curves: MD and Robust-JADE).
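For reference, the performance index above can be sketched directly from its definition (8.5); the function name, the matrices and the perturbation below are illustrative choices of ours, not the thesis code:

```python
import numpy as np

def mean_rejection_level_db(A, A_hat, gamma, alpha):
    """Generalized mean rejection level I_perf of eq. (8.5), in dB.
    A: true mixing matrix, A_hat: its estimate, gamma: source dispersions."""
    C = np.linalg.pinv(A_hat) @ A            # global system C = A_hat^# A
    m = C.shape[0]
    I = 0.0
    for i in range(m):
        leak = sum(abs(C[i, j]) ** alpha * gamma[j]
                   for j in range(m) if j != i)
        I += leak / (abs(C[i, i]) ** alpha * gamma[i])
    return 10.0 * np.log10(I / m)

A = np.array([[1.0, 0.1], [-0.1, 1.0]])          # true mixing matrix
A_hat = np.array([[1.0, 0.12], [-0.1, 1.0]])     # slightly mis-estimated
gamma = np.array([1.0, 1.0])                     # unit source dispersions
level_db = mean_rejection_level_db(A, A_hat, gamma, alpha=1.5)
```

Perfect estimation (C equal to a scaled permutation) drives the index to minus infinity in dB, so in practice the curves of Figures 8.1 and 8.2 report the residual leakage of the off-diagonal terms of C.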
8.4 Normalized Non-linear Decorrelation BSS Methods

In this section, we focus on the use of normalized statistics (NS) of heavy-tailed sources for the BSS problem. In [Sahmoudi et al.(2004a)], the NS were introduced for alpha-stable sources to justify the use of algebraic separation algorithms (JADE, SOBI, etc.) for achieving BSS in the heavy-tailed case. Here, we propose to use the NS to robustify the class of non-linear decorrelation algorithms, such as the EASI algorithm [Cardoso et Lahed(1996)]. The algorithm derivation, a discussion and simulation results are provided to illustrate the usefulness of NS in this context. The new method has been compared with two of the most popular BSS algorithms: EASI and the quasi maximum-likelihood algorithm [Pham et Garrat(1997)].

To deal with the particular BSS problem of heavy-tailed data, we proposed in [Sahmoudi et al.(2004a)] to use normalized second and higher order statistics, which are shown to take finite values and to have the appropriate structure based on which BSS can be achieved. In this section, we use the normalized statistics to robustify, and adapt to the impulsive source case, the class of BSS algorithms based on a composite (second and higher order) criterion. In particular, the EASI (equivariant adaptive separation by independence) algorithm proposed by Cardoso and Lahed in [Cardoso et Lahed(1996)], and its batch version IBSS [Belouchrani et al.(1997b)], has attracted a lot of interest in the Independent Component Analysis community. However, we show by both analytical studies and computer experiments that this algorithm fails to perform the separation of heavy-tailed sources such as alpha-stable signals, and that divergence behaviors may be observed.
We then introduce a robust-EASI criterion based on the normalized statistics, which is shown to be effective for the BSS problem in the considered context. Algorithmic details and simulation results related to the iterative implementation, referred to as the Robust-EASI algorithm, are provided and discussed in this section.

A broad and increasingly important class of non-Gaussian phenomena encountered in practice can be characterized as impulsive [Adler et al.(1998)]. It is for this type of signals and noise that heavy-tailed distributions provide a useful theoretical tool. In this section we use the heavy-tailed behavior characterization of \alpha-stable distributions to achieve the normalization of the high-order statistics of the considered linear mixture. Let us recall this property:

Property 8.1: Heavy-tailed asymptotic behavior
Let X \sim S\alpha S be an \alpha-stable r.v. with \alpha < 2. Then:

    P(X > x) \sim \gamma C_{\alpha} x^{-\alpha} as x \to \infty

where C_{\alpha} is a positive constant depending only on \alpha.

Thus, \alpha-stable distributions have inverse-power, i.e. algebraic, tails. In contrast, the Gaussian distribution has exponential tails. This shows that the tails of stable laws are much thicker than those of the Gaussian distribution; and the smaller the value of \alpha, the thicker the tails. An important consequence of Property 8.1 is the non-existence of the second and higher order moments of stable distributions, except in the special case \alpha = 2. However, thanks to the heavy-tailed behavior of the sources s, the normalized covariance and fourth-order cumulants exist for x = As. More precisely, we have established in [Sahmoudi et al.(2004a)] the following result:

Theorem 8.5: Normalized statistics of heavy-tailed mixtures
1. Let X be a heavy-tail distributed r.v. with index \alpha.
Then, for k > \alpha, we have:

    \lim_{T \to \infty} \frac{\left(\hat{IE}|X|^{k}\right)^{2}}{\hat{IE}|X|^{2k}} = 0 (convergence in probability)

where \hat{IE} denotes the time-averaging operator \hat{IE}[g(X)] = \frac{1}{T} \sum_{t=1}^{T} g[X(t)].

2. Let \hat{R} be the sample covariance matrix of the mixture signal in (9.6):

    \hat{R} \overset{def}{=} \frac{1}{T} \sum_{t=1}^{T} x(t) x(t)^{*}

Then the normalized sample covariance matrix \frac{\hat{R}}{Trace(\hat{R})} converges, when T \to \infty, to a finite-valued matrix of the form A D A^{*}, where D is a positive diagonal matrix.

3. Let \hat{Cum}[x(i), x(j), x(k), x(l)] be the sample quadricovariance tensor. Then the normalized sample quadricovariance tensor

    \frac{\hat{Cum}[x(i), x(j), x(k), x(l)]}{\sum_{r=1}^{n} \hat{Cum}[x(r), x(r), x(r), x(r)]}

of the mixture signal converges to a finite-valued tensor.

The consequence of this result is that many source separation algorithms can be modified to be applicable to heavy-tailed sources.

8.4.1 Robust composite criterion for source separation

[A]- EASI family criterion

It was shown in [Cardoso et Lahed(1996), Belouchrani et al.(1997b)] that a general composite criterion for blind source separation can be defined as

    C_g(B) = IE\{zz^{*} - I + [g(z)z^{*} - z g(z)^{*}]\}, with z = Bx    (8.6)

where IE denotes the mathematical expectation, I is the identity matrix and z^{*} denotes the (conjugate) transpose of the (complex) vector z. The non-linear function g is chosen such that, if z is a random vector with i.i.d. components, then C_g(B) is the null matrix in the noiseless case:

    B is a separating matrix \Rightarrow C_g(B) = 0    (8.7)

where g is usually chosen as an odd non-linear function that preserves the phase of its argument, i.e. the i-th coordinate of g(z) is of the form g_i(z) = g_i(z_i) = f_i(|z_i|^{2}) z_i, where f_i is a real function.
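The normalized sample covariance of point 2 of Theorem 8.5 is straightforward to form; a minimal sketch with illustrative names:

```python
import numpy as np

def normalized_covariance(X):
    """Normalized sample covariance R_hat / trace(R_hat) of Theorem 8.5.
    X is an n x T array whose columns are the mixture snapshots x(t);
    the trace normalization keeps the statistic finite even when the
    sources have infinite variance."""
    R = (X @ X.conj().T) / X.shape[1]
    return R / np.trace(R)
```

Whitening can then proceed on this matrix exactly as in the finite-variance case.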
[B]- Robust-EASI family criterion

In the case of heavy-tailed sources such as alpha-stable signals, which have infinite moments of order equal to or greater than two, the criterion C_g(B) is inadequate and divergence behaviors may be observed, especially if the non-linearities in g are strongly increasing functions (like a cubic distortion, for instance). For simplicity, let us choose g(z_i) = |z_i|^{2} z_i.

• Note, for example, that even when B equals A^{-1} and we have z = s, the right-hand side of (8.7), C_g(B), does not converge to 0 since IE\{|s_i|^{2}\} and IE\{|s_i|^{4}\} are infinite, which undermines the validity of this separation procedure.

• In practice, one can always argue that, for a finite sample size, the sample estimate of C_g(B) is of finite value. However, for impulsive signals and large sample sizes, the second order term IE\{zz^{*}\} - I in C_g(B) will be negligible compared to the higher order term IE\{g(z)z^{*} - z g(z)^{*}\} (see point 1 of Theorem 8.5). In that case, the whitening will not be performed correctly and thus the algorithm fails to converge to the optimal solution.

To mitigate this difficulty, we propose to modify criterion (8.6) to ensure the convergence of the two terms IE\{zz^{*}\} - I and IE\{g(z)z^{*} - z g(z)^{*}\}. For that, we use the concept of normalized statistics. Hence, in this section we propose a robustified version of the EASI approach obtained by modifying C_g(B) into:

    C_g(B) = IE\left\{ \frac{zz^{*}}{Trace(zz^{*})} - I + \frac{g(z)z^{*} - z g(z)^{*}}{\sum_{j=1}^{m} |z_j|^{4}} \right\}, with z = Bx    (8.8)

resulting in the so-called Robust-EASI family of source separation algorithms. This modification preserves the structure of the standard EASI and IBSS algorithms, such that the term

    IE\left\{ \frac{zz^{*}}{Trace(zz^{*})} - I \right\}

in (8.8) has the effect of driving the diagonal elements of C = BA to all ones.
Meanwhile, the other term

    IE\left\{ \frac{g(z)z^{*} - z g(z)^{*}}{\sum_{j=1}^{m} |z_j|^{4}} \right\}

in (8.8) drives the off-diagonal elements of C to zero.

8.4.2 Iterative quasi-Newton implementation

To solve (8.8), we propose to use a block technique based on the processing of T received samples, which consists of searching for the zeros of \hat{C}_g(B), the sample version of C_g(B):

    \hat{C}_g(B) = \frac{1}{T} \sum_{t=1}^{T} \left\{ \left[ \frac{z(t)z(t)^{*}}{\sum_{j=1}^{m} |z_j(t)|^{2}} - I \right] + \frac{g(z(t))z(t)^{*} - z(t)g(z(t))^{*}}{\sum_{j=1}^{m} |z_j(t)|^{4}} \right\}, with z = Bx    (8.9)

An approximate solution of \hat{C}_g(B) = 0 may be obtained by the Newton technique: \hat{C}_g(B) is replaced by its first order approximation around some B so that the resulting linear equation can be solved exactly; solutions are then obtained iteratively in the form B_{p+1} = (I + E_p) B_p. At step p, a matrix E_p is determined from a local linearization of \hat{C}_g(B_{p+1}). The benefit is that this leads to an explicit expression of E_p under the additional assumption that B_p is close to a separating matrix. This iterative implementation is summarized by the so-called Robust-EASI algorithm in Table 8.2.

Robust-EASI Algorithm

Step 1. Initialization: choose B_0 randomly and set z(t) = B_0 x(t), t = 1, ..., T.

Step 2. Computation of the matrix E:

    E_{ij} = \frac{\hat{\rho}_{ii} \hat{\kappa}_{ij} + \hat{\zeta}_{ji}^{*} (\hat{\delta}_{ij} - \hat{\rho}_{ij})}{\hat{\rho}_{ii} \hat{\zeta}_{ij} + \hat{\zeta}_{ji}^{*} \hat{\rho}_{jj}}, i, j = 1, ..., m    (8.10)

with

    \hat{\rho}_{ij} = \hat{IE}\left\{ \frac{z_i z_j^{*}}{\sum_{j=1}^{m} |z_j|^{2}} \right\}

    \hat{\kappa}_{ij} = \hat{IE}\left\{ \frac{z_i z_j^{*} [f_j(|z_j|^{2}) - f_i(|z_i|^{2})]}{\sum_{j=1}^{m} |z_j|^{4}} \right\}

    \hat{\zeta}_{ij} = \hat{IE}\left\{ \frac{|z_j|^{2} [f_i'(|z_i|^{2})|z_i|^{2} + f_i(|z_i|^{2}) - f_j(|z_j|^{2})]}{\sum_{j=1}^{m} |z_j|^{4}} \right\}

Step 3. Update the estimated source signals: z(t) \leftarrow (I + E) z(t), for t = 1, ..., T.

Step 4. Check for convergence: if ||E|| < \epsilon stop (\epsilon is a small threshold), otherwise go back to Step 2.

Tab. 8.2 – The principal steps of the proposed Robust-EASI algorithm.
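The steps of Table 8.2 can be sketched as follows for real data, with g(z_i) = |z_i|^{2} z_i. As a simplification that is ours (not the thesis'), the quasi-Newton step of Eq. (8.10) is replaced by the relative-gradient update E = -\mu \hat{C}_g(B) built from the normalized criterion (8.9); the step size \mu is a hypothetical tuning parameter:

```python
import numpy as np

def robust_easi(X, mu=0.05, eps=1e-6, max_iter=50, seed=0):
    """Sketch of the batch Robust-EASI iteration (Table 8.2) for real
    data.  The exact quasi-Newton matrix E of Eq. (8.10) is replaced by
    the simpler relative-gradient step E = -mu * C_hat(B), where C_hat
    is the normalized sample criterion (8.9)."""
    n, T = X.shape
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((n, n))            # Step 1: random initialization
    Z = B @ X
    I = np.eye(n)
    for _ in range(max_iter):
        C = np.zeros((n, n))                   # sample criterion (8.9)
        for t in range(T):
            z = Z[:, t]
            g = z ** 3                         # g(z_i) = |z_i|^2 z_i (real case)
            C += np.outer(z, z) / (z @ z) - I              # normalized 2nd-order term
            C += (np.outer(g, z) - np.outer(z, g)) / np.sum(z ** 4)  # normalized HOS term
        C /= T
        E = -mu * C                            # Step 2 (simplified update)
        Z = (I + E) @ Z                        # Step 3: update estimated sources
        B = (I + E) @ B
        if np.linalg.norm(E) < eps:            # Step 4: convergence test
            break
    return B, Z
```

Because both terms of the criterion are normalized, E stays bounded even for impulsive inputs, which is the point of the robustification.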
8.4.3 Performance evaluation & comparison

In this section we compare our NS-based Robust-EASI method to two widely used BSS algorithms, EASI and RQML [Shereshevski et al.(2001)]. Recall that RQML is the restricted quasi-maximum likelihood approach introduced as an extension of the popular Pham's quasi-maximum likelihood (QML) approach [Pham et Garrat(1997)] to the \alpha-stable sources case. Two simulation examples with different types of distributions and a variety of sample sizes and noise powers are presented. All simulation results are averaged over 200 Monte-Carlo runs and the mixing matrix A is generated randomly at each run. To measure the quality of separation, we use the generalized rejection level criterion defined as follows [Sahmoudi et al.(2004a)]:

    I_{perf} = \frac{1}{m} \sum_{i=1}^{m} I_i = \frac{1}{m} \sum_{i=1}^{m} \sum_{j \neq i} \frac{|C_{ij}|^{\alpha} \gamma_j}{|C_{ii}|^{\alpha} \gamma_i}    (8.11)

where \gamma_l denotes the dispersion of source s_l and C \overset{def}{=} BA.

[A]- Experiment 1: Alpha-Stable Mixture

In this experiment, mixtures of three heavy-tailed symmetric standard \alpha-stable (\mu = 0 and \gamma = 1) signals with characteristic exponent \alpha = 1.5 are considered. The number of observations is n = 3 and the mixture is noise-free. Figure 8.3 presents the generalized mean rejection level (8.11) versus the sample size for each algorithm.

Fig. 8.3: Generalized mean rejection level versus the sample size.

From these results, we can observe that EASI fails to separate \alpha-stable mixtures.
It can be seen that the newly proposed Robust-EASI can correctly separate the \alpha-stable mixtures even for short sample sizes, and outperforms the RQML method.

[B]- Experiment 2: Generalized Gaussian Mixture

The generalized Gaussian distribution has a density proportional to exp(-|x|^{p}), p > 0. A value of p less than 2 gives a distribution suitable as an impulsive signal model. By varying p, a wide class of probability distributions can be characterized, including uniform, Gaussian, Laplacian and other sub- and super-Gaussian densities. In this experiment the three sources are impulsive with a generalized Gaussian distribution with p = 1.5. Three mixtures corrupted by additive white Gaussian noise are considered. We characterize the performance of each algorithm in terms of the signal rejection level. With C \overset{def}{=} BA, the i-th estimated source is:

    \hat{s}_i(t) = z_i(t) = \sum_{j=1}^{m} C_{ij} s_j(t)

which contains the j-th source signal at level |C_{ij}|^{2} / |C_{ii}|^{2}. Then, in this case, the averaged rejection level is given by

    I_{perf} = \frac{1}{m} \sum_{i=1}^{m} I_i = \frac{1}{m} \sum_{i=1}^{m} \sum_{j \neq i} \frac{|C_{ij}|^{2}}{|C_{ii}|^{2}}    (8.12)

Figure 8.4 presents the mean rejection level (8.12) versus the noise power for each algorithm. Even though the impulsive sources are not heavy-tailed, the Robust-EASI algorithm still largely outperforms the two other algorithms. This illustrates the fact that the proposed approach is quite general and can be applied to a larger class of source signal distributions, including of course the heavy-tailed ones. In an overall comparison, the Robust-EASI method has reliable performance in all considered situations, whereas the other methods may fail, in particular if the underlying assumptions on the sources are not completely valid.
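Generalized Gaussian test sources like these can be simulated with the usual gamma transformation; a generic sketch under our own naming, not code from the thesis:

```python
import numpy as np

def generalized_gaussian(p, size, rng):
    """Samples from the density proportional to exp(-|x|^p), p > 0
    (p = 2: Gaussian shape, p = 1: Laplacian, p < 2: impulsive,
    super-Gaussian).  Uses |X| = G^(1/p) with G ~ Gamma(1/p, 1) and a
    random sign, which yields exactly the target density shape."""
    g = rng.gamma(1.0 / p, 1.0, size)
    sign = rng.choice([-1.0, 1.0], size)
    return sign * g ** (1.0 / p)
```

For p = 1.5 the resulting signal is super-Gaussian (positive excess kurtosis) but, unlike the \alpha-stable case, all its moments are finite.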
Fig. 8.4: Mean rejection level versus the noise power with T = 1000.

8.5 Concluding Remarks

In this chapter, two new NS-based blind separation methods for impulsive source signals with heavy-tailed distributions are introduced:

• Robust-JADE algorithm: The normalized 2nd and 4th order cumulants of the mixture signal are shown to converge to finite-valued matrices with the appropriate algebraic structure that is traditionally used in many 2nd and higher order statistics based BSS methods. The advantages of the proposed Robust-JADE method are discussed and a simulation-based comparison with the MD method is provided to illustrate and assess its performance.

• Robust-EASI algorithm: We have proposed another approach using normalized statistics for heavy-tailed mixtures to improve the robustness of the EASI family of algorithms. A normalized criterion was derived and used for heavy-tailed source separation. The latter is solved using an efficient quasi-Newton iterative algorithm. Comparative simulations have been provided to illustrate the effectiveness of the Robust-EASI algorithm.

More studies need to be done on the choice of the optimal non-linear function g for heavy-tailed sources. Note that the same methodology used here can be applied to derive normalized versions of other existing non-linear decorrelation criteria that can correctly separate heavy-tailed independent components.
Chapter 9: A Semi-Parametric ML Approach

In this chapter, we propose a method for estimating, in a semi-parametric way, the density of the missing data in the blind source separation problem. We consider a log-spline model of fixed size and the maximum likelihood estimator of this density in the linear BSS problem. We thus obtain a log-spline density estimator, which can be approached using a stochastic version of the expectation-maximization (EM) algorithm coupled with an MCMC method.

9.1 The Likelihood of the BSS Model

A very popular approach for estimating independent sources is the maximum likelihood (ML) method. A short introduction was provided in Chapter 5. In this section, we show how to apply ML estimation to BSS.

9.1.1 Derivation of the likelihood

It is not difficult to derive the likelihood of the observation vector x in the noise-free BSS model. This likelihood is based on the well-known result on the density of a linear transform [Papoulis(1991)]. According to this formula, the density function p_x of the mixture vector x = As can be formulated as

    p_x(x) = |det(B)| p_s(s) = |det(B)| \prod_i p_i(s_i)    (9.1)

where B = A^{-1} and the p_i denote the densities of the independent components. This can be expressed as a function of B = (b_1, ..., b_n)^{T} and x, as follows:

    p_x(x) = |det(B)| \prod_i p_i(b_i^{T} x)    (9.2)

Assume that we have T observations of x, denoted x(1), ..., x(T). Then the likelihood can be obtained as the product of this density evaluated at the T points:

    L(B) = \prod_{t=1}^{T} |det(B)| \prod_{i=1}^{n} p_i(b_i^{T} x(t))    (9.3)

Very often it is more practical to use the logarithm of the likelihood, since it is algebraically simpler. This makes no difference here, since the maximum of the logarithm is attained at the same point as the maximum of the likelihood.
The log-likelihood is given by

    \log L(B) = \sum_{t=1}^{T} \sum_{i=1}^{n} \log p_i(b_i^{T} x(t)) + T \log |det(B)|    (9.4)

9.1.2 Source density estimation

In this work, we propose a new procedure based on the maximum likelihood approach to estimate the mixing matrix. However, there is another quantity to estimate in the BSS model: the density of the independent components. This makes the problem much more complicated, because the estimation of densities is, in general, a non-parametric problem. Non-parametric means that it cannot be reduced to the estimation of a finite parameter set; in fact, the number of parameters to be estimated is infinite, or very large. Thus the estimation of the BSS model also has a non-parametric part, which explains why the proposed method is called semi-parametric.

Non-parametric pdf estimation is known to be a difficult problem. This is why we would like to avoid non-parametric density estimation in BSS. There are two ways to avoid it:

• Parametric: First, in some cases we might know the densities of the independent components in advance, using some prior knowledge on the data at hand. Then the likelihood would really be a function of the mixing matrix only. If reasonably small errors in the specification of these prior densities have little influence on the estimator, this procedure will give reasonable results. In fact, it will be shown below, by computer simulation, that this is the case in impulsive environments using alpha-stable distributions.

• Semi-parametric: A second way to solve the problem of density estimation is to approximate the densities of the independent components by a family of densities that are specified by a limited number of parameters. If it is possible to use a very simple family of densities to estimate the BSS model, we will get a simple solution. Fortunately, this turns out to be the case using log-spline functions.
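In the parametric case, the log-likelihood (9.4) can be evaluated directly once the source log-densities are fixed; a minimal sketch (the names are ours):

```python
import numpy as np

def bss_log_likelihood(B, X, log_pdfs):
    """Log-likelihood (9.4) of the noise-free BSS model:
    sum_t sum_i log p_i(b_i^T x(t)) + T log|det B|.
    X is n x T; log_pdfs holds one vectorized log-density per source."""
    n, T = X.shape
    Z = B @ X                                   # z_i(t) = b_i^T x(t)
    ll = sum(log_pdfs[i](Z[i]).sum() for i in range(n))
    return ll + T * np.log(abs(np.linalg.det(B)))
```

Maximizing this quantity over B (for known p_i) is exactly the parametric ML approach described above.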
9.1.3 Optimization via the EM algorithm

If the likelihood of the observations cannot be maximized directly, it is possible to perform iterative maximization steps in order to approach the maximum. For example, the expectation-maximization (EM) algorithm, proposed in [Dempster(1977)], is a broadly applicable approach for the iterative computation of maximum likelihood estimates, useful in a variety of incomplete-data (or partially observed data) statistical problems. We briefly recall the principle of the EM algorithm, which is a two-step iterative procedure; one iteration is composed of an E-step and an M-step. The E-step computes

    Q(\theta | \theta_k) = IE\{\log f(x, s; \theta) | x; \theta_k\}

and the M-step determines \theta_{k+1} as the maximizer of Q(\theta | \theta_k).

Stochastic versions of EM have been introduced from different perspectives to deal with situations where the E-step is infeasible in closed form. This is often the case, because the expectation has no analytical form. In this case, Markov chain Monte Carlo (MCMC) methods replace the E-step by a Monte Carlo approximation of the expectation based on a large number of independent simulations of the missing data [Meng et Rubin(1993)]. Another way to get around the difficulty of computing the expectation is also to use simulation, not for approximating some integral deriving from an expectation, but for obtaining numerically plausible values standing for the missing data. One proposal was to simulate the missing data from the a posteriori distribution with the current values of the parameters [Diebolt et Celeux(1993)]. To be more efficient, the SAEM algorithm, proposed in [Delyon et al.(1999)], replaces the E-step not only by the simulation of the missing data, but by a stochastic approximation involving these simulated data.
At iteration k, SAEM generates m(k) realizations s_k(j) (1 \le j \le m(k)) from the a posteriori distribution, denoted p(s | x; \theta_k), and updates Q_{k-1}(\theta) according to

    Q_k(\theta) = Q_{k-1}(\theta) + \gamma_k \left[ \frac{1}{m(k)} \sum_{j=1}^{m(k)} \log f(x, s_k(j); \theta) - Q_{k-1}(\theta) \right]    (9.5)

where \gamma_k is a sequence of positive step sizes decreasing to 0. The use of simulated data for estimating parameters in missing-data statistical problems is a powerful approach that has become popular since the 1990s. In this work we use this procedure for the blind source separation problem.

9.2 Semi-Parametric Source Separation

9.2.1 Noisy linear instantaneous mixtures

In this chapter, we consider the classical noisy linear BSS model with instantaneous mixtures given by:

    x(t) = As(t) + \epsilon(t), t = 1, ..., T    (9.6)

where A is an n \times m unknown full column rank mixing matrix. The sources s_1(t), ..., s_m(t) are collected in an m \times 1 vector denoted s(t) and are assumed to be i.i.d. signals: the joint density \pi factorizes as \pi = \prod_{j=1}^{m} \pi_{s_j}. The noise vector \epsilon(t) (independent of s(t)) has independent components \epsilon_1(t), ..., \epsilon_n(t) with zero mean and unknown variance \sigma^{2}. The goal of a BSS method is to find a separating matrix, i.e. an m \times n matrix B such that the recovered sources Bx(t) are as independent as possible. In the noiseless case, (9.6) admits a unique solution y(t) = Bx(t), up to scaling and permutation indeterminacies, such that C \overset{def}{=} BA = P\Lambda, where \Lambda is a diagonal scaling matrix and P is a permutation matrix (see [Hyvarinen et al.(2001)]). At most one source is allowed to be Gaussian to ensure identifiability. Another problem is that if one or more sources do not have finite second or higher moments (e.g.
heavy-tailed distributions), then prewhitening or criteria optimization would cause a breakdown [Chen et Bickel(2004), Sahmoudi et al.(2004a), Sahmoudi et al.(2005)].

9.2.2 The proposed approach

Our purpose is to estimate by maximum likelihood the density \pi, the mixing matrix A and the noise variance \sigma^{2}. In this section, we present a semi-parametric BSS method using maximum likelihood estimation in a log-spline model, in order to avoid any assumption on the source distribution. Nevertheless, we suppose that all sources are independent and have the same common distribution.

Any BSS problem can be seen as a usual missing-data problem. Indeed, the observed data are the observations {x(t)}_{1 \le t \le T}, whereas the random sources {s(t)}_{1 \le t \le T} are the unobserved data. The complete data of the model are then {x(t), s(t)}_{1 \le t \le T}. We suppose that the unobserved sources are related to the observations through the density function h of x conditionally on s¹. Our purpose is to estimate the source density \pi = \prod_{j=1}^{m} \pi_{s_j}, the mixing matrix A and the noise variance \sigma^{2}. For that we propose a semi-parametric approach which consists of combining the log-spline model for source density approximation with a stochastic version of the EM algorithm.
We use log-spline models for two reasons: on one hand, they have good functional approximation properties; on the other hand, they are well adapted to the implementation of the SAEM (stochastic approximation version of the expectation-maximization) algorithm [Kuhn et Lavielle(2004)], allowing our estimator to be computed easily. Indeed, the first assumption of the SAEM algorithm used here is equivalent to supposing that the complete-data likelihood f(x, s, \eta) belongs to the curved exponential family and can be written:

    f(x, s, \eta) = \exp\left\{-\Psi(\eta) + \langle \tilde{S}(x, s), \Phi(\eta) \rangle\right\}    (9.7)

¹ The distribution of x conditionally on s, denoted by h, corresponds in fact to the distribution of the additive noise in the BSS model (9.6), with the same variance and a non-zero mean value equal to As.

where \langle ., . \rangle denotes the scalar product, \eta denotes the unknown global parameter vector to be estimated and \tilde{S}(x, s) is known as the minimal sufficient statistic (MSS) of the complete model. In this case of unknown density functions following model (9.7), a good approximation which satisfies this latter condition is given by the log-spline model. Moreover, it was shown that this estimation technique is inherently robust against outliers and impulsiveness effects [Takada(2001)]. For this reason, we apply this method to impulsive random variables with possibly heavy-tailed distributions characterized by infinite second and higher order moments. We now define precisely the log-spline model which will be used.

9.2.3 Density estimation by B-spline approximations

In order to get a non-parametric estimate of the source density function \pi, we propose to use the log-spline model. Let I be equal to [a, b], where -\infty < a < b < +\infty, and consider a given knot sequence \tau = (t_l)_{1 \le l \le K+1} with a = t_1 and b = t_{K+1}.
Consider now the space S^{q,\tau} of spline functions of positive order q on I, namely piecewise polynomial functions of degree q - 1 associated to this knot sequence. The dimension of S^{q,\tau} is then equal to J = q + K - 1 and there exists a B-spline basis, denoted B_1, ..., B_J, for S^{q,\tau} [de Boor(1978)]. The log-spline density estimation method models a log-density function as a spline function:

    \forall s \in I, \pi_\theta(s) = \exp\left( \sum_{j=1}^{J} \theta_j B_j(s) - c(\theta) \right)    (9.8)

where

    c(\theta) = \log \int_I \exp\left( \sum_{j=1}^{J} \theta_j B_j(s) \right) ds

is a normalization factor and \theta = (\theta_1, ..., \theta_J) \in R^{J}. We choose the dimension J of the log-spline model as a function of the sample size T, such that J = o(\sqrt{T}) (see [Kuhn et Lavielle(2004)] for more details). We now define the observed log-likelihood corresponding to the log-spline model of the observations, as follows:

    L_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} \log \int_I h(x(t)|s) \pi_\theta(s) ds    (9.9)

Then we consider the maximum likelihood estimator \pi_{\hat{\theta}_{T,J}} of the density \pi in the log-spline model, given by:

    \hat{\theta}_{T,J} = \arg\max_{\theta \in \Theta_J} L_T(\theta)    (9.10)

This family is not identifiable since, for every real a, we have c(\theta + a) = c(\theta) + a, implying that \pi_{\theta + a} = \pi_\theta. We systematically set \theta_J = 0 in order to get an identifiable family of log-density functions, and we denote by \Theta_J the subspace of R^{J} composed of vectors having zero as last coordinate, and by M^{q,\tau} the set of associated densities, i.e. \{\pi_\theta, \theta \in \Theta_J\}.

We briefly describe some properties of B-splines detailed in de Boor's book [de Boor(1978)]:

– B-spline: For all 1 \le j \le J, the function B_j takes values in the interval [0, 1]. Moreover, we have \sum_{j=1}^{J} B_j(s) = 1 for all s \in I.

– Approximation property of the log-spline model: We define \delta_J = \inf_{\theta \in \Theta_J} \|\log f - \log \pi_\theta\|_\infty. For any positive continuous density function f on I, \delta_J tends to zero when J goes to infinity.
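A toy version of model (9.8), using order q = 2 (piecewise-linear "hat" B-splines, which already satisfy the two properties above) and a trapezoidal quadrature for c(\theta); the basis choice and names are our illustration, not the thesis implementation:

```python
import numpy as np

def hat_basis(knots, s):
    """Order-2 B-splines ('hat' functions) on an equally spaced knot grid:
    each B_j takes values in [0, 1] and sum_j B_j(s) = 1 on
    [knots[0], knots[-1]] (partition of unity)."""
    h = knots[1] - knots[0]
    return np.maximum(0.0, 1.0 - np.abs(s[:, None] - knots[None, :]) / h)

def logspline_pdf(theta, knots, grid):
    """Log-spline density (9.8): exp(sum_j theta_j B_j(s) - c(theta)),
    with the normalization exp(c(theta)) computed by trapezoidal
    integration of exp(sum_j theta_j B_j) over I."""
    u = np.exp(hat_basis(knots, grid) @ theta)
    integral = np.sum((u[:-1] + u[1:]) / 2.0 * np.diff(grid))
    return u / integral
```

With \theta = 0 the model reduces to the uniform density on I, which is also the initialization used in the experiments of Section 9.3.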
See [de Boor(1978)] for more details on the links between the convergence rate and the regularity of f. The particular properties of the log-spline model suggest that \pi_{\hat{\theta}_{T,J}} will have remarkable properties when T tends to infinity. First, we explain how we compute this estimator in practice, simultaneously with the mixing matrix and the noise variance.

9.2.4 The SAEM algorithm

To compute the unknown parameters \eta = (\theta^{T}, vec(A)^{T}, \sigma^{2})^{T}, we use the SAEM algorithm coupled with an MCMC (Markov chain Monte Carlo) procedure presented in [Kuhn et Lavielle(2004)]. Here we apply this algorithm to estimate the mixing matrix A and the variance \sigma^{2}, using the log-spline model to approach the estimate \pi_{\hat{\theta}_{T,J}}. The complete log-likelihood corresponding to the log-spline model has the following expression:

    L_T^{com}(\eta) = \frac{1}{T} \sum_{t=1}^{T} \log h(x(t)|s(t)) + \frac{1}{T} \sum_{t=1}^{T} \log \pi_\theta(s(t))    (9.11)

We then apply the SAEM algorithm to this parametric model in order to approach the estimator \hat{\eta}_{T,J} of \eta that maximizes the observed log-likelihood. To bring out the minimal sufficient statistics of the model, we write the developed expression of the complete log-likelihood:

    L_T^{com}(\eta) = \frac{1}{T} \sum_{t=1}^{T} \log h(x(t)|s(t)) + \frac{1}{T} \sum_{t=1}^{T} \left[ \sum_{j=1}^{J} \theta_j B_j(s(t)) - c(\theta) \right]

We choose as MSS \tilde{S}(x, s) = \left( \frac{1}{T} \sum_{t=1}^{T} B_j(s(t)), 1 \le j \le J \right) and we implement the k-th iteration of the SAEM algorithm as:

• S-step: Generate a realization s' using the prior distribution \pi_{\theta_k} as proposal distribution, and take s_k equal to s' or to s_{k-1} according to the value of the acceptance probability.

• A-step: Update the minimal sufficient statistics \tilde{S}_k according to the stochastic approximation:

    \tilde{S}_k = \tilde{S}_{k-1} + \beta_{k-1} \left( \tilde{S}(x, s_k) - \tilde{S}_{k-1} \right)    (9.12)

where \beta_k is a sequence of positive step sizes decreasing to 0.

• M-step: Update \eta_k by maximizing the complete log-likelihood of the model evaluated at the observations and at the current value of the minimal sufficient statistics.
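The A-step recursion (9.12) is a plain stochastic-approximation average; with \beta_k = 1/k it reduces to the running mean of the simulated statistics. A minimal sketch (the names are ours):

```python
import numpy as np

def a_step(S_prev, S_sim, beta):
    """A-step of SAEM (Eq. 9.12): S_k = S_{k-1} + beta_k (S(x, s_k) - S_{k-1})."""
    return S_prev + beta * (S_sim - S_prev)

def run_a_steps(simulated_stats):
    """Chain the A-step over a sequence of simulated MSS values,
    with the step sizes beta_k = 1/k (so the result is their mean)."""
    S = np.zeros_like(simulated_stats[0], dtype=float)
    for k, S_sim in enumerate(simulated_stats, start=1):
        S = a_step(S, S_sim, 1.0 / k)
    return S
```

Faster-decreasing step sizes average over a longer memory and damp the Monte-Carlo noise of the S-step, which is what gives SAEM its stability compared to plain stochastic EM.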
This algorithm converges a.s. toward a local maximum of the log-likelihood of the observations under very general regularity conditions (see [Kuhn et Lavielle(2004)] for convergence results). In practice, the algorithm is easy to implement and has a relatively low computational cost.

9.3 Performance evaluation & comparison

9.3.1 Some existing BSS methods

We briefly describe here three BSS approaches for comparison with the new semi-parametric approach introduced above.

1) FastICA algorithm [Hyvarinen et al.(2001)]. Under the whitened zero-mean demixing model y = Wz, the FastICA algorithm finds the extrema of a generic cost function IE\{G(w^{T} z)\}, where w^{T} is one of the rows of the demixing matrix W. The cost function can be, e.g., a normalized cumulant or an approximation of the marginal entropy, which is minimized in order to find maximally non-Gaussian projections w^{T} z. This algorithm faces three problems. First, some sources may not have zero means, in which case the mean values must be explicitly included in the analysis. Second, in FastICA, the derivative of the even function G is assumed to be an odd function; if this condition fails to be satisfied, FastICA as such may not work. Third, FastICA is not robust to heavy-tailed effects.

2) JADE algorithm [Cardoso et Souloumiac(1993)]. This algorithm operates on cumulants as a measure of independence. It seeks to approach independence through the maximization of higher order cumulants. However, one major weakness of this algorithm is that higher order cumulants are extremely vulnerable to outlier effects. Besides being sensitive to outliers, JADE also fails to separate certain source distributions, e.g. skewed zero-kurtotic signals generated by the power distribution.
This is because, by using only the 4th-order cumulants, third-order effects like the skewness are ignored.

3) Minimum Dispersion (MD) algorithm [Sahmoudi et al.(2005)]. This approach is a two-step parametric algorithm for heavy-tailed source separation.

Step 1: Robust whitening. In the case of \alpha-stable signals, it is proven in [Sahmoudi et al.(2004a)] that the normalized covariance matrix of x, defined by \hat{R}_n^x = \frac{\hat{R}_x}{Trace(\hat{R}_x)} with \hat{R}_x = \frac{1}{T} \sum_t x(t)x(t)^{T}, converges asymptotically (i.e. when T tends to infinity) to the finite matrix ADA^{T}, where D is a positive diagonal matrix. Hence, the normalized covariance matrix has the appropriate structure and the whitening problem becomes standard.

Step 2: MD criterion. Let z(t) = Bx(t), where B is an orthogonal separating matrix to be estimated and x denotes the whitened data. It is shown in [Sahmoudi et al.(2004a)] that, under the orthogonality constraint, the MD criterion given by J(B) = \sum_{i=1}^{m} \gamma_{z_i}, where \gamma_{z_i} denotes the dispersion of z_i(t), the i-th entry of z(t), is a contrast function.

The essential limitation of this method is that it can be used only for heavy-tailed sources with \alpha-stable distributions.

9.3.2 Parametric versus semi-parametric approaches

The MD method is said to be parametric in the sense that it relies on a priori knowledge of the exact source pdf. In this case, we have a finite set of parameters to estimate. On the other hand, the SAEM method is said to be semi-parametric in the sense that the source pdf is unknown and needs to be estimated jointly with the desired parameters (i.e. the mixing matrix) [P. Bickel(1998)]. Clearly, estimating a pdf is a difficult problem, as the number of parameters to be estimated is infinite.
In the semi-parametric approach, we estimate a limited number of parameters by replacing the estimation problem by an approximation one. The parametric approach is preferred whenever reliable a priori knowledge of the source pdf is available. In situations where the pdf is only partially or inaccurately known, semi-parametric methods should be used because of their robustness against modeling errors, as shown by the simulation results below.

9.3.3 Computer simulation experiments

Here, we compare our proposed semi-parametric method SAEM to JADE, FastICA and the parametric MD algorithm. In all simulation experiments the results are averaged over 100 runs, and the mixing matrix A is generated randomly at each run. The stepsize sequence (β_k) used for SAEM was β_k = 1/k. For the choice of the size J of the logspline model in SAEM, we tested some values of J lower than 10, since we have at least 100 observations. The best estimation seems to be given by q = 4 and J = 5, so we keep these values for the following experiments. We choose the initial value θ_0 such that the logspline density estimate is initialized with the uniform distribution on I = [−50, 50]. To measure the quality of separation, we use Amari’s error criterion as a performance index (PI), defined as

PI = Σ_{i=1}^m ( Σ_{j=1}^m |C_{i,j}| / max_k |C_{i,k}| − 1 ) + Σ_{j=1}^m ( Σ_{i=1}^m |C_{i,j}| / max_k |C_{k,j}| − 1 )

where C = (C_{i,j})_{1≤i,j≤m} = BA is the global system.

• Experiment 1 : Robustness against outliers. First, we test the robustness against outliers. We mix two sources, one with a Gaussian distribution and the second with a uniform distribution, using randomly chosen mixing matrices. The data set contains 1000 points. Without outliers, the performances of SAEM, JADE and FastICA are all excellent (PI ≈ 0.05). To test for outlier robustness, we replace 50 data points with outliers, i.e.
uniformly distributed data points within a disc of radius 500 around the origin (the norm of the original data points is roughly within the range from 0 to 100). As expected, SAEM still works fine. In fact, typically it does not even change its solution, because it simply ignores the outliers in the B-spline adjustment stage. JADE and FastICA, however, produce arbitrary results because they employ higher-order statistics, which are highly sensitive to outliers.

• Experiment 2 : Asymptotic consistency. Figure 9.1 shows some simulation results in the case of three noiseless mixtures (n = 3 observations) of three sources (m = 3) with, respectively, a uniform distribution on [0, 1], a Gaussian distribution with zero mean and unit variance, and a standard SαS with α = 1.5. To detect whether BSS algorithms can obtain consistent estimates in such a situation, the sample size was increased from (1) : T = 1000 to (2) : T = 5000. We compare SAEM with two other well-known BSS algorithms, JADE and FastICA. Similarly to [Chen et Bickel(2004)], we present boxplots based on quartiles to assess the consistency of our method.

Fig. 9.1: Consistency of different BSS algorithms. The sample sizes were 1000 for case (1) and 5000 for case (2).

From the boxplots (Figure 9.1), we can see that as the sample size increases, the estimation error (PI) for SAEM decreases more significantly toward zero than for JADE and FastICA.

• Experiment 3 : Robustness against impulsive noise. In this experiment we add impulsive noise to the above mixtures (considered in Experiment 2) according to x(t) = As(t) + σε(t), with ε(t) being an n-dimensional Gaussian noise of unit variance.
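The Amari error criterion used as the performance index in these experiments can be computed directly from the global matrix C = BA; here is a minimal numpy sketch (function and variable names are ours, not from the thesis):

```python
import numpy as np

def amari_pi(B, A):
    """Amari performance index of the global system C = B A.
    Zero iff C is a scaled permutation, i.e. perfect separation."""
    C = np.abs(B @ A)
    row_term = (C / C.max(axis=1, keepdims=True)).sum(axis=1) - 1  # rows vs row maxima
    col_term = (C / C.max(axis=0, keepdims=True)).sum(axis=0) - 1  # columns vs column maxima
    return row_term.sum() + col_term.sum()

# a scaled permutation gives PI = 0; a fully mixed system does not
P = np.array([[0.0, 2.0], [-3.0, 0.0]])   # scaled permutation matrix
pi_perfect = amari_pi(P, np.eye(2))       # -> 0.0
pi_mixed = amari_pi(np.ones((2, 2)), np.eye(2))
```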
We track the evolution of the performance index as a function of the noise level σ for kurtotic (super-Gaussian) noise : we used multidimensional Gaussian noise, where we raised the absolute value to the power 5.

Fig. 9.2: The performance index versus noise level.

Figure 9.2 shows that JADE and FastICA start to fail at a certain noise level, whereas SAEM continues to produce good BSS solutions. Note that we have chosen the median over 100 runs because the PI depends strongly on the actual realization of the noise.

• Experiment 4 : Robustness against modeling errors. Here, we consider m = 3 impulsive sources with a generalized Gaussian distribution of parameter p = 1.5 (i.e. the source pdf is proportional to exp(−|x|^p)). In this case, the signals have finite variances and n = 4 noise-free mixtures are considered.

Fig. 9.3: The performance index versus sample size.

As can be observed from Figure 9.3, the MD method fails to separate the sources correctly, as it relies on the SαS source pdf assumption, which is not verified in this example. This illustrates the robustness of SAEM compared to the MD method with respect to pdf modeling errors.

9.4 Concluding Remarks

In this work, we developed a new semi-parametric BSS method using the SAEM algorithm. The proposed method is applied to the blind separation of noisy linear instantaneous mixtures of possibly heavy-tailed sources.
The SAEM-based method is compared with the JADE, FastICA and minimum dispersion (MD) methods and shown to be more general (as it can be applied to a larger class of source signals and in different scenarios). The proposed SAEM algorithm outperforms JADE and FastICA in terms of consistency and robustness against outliers and impulsive noise, and outperforms the MD method in terms of robustness against modeling errors.

Part Three
Separation and Estimation of Multicomponent FM Signals in an Impulsive Environment

In the first chapter of this part (Chapter 10), we recall the main principles and existing methods for the estimation of non-stationary FM signals. We then present our novel approaches in the presence of additive impulsive noise modeled by an α-stable distribution.

Chapter 10
State of the Art

The last two decades in particular have witnessed a surge of interest in the analysis of time-varying or non-stationary processes. The beginning of the 80’s saw efforts in various parts of the world at developing spectral analysis techniques which would overcome the drawbacks of classical spectral analysis [Grenier(1984)], [Boashash(1991)]. These drawbacks arise largely from the fact that the Fourier transform signal characterization (upon which classical analysis is essentially based) assumes the spectral characteristics of both signal and noise to be time-invariant. When the important spectral features of the signal and/or noise are time-varying, the effect of Fourier analysis is to produce an averaged (smeared) spectral representation.
One consequence of this smearing is a loss of frequency resolution. One can try to reduce this smearing by obtaining the spectral estimates over short time intervals, so that the spectral components do not vary too greatly within the window. However, the shortened observation windows produce smearing of a different kind, this time due to the uncertainty relationships of time- and band-limited signals. Research early in the 1980s focused on two directions : modern parametric spectral analysis and time-frequency analysis. In this chapter, we present a brief state of the art of the nonstationary FM signal analysis problem, consisting of spotlights on a few important existing methods.

10.1 Modern Spectral Analysis Approaches

Parametric modeling of nonstationary signals received a great deal of attention in the eighties [Grenier(1984)]. The approaches usually developed represent these signals by AR or ARMA models with time-varying coefficients. The coefficients are then approximated on a basis of known time-varying functions, giving rise to a set of invariant parameters which are the coordinates of the coefficients. This approach offers the advantage of leading to the same type of identification procedures as for AR or ARMA models with constant parameters. Another potential advantage of this kind of modeling is an improved accuracy of parameter estimation methods applied to time-varying signals, in comparison to other estimation methods based upon the assumption that the signal is stationary over a time interval. Several algorithms derived for stationary signals plus observation noise have been extended to the nonstationary case [Grenier(1984)]. However, it appeared that in some cases the performance of the estimators was reduced in the nonstationary case. For more details, a good review of parametric or modern spectral analysis methods can be found in [Grenier(1984)] and [Boashash(1992a)].
10.2 Time-Frequency Analysis Approaches

Long research on the Wigner-Ville Distribution (WVD) established it as a means to attain good frequency localisation for rapidly time-varying signals [Boashash(1992c)]. This interest was fuelled by the discovery that it had a number of very attractive properties [Classen et Mecklenbrauker(1980)], as well as the evidence that the technique could be put to good practical use. The advance of digital computers also aided its popularity, as the hitherto prohibitive task of computing a two-dimensional distribution came within practical reach. As research in the area continued, the importance of the WVD for random signal analysis became apparent. In [Martin(1982)], the author showed that the WVD’s expected value is simply the Fourier transform of the time-varying autocorrelation function. This gave the WVD an important interpretation as a time-varying Power Spectral Density (PSD), and sparked significant research efforts along this direction. The value of the WVD as a time-varying filtering tool was also realised early. In [Boudreaux-Bartels et Marks(1986)], a simple algorithm is derived which consists of masking (filtering) the input signal and then performing a least-squares inversion of the WVD to recover the filtered signal. Many refinements, extensions and simplifications were developed to further this pioneering work on WVD-based time-varying filtering. Detection and estimation were other research areas which saw theoretical developments based on the WVD [Kay et Boudreaux-Bartels(1985)], [Boashash et Rodriguez(1984)]. One of the crucial factors motivating such interest was the fact that, since the WVD is a unitary (energy-preserving) transform, many of the classical detection and estimation problem solutions had alternate implementations based on the WVD. The time-frequency nature of the implementation, however, allowed greater flexibility than did the classical ones.
Despite all the advances made in the theory and application of the WVD to so many areas of signal processing, it was generally accepted that the WVD had a number of limitations. One of the main limitations was considered to be the nonlinear nature of the WVD. The WVD performs a bi-linear transformation of the frequency components of a signal, a fact which is significant for both deterministic and random signals. For deterministic multicomponent signals, the bi-linearity causes ”cross-terms” or ”artefacts” to occur between the true frequency components. This can often render the WVD almost impossible to interpret visually. For random signals, the bi-linear transformation exaggerates the effects of noise by creating cross-terms between all noise and signal components. At low signal-to-noise ratio (SNR), where the noise term of the bi-linear kernel dominates, this effect can contribute to a very rapid degradation of performance. A second drawback attributed to the WVD is its inherent bias towards infinite-duration signals. Since it is essentially the Fourier transform of a bilinear kernel, it is ”tuned” to the presence of infinite-duration complex sinusoids in the kernel, and hence to linear FM components in the signal itself. Practical signals are often highly localised in time, so that a simple Fourier transformation of the kernel does not provide a very effective analysis of the data. Much came of the efforts to overcome these drawbacks. Cohen had already paved the way for reducing the non-linear effects of the WVD by his work in quantum mechanics, in which he proposed a generalized class of ”smoothed” Wigner distributions [Cohen(1966)].
He showed that an infinite number of joint distributions with useful properties could be produced by applying a 2D smoothing function to the Wigner distribution, the particular distribution depending on the smoothing function used. Researchers then turned to 2D smoothing functions to reduce the artefacts, the most popular smoothing function initially being the 2D Gaussian function. Further impetus to the attempted reduction of artefacts came with the understanding that, in the ambiguity function domain, the cross-terms tend to be distant from the origin, while the auto-terms pass through the origin [Flandrin(1998)]. This was especially helpful since the WVD was known to be related to the ambiguity function by 2D Fourier transformation [Boashash(1991)]. 2D Fourier inversion of isolated regions of the ambiguity function was then used to effect the cross-term reduction. Subsequently, greater refinement and purpose entered the design procedure for these TFDs. Choi and Williams used the ambiguity domain to design their variable-level smoothing function, so that artefacts could be reduced to a greater or lesser extent, depending on the application [Choi et Williams(1989)]. Zhao, Atlas and Marks designed kernels in which the artefacts fold back onto the auto-terms [Zhao et al.(1990)]. The latter effect was desirable, so as to be able to obtain visually satisfying representations. In [Kootsookos et al.(1992)], the authors showed how one could vary the shape of the cross-terms by appropriate kernel design. Parallel to the developments in smoothing of the WVD, another approach was used to nullify the troublesome non-linear effects of the WVD. This approach was based on the fact that the cross WVD (XWVD), although closely related to the WVD and having many of its desirable properties, is a linear distribution in the observed signal. Efforts were made, then, to use the XWVD instead of the WVD wherever possible.
The problems relating to the WVD’s poor performance with short-duration signals were addressed in a number of different ways. Perhaps the first method proposed was to modify the WVD by performing the spectral estimation of the kernel function with a Mellin transform [Marinovic(1984)]. Another method put forward for better dealing with short-duration signals was to use autoregressive spectral estimators of the kernel, which could reliably be applied to short data sequences [Boashash(1991)]. The emphasis on time-varying spectral analysis which occurred during the 1980s also led very naturally to a heightened awareness of instantaneous frequency. For analysts who were used to dealing with time-invariant systems, the simultaneous use of the words instantaneous and frequency contained an element of contradiction: frequency is usually assigned to the eigenvalues of the system’s eigenfunctions, and is only defined for persistent processes. It became clear that a better understanding of what was meant by ”instantaneous frequency”, and of how to estimate this important quantity, was needed.

10.2.1 IF estimation using time-frequency methods

Not surprisingly, then, much work focused on the concepts underlying the IF and its relationship to TFDs [Boashash(1991)]. A summary of the developments may be found in [Boashash(1992a), Boashash(1992b)]. Further work concentrated on techniques for estimating the IF, with a number of useful new algorithms being developed. Various techniques had been devised over the years for the estimation of IF, but many of them were developed in the communications area and, as such, were suited more to communications signals than to those encountered generally in signal processing environments.
Several IF estimation techniques have been developed recently to allow for a broader signal model, or for greater robustness to noise. This is because the instantaneous frequency is one of the most important features of any signal. There are two major approaches for IF estimation of FM signals : parametric and non-parametric. The non-parametric approach is based on time-frequency distributions. In summary, there are two major existing approaches for IF estimation using TFDs. The first is built on the first-order moment of the TFD [Boashash(1991)]. The first-order moment of the WVD yields the IF [White et Boashash(1988), Boashash(1991)], while other distributions yield approximations of the IF [Boashash(1992c)]. However, it fails for multicomponent signals due to the presence of cross-terms. The second approach exploits the fact that all TFDs have peaks around the IF laws of signals. The peaks of the WVD were used for IF estimation and applied to many problems [Boashash(1992c)]. For better performance at lower SNR, the XWVD was proposed [Boashash et O’Shea(1993)]. Other TFD-based peak estimation algorithms can be found in, for example, [Boashash(1992c)], [Stankovic et Katkovnik(1998)], [Katkovnik et Stankovic(1998)], [Luigi et Moreau(2002a)], [Luigi et Moreau(2002b)]. Like the first approach, this approach also suffers from the presence of cross-terms in multicomponent signals, which results in poor estimation. Motivated by the desire to design high-resolution RIDs, the BD was then proposed in [Barkat(2000)] and the MBD was developed in [Hussain(2002)], both with adaptive algorithms for IF estimation of multicomponent signals.

10.2.2 Analysis of noisy multicomponent signals

There is a wide range of applications where we encounter signals comprised of I components with different IF laws fi(t) and different envelopes ai(t), in additive noise. It is often desired, from such an observed signal, to determine the IF law of each component. This can be achieved by representing the observed signal z(t) in the time-frequency (t-f) domain and using time-frequency filtering methods to recover the individual components [Cohen(1995)], [Boashash(1991)]. Another approach involves extending parametric and non-parametric algorithms for IF estimation of monocomponent FM signals to the case of multicomponent signals, and designing an algorithm that simultaneously tracks the various IF components of the observed signal [Peleg et Friedlander(1996)], [Hussain et Boashash(2002)]. Both approaches require the use of time-frequency distributions (TFDs) with very specific properties, such as high time-frequency localization of the instantaneous frequency components and high reduction of cross-term interferences. In practice, the signal under consideration may be subjected to additive noise. In general, and for various reasons, the additive noise is assumed to be Gaussian. The analysis of non-stationary signals affected by additive Gaussian noise has been addressed in several places [Friedlander et Francos(1995)], [Barkat(2000)], [Barbarossa et Scaglione(2000)], [Barkat et Abed-Meraim(2004)], [Hussain(2002)]. However, in some situations the assumption of Gaussianity of the noise is not valid and, therefore, alternative techniques are needed.

10.3 Robust time-frequency analysis

In the presence of impulsive heavy-tailed noise, which is well modeled by the family of alpha-stable distributions, time-frequency representations are severely corrupted by impulse-related artifacts, which tend to obscure the essential details of the desired signal.
Recently, two novel techniques were proposed for the analysis of a monocomponent FM signal contaminated by additive noise having an unknown heavy-tailed distribution :

– First, robust time-frequency distributions were developed as a generalization of the robust minimax M-estimates. In [Katkovnik(1998)], a robust periodogram was proposed for the analysis of a single tone affected by additive heavy-tailed noise. In [Katkovnik et al.(2002)], the authors used the so-called robust spectrogram and robust Wigner-Ville distribution (WVD), respectively, to address the problem of non-stationary signals embedded in heavy-tailed noise. In [Barkat et Stankovic(2004)], the authors extend the work proposed in [Katkovnik et al.(2002)] to design a robust polynomial WVD (PWVD). However, it is known that the spectrogram suffers from low resolution in the time-frequency domain, that the WVD suffers from the presence of artifacts for non-linearly frequency-modulated signals, and that the PWVD suffers from the presence of cross-terms for multicomponent signals.

– Second, in [Griffith(1997)] the author used the fractional lower-order covariance, a correlation measure that is well-behaved in alpha-stable noise, to develop a set of robust time-frequency representations that offer significant improvements in performance over conventional quadratic time-frequency representations. However, the use of fractional lower-order statistics in a time-frequency distribution entails a much higher computational complexity. In addition, no consistent estimator of the covariation or the lower-order covariance is available in the literature for practical use.

In this third part of the thesis, we will propose two classes of robust time-frequency procedures to analyze multicomponent non-stationary signals in heavy-tailed noise under the α-stable model.
The first one is based on a generalization of the work presented in [Barkat et Stankovic(2004)] to design robust time-frequency distributions. The second one uses a preprocessing stage as a first step to mitigate the effect of the impulsive noise before the time-frequency IF estimation step.

10.4 Concluding Remarks

In this chapter, we described various approaches to IF estimation of nonstationary signals. Because of the discussed limitations of the existing methods, there was great interest in developing alternatives, especially for multicomponent signals in impulsive noise environments.

Chapter 11
Robust Parametric Approaches

In this chapter we address the problem of instantaneous frequency (IF) estimation of multicomponent nonstationary FM signals in an impulsive α-stable noise environment. Three parametric techniques are introduced using a two-step procedure. The first step consists of transforming the polynomial phase estimation problem into a frequency estimation one using a phase-polynomial transform (PPT). In the second step, we perform the frequency estimation by three robust versions of the MUSIC (MUltiple SIgnal Classification) algorithm, using truncated data (TRUNC-MUSIC), a robust covariance estimate (ROCOV-MUSIC) and a generalized covariation coefficient matrix whose entries are fractional lower-order statistics of the signal (FLOS-MUSIC), respectively. We illustrate and compare the proposed methods by simulation examples.

11.1 Introduction - Problem Statement

Many signals used in communications, radar, sonar, and other man-made signals, as well as various natural signals, involve frequency modulation (FM) of a carrier. This model of FM signals was used in many references to define the notion of a multicomponent signal [Peleg et Friedlander(1996)], [Barbarossa(1995)].
Such complex signals can be affected by impulsive noise, which can be correctly modeled by α-stable processes. In this work, we parameterize the model of an FM signal by assuming that the phase of each component is a polynomial function of time.

Remark 11.1. We note that the estimation approach proposed in this work can be applied to multicomponent signals where the phase of some of the components is a continuous, but not necessarily polynomial, function of time. Indeed, by the Weierstrass theorem we can approximate any continuous function by a polynomial one. More about this can be found in [Peleg et Friedlander(1995)].

Without loss of generality, and for simplicity of presentation, we focus in this chapter on the case of a quadratic phase. If the considered signal has a phase order higher than two, we can reduce the order of the signal by demodulation. If the estimate of the highest-order polynomial phase coefficient is accurate, the highest-order term is effectively removed, and we can proceed to use the same PPT/demodulation procedure to estimate the next phase parameter. This procedure is repeated until all the coefficients have been estimated. The signal model in the quadratic phase case is then given by

x(t) = Σ_{i=1}^I s_i(t) + z_0(t) = Σ_{i=1}^I a_i(t) cos{φ_i(t)} + z_0(t)    (11.1)

where t = 0, . . . , N − 1 and φ_i(t) = 2π(f_i t + δ_i t²) + θ_i is the phase of the i-th so-called chirp component. The parameters f_i, δ_i, i = 1, . . . , I are unknown real coefficients. The values {θ_i, i = 1, · · · , I} are realizations of random variables distributed uniformly and independently over [0, 2π). N is the sample size and I is the number of components of the observed signal. The amplitudes a_i(t) are assumed α-stable, independent from the noise term z_0(t), with location parameters a_i ≠ 0 and dispersions γ_i. The random noise z_0(t) is modeled as a symmetric α-stable (SαS) process with zero location parameter.
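To make the model (11.1) concrete, here is a minimal numpy sketch (our own code, not from the thesis) that synthesizes a multicomponent quadratic-phase signal with constant amplitudes in SαS noise, drawing the noise with the Chambers-Mallows-Stuck method:

```python
import numpy as np

def sas_noise(alpha, size, rng):
    """Standard symmetric alpha-stable samples (Chambers-Mallows-Stuck)."""
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    return (np.sin(alpha * V) / np.cos(V) ** (1 / alpha)
            * (np.cos(V - alpha * V) / W) ** ((1 - alpha) / alpha))

def chirp_mixture(N, params, alpha, scale, rng):
    """x(t) = sum_i a_i cos(2*pi*(f_i t + delta_i t^2) + theta_i) + z0(t),
    i.e. the quadratic-phase model (11.1) with constant amplitudes a_i."""
    t = np.arange(N)
    x = np.zeros(N)
    for a, f, delta in params:                 # (amplitude, f_i, delta_i)
        theta = rng.uniform(0, 2 * np.pi)      # random uniform initial phase
        x += a * np.cos(2 * np.pi * (f * t + delta * t ** 2) + theta)
    return x + scale * sas_noise(alpha, N, rng)

rng = np.random.default_rng(1)
x = chirp_mixture(N=256, params=[(1.0, 0.1, 1e-4), (0.8, 0.3, -2e-4)],
                  alpha=1.5, scale=0.1, rng=rng)
```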
Our primary interest is to estimate the instantaneous frequency IF_i of each signal component s_i, defined as

IF_i(t) ≜ (1/2π) dφ_i(t)/dt = f_i + 2δ_i t    (11.2)

By decomposing a_i(t) = γ_i^{1/α} a_{i,0}(t) + a_i, where a_{i,0}(t) is a standard (zero location parameter and unit dispersion) α-stable process, we can re-write the signal expression as

x(t) = Σ_{i=1}^I a_i cos{φ_i(t)} + Σ_{i=1}^I γ_i^{1/α} a_{i,0}(t) cos{φ_i(t)} + z_0(t)    (11.3)
     = Σ_{i=1}^I a_i cos{φ_i(t)} + z(t)    (11.4)

where z(t) collects the last two terms of (11.3). According to the stability property of α-stable laws [Nikias et Shao(1995)], z(t) is an α-stable process. Thus, the problem of estimating (IF_i)_{1≤i≤I} of the multicomponent chirp signal affected by multiplicative and additive α-stable noise is reduced to that of estimating (IF_i)_{1≤i≤I} of constant-amplitude chirp signals, i.e. signals having the same IF laws as the original ones, but affected by additive noise only.

11.2 Polynomial-Phase Transform of FM Signals

Consider the polynomial phase estimation of the signal x(t) in Eq.(11.1). One possible solution to this problem is the maximum likelihood estimation algorithm. However, this estimation algorithm requires a large amount of computation. Indeed, the lack of an explicit expression for the α-stable noise pdf forces us to use some existing approximation, which turns out to be very expensive in numerical computation. Therefore, we propose to use a much simpler procedure, based on the polynomial-phase transform (PPT) [Sahmoudi et al.(2003b)]. The PPT is a tool for analyzing constant-amplitude polynomial-phase signals [Peleg et Friedlander(1995)].
In the quadratic phase case, the PPT can simply be performed as :

y(t) = x(t + τ)x(t) = Σ_{i=1}^{I1} (|a_i|²/2) cos{2π(2τδ_i t) + ϕ_i} + z_1(t)    (11.5)

where τ is the delay parameter (to be chosen preferably in [N/2, 2N/3]) and ϕ_i = 2π(τf_i + τ²δ_i). The term z_1(t) is the noise-plus-interference term; note that z_1(t) is an impulsive noise but not necessarily SαS. We might have I1 < I in the case where certain chirp components of the signal have the same phase coefficient δ_i but different coefficients f_i.

11.3 IF Estimation Procedure of FM Signals

Now we apply one of the algorithms proposed in Section 11.4 to y(t) to estimate the parameters δ_i, i = 1, . . . , I1. In order to estimate the parameters f_i, i = 1, . . . , I, we consider the demodulation of the signal as follows : for i = 1, . . . , I1, we compute

x^(i)(t) = x_a(t) exp(−j2πδ̂_i t²) ≈ Σ_{k∈J_i} exp{j(2πf_k t + θ_k)} + w(t)

where J_i is the set of component indices with the same coefficient δ_i, δ̂_i is the estimate of δ_i, x_a(t) is the analytic signal of x(t), and w(t) represents noise plus interference. For each demodulated signal, we estimate the frequencies {f_k, k ∈ J_i} using one of the proposed algorithms (Section 11.4) applied to the real part ℜ{x^(i)(t)} of the demodulated signal. Note that it is not necessary to use a high-resolution method in the case where J_i contains a single signal index.

11.4 Robust Subspace Estimation

In this section, we address the frequency estimation problem of multicomponent sinusoidal signals observed in an impulsive noise environment, given by equation (11.1) with φ_i(t) = 2πf_i t + θ_i. We propose to apply the high-resolution subspace algorithm MUSIC (MUltiple SIgnal Classification) [Benidir(2002)] for the frequency estimation.
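The quadratic-phase PPT step can be sketched in numpy as follows (a sketch with hypothetical naming, not the thesis code): the product x(t+τ)x(t) turns a chirp of chirp-rate δ into a sinusoid at frequency 2τδ, which a simple spectral peak search can then locate.

```python
import numpy as np

def ppt(x, tau):
    """Quadratic-phase PPT of Eq. (11.5): y(t) = x(t + tau) * x(t)."""
    return x[tau:] * x[:-tau]

# a single noiseless chirp: the PPT spectrum peaks near frequency 2*tau*delta
N, f0, delta = 1024, 0.05, 1e-4
t = np.arange(N)
x = np.cos(2 * np.pi * (f0 * t + delta * t ** 2))
tau = N // 2                                      # delay in [N/2, 2N/3]
y = ppt(x, tau)
spec = np.abs(np.fft.rfft(y * np.hanning(len(y))))
freqs = np.fft.rfftfreq(len(y))
delta_hat = freqs[np.argmax(spec[1:]) + 1] / (2 * tau)  # skip the DC bin
```

Here a windowed FFT peak stands in for the robust MUSIC variants of Section 11.4, just to illustrate the transform itself.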
As the performance of the standard MUSIC algorithm based on the sample covariance matrix degrades if the underlying noise is impulsive, we propose to apply MUSIC in the following three ways :
1. In the first one, we apply MUSIC to the truncated harmonic signal.
2. In the second one, we apply MUSIC to the generalized covariation function of the signal.
3. In the third one, we apply MUSIC to the minimax robust covariance estimate of the harmonic signal.

11.4.1 TRUNC-MUSIC algorithm

In an α-stable environment, the use of the sample covariance is no longer appropriate for frequency estimation, due to the infinite variance of the noise. To avoid this difficulty, we propose to truncate in amplitude the ‘large-valued’ observations that represent “large” impulsive noise realizations, and to apply MUSIC to the finite covariance matrix of the truncated process. TRUNC-MUSIC (TRUNC stands for truncation) is summarized in Table 11.1.

TRUNC-MUSIC Algorithm
Step 1. Truncation constant choice : compute the histogram and choose K such that [−K, K] contains 90 % of the data.
Step 2. Pre-processing : truncate the signal according to x̃(t) = x(t) if |x(t)| ≤ K, and x̃(t) = sign[x(t)] K if |x(t)| > K.
Step 3. Frequency estimation : apply the MUSIC algorithm to the covariance matrix of the truncated signal x̃(t).

Tab. 11.1 – The proposed frequency estimation TRUNC-MUSIC algorithm.

11.4.2 FLOS-MUSIC algorithm

In this section we propose to use fractional lower-order statistics (FLOS) of the signal for the frequency estimation. We consider an L × L generalized covariation coefficient (GCC) matrix Γ, whose (n, l)-th entry is given by :

Γ_{n,l} = [x(n), x(l)]_α / [x(l), x(l)]_α = E[x(n) x(l)^{<p−1>}] / E[|x(l)|^p],   1 ≤ p < α    (11.7)

where x^{<p−1>} = |x|^{p−1} sign(x).
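The truncation pre-processing of Table 11.1 can be sketched as follows (our own naming; as an assumption, the 90 % empirical quantile of |x| replaces the histogram inspection of Step 1):

```python
import numpy as np

def truncate_signal(x, coverage=0.90):
    """Steps 1-2 of TRUNC-MUSIC: choose K so that [-K, K] covers `coverage`
    of the samples, then clip the signal to [-K, K]."""
    K = np.quantile(np.abs(x), coverage)   # truncation constant from the data
    return np.clip(x, -K, K), K            # clip == sign(x)*K whenever |x| > K

# heavy-tailed outliers get clipped; the bulk of the data is untouched
x = np.array([0.5, -0.3, 120.0, 0.8, -0.1, -95.0, 0.2, 0.4, -0.6, 0.7])
xt, K = truncate_signal(x)
```

MUSIC would then be run on the sample covariance matrix of `xt`, which is finite even when x itself is α-stable.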
It has been shown in [Altinkaya et al.(2002)] that for a sinusoidal signal in α-stable noise, we have:

Γ_{n,l} = Σ_{i=1}^{I} η_i cos{2πf_i(n − l)} + P_z δ_{n−l}   (11.8)

where {η_i, i = 1, ..., I} are positive real constants depending on α and a_i, P_z is a real constant depending on the noise pdf, and δ_{n−l} is the Kronecker delta. Equation (11.8) shows that the frequency estimates can be obtained by applying the MUSIC algorithm to the GCC matrix Γ. In practice, we follow the procedure summarized in Table 11.2.

FLOS-MUSIC Algorithm
Step 1. Compute an estimate of Γ_{n,l} (for p = 1) using [Altinkaya et al.(2002)]:

Γ̂_{n,l} = Σ_{i=1}^{N−M+1} x(n+i−1) sign(x(l+i−1)) / Σ_{i=1}^{N−M+1} |x(l+i−1)|   (11.9)

Step 2. Apply MUSIC to the GCC matrix estimate [Γ̂_{n,l}]_{1≤n,l≤L} for the frequency estimation.

Tab. 11.2 – The proposed robust frequency estimation FLOS-MUSIC algorithm.

11.4.3 ROCOV-MUSIC algorithm

[A]- Robust estimation of the covariance. Huber considered the parameter estimation problem in the presence of outliers or impulsive noise and introduced the theory of M-estimation [Huber(1981)]. Here, we consider M-estimates for the signal auto-covariance function γ(k) ≜ E[x(t+k)x(t)]. Note that robust auto-covariance estimation reduces to robust variance estimation through

E(XY) = (1/4)[Var(X + Y) − Var(X − Y)]   (11.10)

where Var denotes the variance. Since an α-stable distribution has infinite variance, we propose to first truncate the observations using a large constant K ≫ 1. The M-estimator of the variance σ² is a solution of the following equation [Huber(1981)]:

(1/N) Σ_{i=0}^{N−1} [u(d_i²) x²(i)/σ² − u(d_i²)] = 0   (11.11)

where d_i² = x²(i)/σ² is the Mahalanobis quadratic distance and u is a weighting function defined on ℝ⁺.
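The fixed-point sweep behind the ROCOV variance M-estimate can be sketched as follows (the Huber tuning constant k = 1.5 is an assumption for illustration, not a value fixed in the text): iterate σ² ← Σ ω_i² x²(i) / Σ ω_i² with ω_i = min(d_i, k)/d_i and d_i = |x(i)|/σ.

```python
import numpy as np

def huber_variance(x, k=1.5, n_iter=100, tol=1e-10):
    sigma2 = np.mean(x ** 2)                         # standard initializer
    for _ in range(n_iter):
        d = np.abs(x) / np.sqrt(sigma2)              # Mahalanobis distances
        w = np.minimum(d, k) / np.maximum(d, 1e-12)  # u(d) = min(d, k) / d
        new = np.sum((w * x) ** 2) / np.sum(w ** 2)  # reweighted variance
        if abs(new - sigma2) <= tol * sigma2:
            return new
        sigma2 = new
    return sigma2
```

Samples with |x| ≤ kσ get weight 1 (so clean data reproduces the classical variance), while an outlier's contribution is capped at k²σ², which is what makes the estimate resistant to impulsive samples.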
The existence and uniqueness of the solution of Eq. (11.11) were shown in [Huber(1981)] under mild assumptions on the weighting function, such as boundedness and continuity. This function is typically chosen so that observations coming from the tails of the assumed contaminated distribution are down-weighted. Here, we use the robust non-descending weighting function based on Huber’s minimax function, given by u(d) = ω(d)/d with:

ω(d) = min(d, k)   (11.12)

where k is a suitable constant [Huber(1981)]. The M-estimate of the variance is computed as a solution of the latter equation; the required covariance matrix is then estimated through the auto-covariance coefficients, as shown in Table 11.3.

[B]- Robust frequency estimation. Now we apply the subspace approach to estimate the parameters of the sinusoidal signals. The proposed ROCOV-MUSIC algorithm is outlined in Table 11.4.

M. Sahmoudi © Processus Alpha-Stables pour la Séparation et l’Estimation Robustes des Signaux non-Gaussiens et/ou non-Stationnaires

ROCOV Algorithm
Step 1. Initialize ROCOV with the standard variance estimator: σ₀² = (1/N) Σ_{i=0}^{N−1} x²(i).
Step 2. Sweep: at the (j+1)-th iteration, compute
σ²_{j+1} = Σ_{i=0}^{N−1} ω²_{i,j} x²(i) / Σ_{i=0}^{N−1} ω²_{i,j},
where ω_{i,j} = u(d_{i,j}) = ω(d_{i,j})/d_{i,j}, d²_{i,j} = x²(i)/σ_j², and ω is the Huber non-descending function given above in (11.12).
Step 3. According to Equation (11.10), compute the M-estimates γ̂(k), k = 0, ..., L−1, using the M-estimators of the variances of [x(t+k) + x(t)] and [x(t+k) − x(t)] computed in Step 2.
Step 4. Stop the sweeps when the error is smaller than a given threshold ε.

Tab. 11.3 – The proposed robust covariance estimation ROCOV algorithm.

ROCOV-MUSIC Algorithm
Step 1. Compute the M-estimates γ̂(k), k = 0, ..., L−1, using the ROCOV algorithm summarized in Table 11.3.
Step 2.
Apply MUSIC to the robust covariance matrix estimate Γ̂_x = Toeplitz[γ̂(k), 0 ≤ k ≤ L−1] for the frequency estimation.

Tab. 11.4 – The proposed frequency estimation ROCOV-MUSIC algorithm.

11.5 Performance Evaluation & Comparison

11.5.1 Mixture of sinusoidal components

Here, we perform a simulation-based comparison of the proposed robust frequency estimation methods TRUNC-MUSIC, ROCOV-MUSIC and FLOS-MUSIC. We consider three (I = 3) sinusoidal components with the same amplitude a_1 = a_2 = a_3 = 1 and frequencies f_1 = 0.1, f_2 = 0.3 and f_3 = 0.4. The signal is affected by impulsive noise with an α-stable distribution of characteristic exponent α = 1.5. We run 200 Monte Carlo realizations to compute all considered statistics. Figure 11.1 presents the mean square error (MSE) versus the noise dispersion in dB (the sample size is N = 1000). Figure 11.2 presents the MSE versus the sample size, for a noise dispersion γ = 0.1.

Fig. 11.1: The MSE versus the noise dispersion in dB, N = 1000.

Fig. 11.2: The MSE versus the sample size, γ = 0.1.

These figures show the effectiveness of the proposed methods.
11.5.2 Mixture of two chirps

In this subsection, we conduct three experiments to illustrate the proposed IF estimation procedure (or, equivalently, phase parameter estimation). In the first experiment, we use the TRUNC-MUSIC algorithm in the second step of the approach introduced in Section 11.3; in the second, the ROCOV-MUSIC algorithm; and in the third, the FLOS-MUSIC algorithm. We consider a mixture of two linear FM³ components (I = 2) with the same amplitudes a_1 = a_2 = 1, frequencies f_1 = 0.05, f_2 = 0.3, and second-order parameters δ_1 = 0.0001 and δ_2 = 0.0003. The signal is affected by impulsive noise following an α-stable model (α = 1.5). We run 500 Monte Carlo realizations to compute all evaluated statistics.

Fig. 11.3: The MSE versus the sample size, γ = 0.1.

Figures 11.3 and 11.4 show the MSE of the estimated phase parameters versus the sample size and the noise dispersion, respectively. The three proposed techniques are compared using the same legend as in the previous Figures 11.1 and 11.2. These simulation examples show the effectiveness of the proposed methods in mitigating the impulsive noise. The comparative study clearly shows a certain advantage for the ROCOV-MUSIC based procedure.

³ Linear FM signals are also commonly called chirp signals.
Fig. 11.4: The MSE versus the noise dispersion in dB, N = 1000.

11.6 Concluding Remarks

In this chapter, three two-step methods for IF estimation in heavy-tailed noise are introduced. The first step transforms the polynomial phase estimation problem into a frequency estimation one. The frequency estimation in the second step of the proposed parametric methods is based on the subspace MUSIC algorithm, applied respectively to the amplitude-truncated signal, to the robust covariance matrix, and to the generalized covariation coefficient matrix. Simulation results are presented to validate the proposed IF estimation methods. In the considered simulation context, the comparative study shows the superiority of the parametric method using the robust covariance estimation technique (ROCOV-MUSIC).

Chapitre 12

Robust Time-Frequency Approaches

As shown in this chapter, conventional TFDs are quite sensitive to non-Gaussian noise, in particular to impulsive noise, in which case they produce poor estimation results. In order to obtain good estimation performance in this context, we propose in a first approach a pre-processing stage that attenuates the impulsive noise effect before computing the signal TFD. In the second approach, we use robust statistics theory to define a new robust TFD, named the robust MB-distribution (MB-distribution: Modified B-distribution).
We show that the TFDs resulting from the two proposed approaches are able to reveal the instantaneous frequency of the noisy multicomponent signal in an accurate way.

12.1 Introduction-Problem Statement

This chapter is concerned with the analysis of multi-component FM signals corrupted by additive heavy-tailed noise. A multi-component signal is a signal whose time-frequency representation presents multiple ridges in the time-frequency plane.

• Signal model: Analytically, the noisy signal considered in this chapter is defined as

x(t) = s(t) + z(t) = Σ_{i=1}^{M} s_i(t) + z(t)   (12.1)

where each component s_i(t), of the form s_i(t) = a_i(t) e^{jφ_i(t)}, is assumed to have only one ridge, i.e. one continuous curve, in the time-frequency plane; a_i(t) is the amplitude and φ_i(t) the phase of the i-th component of the signal. The probability density function (PDF) of the random impulsive noise z(t) is modeled as a heavy-tailed distribution¹. Examples of such distributions include α-stable laws with α < 2 and generalized Gaussian laws.

• Symmetric α-stable process (SαS): As presented in the first part of this thesis, the PDF of SαS processes has no closed form except for the cases α = 1 (Cauchy distribution), α = 2 (Gaussian distribution) and α = 1/2 (Lévy distribution). Due to their heavy tails, stable distributions do not have finite second- or higher-order moments, except in the limiting case α = 2.

• Generalized Gaussian (GG) PDF: Another way to model impulsive noise processes is through the generalized Gaussian PDF, given by f_α(x) = A exp(−b|x|^α) where 0 < α ≤ 2. For α = 2 we recover the Gaussian distribution, and for α = 1 the Laplacian distribution, which is known to be a good model for impulsive noise.
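For concreteness, the GG density can be written with its normalizing constant A = α b^{1/α} / (2 Γ(1/α)); the sketch below (constant b = 1 chosen for illustration) shows the heavier tail of the Laplacian case α = 1 relative to the Gaussian case α = 2.

```python
from math import exp, gamma

def gg_pdf(x, alpha, b=1.0):
    """Generalized Gaussian density f_alpha(x) = A * exp(-b * |x|**alpha)."""
    A = alpha * b ** (1.0 / alpha) / (2.0 * gamma(1.0 / alpha))
    return A * exp(-b * abs(x) ** alpha)
```

With this A the density integrates to one for any 0 < α ≤ 2, and comparing tail values (e.g. at x = 4) makes the increasing impulsiveness for decreasing α visible directly.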
• Time-frequency analysis: Our primary interest in this work is to estimate the instantaneous frequency of each FM signal s_i(t) in (12.1), defined as

IF_i(t) ≜ (1/2π) dφ_i(t)/dt   (12.2)

Time-frequency analysis techniques are used here as they reveal the multicomponent nature of such signals. Ideally, for a given FM signal, the TFD is represented as a row of delta functions around the signal’s instantaneous frequency. This property makes the peak of the TFD a very powerful IF estimator. However, quadratic TFDs of multi-component signals suffer from the presence of cross-terms, which can obscure the real features of interest in the signal. The properties of a quadratic TFD are completely determined by its kernel. This kernel should have the shape of a two-dimensional (2-D) low-pass filter, so as to attenuate the cross-terms that lie away from the origin of the ambiguity domain while preserving the auto-terms concentrated around the origin [Hussain et Boashash(2002)]. Considerable efforts have been made to define TFDs that reduce the effect of cross-terms while improving the time-frequency resolution (e.g., [Hussain et Boashash(2002), Barkat et Abed-Meraim(2004)]). This led to the so-called reduced interference distributions, which include the modified B-distribution (MBD) and the signal-dependent optimal time-frequency representation. In this work, we have used the MBD [Hussain et Boashash(2002)], given by:

T(t, f) = ∫∫_{−∞}^{+∞} G_{MB}^σ(t′) x(t − t′ + τ/2) x*(t − t′ − τ/2) e^{−j2πfτ} dt′ dτ   (12.3)

where G_{MB}^σ(t′) = k_σ / cosh(t′)^{2σ}, 0 ≤ σ ≤ 1 is a real parameter that controls the trade-off between component resolution and cross-term suppression, and k_σ = Γ(2σ)/(2^{2σ−1} Γ²(σ)) is the normalizing factor. The choice of the MBD stems from its good performance in terms of resolution and cross-term suppression [Hussain et Boashash(2002)].
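As a quick numerical check (this verification is ours, not from the text), the normalizing factor k_σ above makes the time kernel G_MB integrate to one, as a smoothing window should; this follows from the Legendre duplication formula for Γ, and can be confirmed directly:

```python
from math import cosh, gamma

def mb_kernel(t, sigma):
    """MBD time kernel G_MB(t) = k_sigma / cosh(t)**(2*sigma)."""
    k = gamma(2 * sigma) / (2 ** (2 * sigma - 1) * gamma(sigma) ** 2)
    return k / cosh(t) ** (2 * sigma)

def kernel_integral(sigma, half_width=400.0, dt=0.01):
    """Riemann sum of the kernel over [-half_width, half_width]."""
    n = int(2 * half_width / dt)
    return sum(mb_kernel(-half_width + i * dt, sigma) for i in range(n + 1)) * dt
```

For σ = 0.5 the kernel reduces to (1/π) sech(t), whose integral is exactly 1; smaller σ gives a wider (stronger-smoothing) window with the same unit area.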
The effect of additive Gaussian noise on the time-frequency representation is another consideration that directly influences instantaneous frequency estimation and is an important issue [Peleg et Friedlander(1996)], [Hussain et Boashash(2002)]. However, in many practical applications, especially in communications, signals are disturbed by impulsive noise due to the propagation environment or to large errors in collecting and recording the data. These noise processes are commonly modeled by heavy-tailed distributions [Nikias et Shao(1995)]. Since outliers or impulsive noise have an unusually large influence on standard IF estimators, robust procedures attempt to modify those schemes. Only a limited literature has been dedicated to the analysis of multi-component FM signals in impulsive noise. In [Sahmoudi et al.(2004b)], the authors propose a class of robust parametric methods to handle linear FM signals; in the same paper, a TFD-based technique using a pre-processing stage to mitigate the impulsive noise effect has also been proposed. The other alternative, which is the focus of this chapter, is to apply the M-estimation principle in order to design TFDs that are robust with respect to impulsive noise. In [Katkovnik et al.(2003)] and [Barkat et Stankovic(2004)], the authors proposed the robust spectrogram and the robust polynomial Wigner-Ville distribution (PWVD), respectively. However, the spectrogram is known to suffer from low resolution in the time-frequency domain, while the PWVD suffers from cross-terms for multi-component signals.

¹ For a complex-valued noise signal, we simply consider that z(t) = z_r(t) + j z_i(t), where z_r(t) and z_i(t) represent two independent heavy-tailed processes with the same pdf.
In this chapter, we use the modified B-distribution [Hussain et Boashash(2002)] and M-estimation theory to design a new robust TFD, referred to as the robust modified B-distribution (R-MBD), for the analysis of multi-component FM signals in heavy-tailed noise. We show that the proposed approach can solve problems that existing time-frequency distributions cannot.

12.2 Failure of Standard TFD in Impulsive Noise

12.2.1 Effect of impulsive spike noise on TFD

To examine the effect of additive impulsive noise on the time-frequency representation of a signal, it is useful to use a model that is simple and provides considerable insight into the nature of the artifacts that appear. We carry out our analysis in discrete time. For a clear and simple illustration, consider the spike model for impulsive noise. The signal to be examined is

x(n) = s(n) + A δ_K(n − n_0)   (12.4)

where δ_K(n) is the Kronecker delta function, A δ_K(n − n_0) represents the spike noise model, and A ≫ E_s is its amplitude, E_s = Σ_n |s(n)|² being the energy of the signal s(n). If we compute, for example, the WVD of x(n), we get

W_x(n, f) = 2 Σ_m {s(n+m) + A δ_K(n+m−n_0)} {s*(n−m) + A* δ_K(n−m−n_0)} e^{−j4πmf}
= W_s(n, f) + 2A* Σ_m s(n+m) δ_K(n−m−n_0) e^{−j4πmf} + 2A Σ_m s*(n−m) δ_K(n+m−n_0) e^{−j4πmf} + 2|A|² Σ_m δ_K(n+m−n_0) δ_K(n−m−n_0) e^{−j4πmf}
= W_s(n, f) + 4 Real{A* s(2n − n_0) e^{−j4π(n−n_0)f}} + 2|A|² δ_K(n − n_0)

where Real(z) denotes the real part of the complex number z (the two middle sums are complex conjugates of one another, hence the single real term). The effect of this single impulse at n = n_0 is to place a very strong impulsive ridge of magnitude 2|A|² in the time-frequency plane, located at time n_0 and extending over all frequencies. In addition, there is a secondary artifact resulting from the cross-product of the signal s(n) with the impulse.
This artifact is a decimated copy of s(n), extending over all frequencies and modulated in the normalized frequency domain by the complex exponential term exp(−j4π(n − n_0)f). This additive cross-term oscillates more rapidly in the normalized frequency domain for values of n further removed from n_0.

12.2.2 Effect of impulsive α-stable noise on TFD

Because α-stable noise is impulsive, its effect on a quadratic time-frequency representation (QTFR) differs from that observed in the Gaussian case (α = 2), where the energy of the noise is uniformly spread over the time-frequency plane. This can be seen by examining the autocorrelation function of the observed signal x(n), given by

R_x(n, m) = E{x(n + m) x*(n − m)}   (12.5)

One can show that the time-frequency representation of a signal s(n) in additive α-stable noise is severely degraded. Indeed, let z(n) denote the α-stable additive noise with α < 2; then, for m ≠ 0,

R_x(n, m) = s(n+m)s*(n−m) + s(n+m) E{z*(n−m)} + s*(n−m) E{z(n+m)} + E{z(n+m) z*(n−m)},   (12.6)–(12.7)

and for m = 0,

R_x(n, 0) = |s(n)|² + s(n) E{z*(n)} + s*(n) E{z(n)} + E{|z(n)|²}   (12.8)

Since E{z(n)} is infinite when α ≤ 1, all elements of the autocorrelation matrix are then infinite. Also, since E{|z(n)|²} is infinite when α < 2, we have R_x(n, 0) = ∞ for all n. Thus the autocorrelation blows up for α < 2, making the standard time-frequency representation useless for characterizing signals in impulsive environments.

12.2.3 The need for robust TFD in a Gaussian environment

Here, we suppose that the noise z(n) is a complex Gaussian random process and we examine the instantaneous autocorrelation function of the observed signal x(n).
One can write the noise PDF as

p_z(z) = (1/(πσ²)) exp(−|z|²/σ²)   (12.9)

We can express the instantaneous autocorrelation of the observed signal as

R_x(n, m) = s_x(n, m) + z_1(n, m) + z_2(n, m) + R_z(n, m)   (12.10)

Clearly, the term s_x(n, m) is deterministic, while the two terms z_1(n, m) and z_2(n, m) are complex random variables. To analyze the instantaneous autocorrelation behavior, one must analyze the probability distribution of the final term R_z(n, m). Using the PDF formula for functions of random variables [Benidir(2002)], we have

p_{R_z}(y) ∝ p_z(h^{−1}(y)) ∝ exp(−c |h^{−1}(y)|²) ∝ exp(−c |y|)   (12.11)–(12.13)

where h is the transform z ↦ h(z) = z z*. Thus the instantaneous autocorrelation has a Laplace PDF, which has heavy tails. Hence, computing a QTFD of a signal in Gaussian noise generates an impulsive noise. Consequently, robust time-frequency analysis is necessary even in a Gaussian environment.

12.3 Pre-processing Techniques based Approach

The first step consists in reducing the impulsive noise amplitudes in order to improve the quality of the TFD of the noisy signal. To do so, two solutions may be suggested.

12.3.1 Exponential compressor filter

We propose here to pass the noisy signal through a nonlinear device that compresses the large amplitudes (i.e., reduces the dynamic range of the noisy signal) before further analysis [Barkat et Abed-Meraim(2003b)]. The output of the nonlinear device is expressed as

x̃(t) = ψ_β[x(t)] = |x(t)|^β sign[x(t)]

where 0 < β ≤ 1 is a real coefficient that controls the amount of compression applied to the input noisy signal x(t). This technique is similar to that used in nonuniform quantization, where a totally different nonlinear law is used [Jayant et Noll(1984)]. A plot of this compressor law is displayed in Figure 12.1 for different values of β. Observe that the compressor law is linear around the origin (i.e., for very small input values).
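The compressor nonlinearity ψ_β is straightforward to sketch; the chirp and the values of β below are illustrative only.

```python
import numpy as np

def compress(x, beta):
    """Compressor psi_beta(x) = |x|**beta * sign(x), 0 < beta <= 1."""
    return np.abs(x) ** beta * np.sign(x)

rng = np.random.default_rng(1)
n = np.arange(512)
# Linear FM signal in heavy-tailed (Cauchy, alpha = 1) noise.
x = np.cos(2 * np.pi * (0.05 * n + 1e-4 * n ** 2)) + rng.standard_cauchy(512)
x_c = compress(x, 0.5)
```

The map is odd and monotone, so it preserves the sign (and zero crossings) of every sample while shrinking the large noise spikes much more strongly than the near-unit-amplitude signal samples.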
The linearity and its corresponding interval range obviously depend on the value of β: the smaller β is, the smaller the linearity range.

Fig. 12.1: The nonlinear law of the compressor used in the pre-processing stage.

This means that for weak signals (i.e., when the noiseless signal amplitude is small compared to the noise spikes), and using an appropriate value of β, the compressor output may be approximated by a scaled version of the input noiseless signal embedded in a new additive noise whose variance is much smaller than that of the input noise. Figure 12.2 displays the time representation of a linear FM signal in impulsive noise, compressed using β = 1 (i.e., no compression), β = 0.9, β = 0.5 and β = 0.2, respectively. If we assume the effect of the compressor on the desired noiseless signal characteristics (i.e., its IF) to be negligible, then the achieved reduction of the noisy signal variance yields better results in its analysis.

Fig. 12.2: Compression of a linear FM signal in impulsive noise using different values of β.

12.3.2 Huber filter

We use here the Huber criterion to define the Huber filter, which truncates in amplitude the ‘large-valued’ observations that represent “large” impulsive noise realizations.
For the choice of the truncation constant K, we propose to compute the histogram of the observations and choose K such that [−K, K] contains 90% of the data. The output of the Huber filter is then expressed as:

x̃(t) = ψ_H[x(t)] = x(t) if |x(t)| ≤ K, sign[x(t)] K if |x(t)| > K

The second step consists in applying the time-frequency analysis presented in Section 12.5 to the processed signal x̃(t) for the IF estimation problem.

12.4 Robust Time-Frequency Approach

In order to obtain good TFD-based IF estimation performance in an impulsive environment, we use the robust statistics theory of M-estimation to define a new robust quadratic time-frequency distribution.

12.4.1 Optimal TFD kernel in α-stable noise

Recall that the fractional lower-order moments (FLOMs) of an α-stable random variable with zero location parameter and dispersion γ are given by E|X|^p = C(p, α) γ^{p/α} for 0 < p < α, where C(p, α) is a constant depending only on p and α. This tells us that the p-th order moment of an α-stable random variable and its dispersion are related through a constant only. Therefore, the MD criterion is equivalent to least L_p-norm estimation, where 0 < p < α, and the estimates of a parameter θ can be obtained from equation (3.18) using the L_p-norm loss function

ρ_p(x) = |x|^p ;  ψ_p(x) = p |x|^{p−1} sign(x)   (12.14)

with sign(x) ≜ x/|x|, as a robust estimation tool that appeared originally as a heuristic idea and was supported later by theoretical and experimental studies. In particular, for p = 1, the L_1-norm criterion, referred to as the “modulus function”, was used in [Katkovnik et al.(2003), Barkat et Stankovic(2004)] to define the robust periodogram and robust PWV distributions. It should be emphasized that the least L_p-norm estimates are not only optimal in the MD sense for α-stable data, but also optimal in the maximum likelihood sense for the family of generalized Gaussian distributions.
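A small numerical illustration of this point (ours, with illustrative values): for p = 1 the least L_p location estimate of a sample is its median, which stays close to the true location under Cauchy (α = 1) noise, whereas the least-squares (L_2) estimate, the sample mean, is itself Cauchy-distributed and hence unreliable.

```python
import numpy as np

rng = np.random.default_rng(2)
data = 3.0 + rng.standard_cauchy(10_001)  # true location is 3.0

l2_est = float(np.mean(data))    # minimizes sum |x - c|^2 (non-robust)
l1_est = float(np.median(data))  # minimizes sum |x - c|   (robust, p = 1)
```

With about 10⁴ samples, the median's deviation from 3.0 is on the order of 10⁻², independent of the heavy tails.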
Indeed, the ML estimator coincides with the L_p-norm criterion when p is chosen equal to the exponent of the generalized Gaussian PDF. In addition, applying equations (3.25) and (3.26) over the class of generalized Gaussian pdfs, one can easily show that the least L_p-norm estimate is also optimal in the robust minimax sense if p is chosen as the smallest value in the considered set of p values. It is recognized that “outliers”, which arise from heavy-tailed noise distributions or are simply bad points due to measurement errors, have an unusually large influence on standard estimators based on least squares. Accordingly, as mentioned previously, robust methods have been developed to modify least-squares schemes so that outliers have much less influence on the final estimates. One of the most satisfying robust procedures is given by a modification of the maximum likelihood principle; hence we proceed with that approach, called M-estimation [Huber(1981)].

12.4.2 A new robust quadratic time-frequency distribution

Let us consider the noisy signal (12.1) in discrete time, x(kT) = s(kT) + z(kT), where T is the sampling period. A standard time-frequency distribution, at a point (kT, f), can be shown to be the solution of the optimization problem [Katkovnik et al.(2003)]

B̂ = arg min_B J(kT, f, B)   (12.15)

where

J(kT, f, B) = Σ_{n=−N/2}^{N/2} w(nT) ρ[e(k, f, n)],  e(k, f, n) = G_x(kT, nT) e^{−j2πfnT} − B   (12.16)

w(nT) is a window function, G_x(kT, nT) is the kernel of the considered quadratic time-frequency distribution of the FM signal x(kT), and B is an estimate of the expectation of the sample average of the quantity G_x(kT, nT) e^{−j2πfnT}.
If we choose the loss function ρ(e) = |e|², then solving dJ(kT, f, B)/dB* = 0 for B shows that the optimal solution corresponds to the standard TFD

B_x^s(kT, f) = Σ_{n=−N/2}^{N/2} [w(nT) / Σ_{n=−N/2}^{N/2} w(nT)] G_x(kT, nT) e^{−j2πfnT}   (12.17)

Thus, for a weighted window, the standard TFD can be treated as an estimate of the mean, calculated over the set of complex-valued observations

G = {G_x(kT, nT) e^{−j2πfnT} ; n ∈ [−N/2, N/2]}

It has been shown that the optimal loss function ρ derived in Huber’s minimax estimation theory (see Section 2) can be applied to design a new class of robust time-frequency distributions, inheriting strong resistance to impulsive noise. In particular, some robust TFDs have been derived using the absolute-error loss function ρ(e) = |e| in (12.16) [Katkovnik et al.(2003)]. In this work, we propose to choose the loss function ρ in the criterion (12.16) as the L_p-norm criterion ρ(e) = |e|^p, where p < 2 is a parameter controlling the degree of the loss function. The choice of this criterion is motivated in Section 12.4.1. We use the MBD to handle multi-component nonstationary FM signals given by model (12.1). However, similarly to the standard spectrogram, WVD and PWVD, the standard MB-distribution is not an adequate analysis tool in the presence of heavy-tailed noise. To mitigate this problem, we use the MB-distribution kernel given in Equation (12.3) and the L_p-norm loss function in the design of the proposed robust MBD, to analyze FM signals affected by impulsive noise.
In this case, we find the optimal solution, called the robust modified B-distribution (R-MBD), by solving

(∂/∂B*) Σ_{n=−N/2}^{N/2} w(nT) |G_x(kT, nT) e^{−j2πfnT} − B|^p = 0   (12.18)

⇔ Σ_{n=−N/2}^{N/2} w(nT) (G_x(kT, nT) e^{−j2πfnT} − B) |G_x(kT, nT) e^{−j2πfnT} − B|^{p−2} = 0   (12.19)

⇔ B_x^r(kT, f) = Σ_{n=−N/2}^{N/2} [d(k, f, n) / D_0(kT, f)] G_x(kT, nT) e^{−j2πfnT}   (12.20)

with

d(k, f, n) = w(nT) |G_x(kT, nT) e^{−j2πfnT} − B_x^r(kT, f)|^{p−2}   (12.21)

D_0(kT, f) = Σ_{n=−N/2}^{N/2} d(k, f, n)   (12.22)

Since the quantity B_x^r(kT, f) appears on the right- as well as on the left-hand side of Equation (12.20), an iterative procedure is necessary in order to obtain the R-MBD. The robust-MBD algorithm is summarized in Table 12.1.

Robust-MBD Computation
Step 1. Evaluate the standard MBD using Equation (12.17).
Step 2. For initialization, set the iteration index i = 0 and B_x^{r,0}(kT, f) = B_x^s(kT, f).
Step 3. Sweep: set i = i + 1 and do:
– Compute d(k, f, n) and D_0(kT, f) using Equations (12.21) and (12.22), respectively.
– Compute the robust MBD at iteration i, B_x^{r,i}(kT, f), using Equation (12.20).
Step 4. If the relative absolute difference between two iterations is smaller than a fixed threshold ε, i.e. |B_x^{r,i}(kT, f) − B_x^{r,i−1}(kT, f)| / |B_x^{r,i}(kT, f)| ≤ ε, stop the algorithm; otherwise go to Step 3.

Tab. 12.1 – Computation procedure of the Robust-MBD.

It was shown in [Kaluri et Arce(2000)] that the above iterative algorithm converges to a single (global) minimum for a good choice of the initial value. In our case, the choice B_x^{r,0}(kT, f) = B_x^s(kT, f) satisfies the necessary condition for convergence.
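Each time-frequency point of the R-MBD is thus an L_p M-estimate of the mean of the complex observations G_n = G_x(kT, nT) e^{−j2πfnT}. The fixed-point sweep (12.20)–(12.22) can be sketched for a single point as follows (the small guard on |G − B| is our addition to avoid division by zero when p < 2):

```python
import numpy as np

def lp_mean(G, w, p=1.0, n_iter=200, tol=1e-12):
    """Iteratively reweighted L_p mean of observations G with window weights w."""
    B = np.sum(w * G) / np.sum(w)                    # standard (L2) TFD value
    for _ in range(n_iter):
        r = np.maximum(np.abs(G - B), 1e-12)         # guard against |G - B| = 0
        d = w * r ** (p - 2.0)                       # weights d(k, f, n)
        B_new = np.sum(d * G) / np.sum(d)            # Eq. (12.20) update
        if abs(B_new - B) <= tol * (1.0 + abs(B_new)):
            return B_new
        B = B_new
    return B

# An impulsive observation among the G_n barely moves the L1 estimate.
G = np.array([0.0, 1.0, 2.0, 3.0, 100.0])
B1 = lp_mean(G, np.ones_like(G), p=1.0)
```

With p = 1 and uniform weights this converges to the (one-dimensional) median of the observations, here 2.0, while the standard L2 value (the plain mean, 21.2) is pulled far away by the outlier.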
12.5 IF Estimation & Component Separation

The proposed component separation procedure consists in separating the signal components and estimating their respective IF laws from the signal TFD. In an impulsive environment, we propose to apply this algorithm (i) to the TFD of the pre-processed signal in the first procedure, and (ii) to the robust MB-distribution of the noisy signal in the second procedure. The component separation algorithm is illustrated in Table 12.2. The first step of the algorithm consists in noise thresholding, to remove the undesired ‘low’-energy peaks in the time-frequency domain. This operation can be written as:

T_th(t, f) = T(t, f) if T(t, f) > ε, and 0 otherwise

where ε is a properly chosen threshold. In our simulations we used ε = 0.01 max_{(t,f)} T(t, f). Assuming a ‘clean’ TFD, the M component IFs are estimated, at each time instant t, from the M peak positions of the TFD slice T_th(t, f). Observe that if, at a time instant t_0, two components cross, then the number of peaks in this particular slice T(t_0, f) is smaller than the total number of components M. For practical implementation reasons, we decide that a crossing occurs when the number of peaks is smaller than M over a fixed number of consecutive slices. In this case, we implement the following procedure:

1. Choose a particular maximum point location in the slice where the crossing occurs.
2. Measure all distances from this point to the peak locations of the previous slice (with no crossing).
3. Select the 2 smallest distances and add them.
4. Repeat Steps 1 to 3 for all other maximum point locations in the slice where the crossing occurred.
5. From the set of the smallest sums found above, select the smallest value and the points associated with it. This yields the location where the crossing occurred and the 2 components involved in the crossing.
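The thresholding and per-slice peak picking that open the procedure can be sketched as follows (helper names hypothetical): values below ε = 0.01 · max T are zeroed, then the M strongest local maxima of each time slice are returned as component frequency bins.

```python
import numpy as np

def slice_peaks(T, M, eps_frac=0.01):
    """For each time slice (row) of TFD matrix T, return up to M peak bins."""
    T = np.where(T > eps_frac * T.max(), T, 0.0)     # noise thresholding
    peaks = []
    for row in T:                                    # one slice per instant t
        idx = [k for k in range(1, len(row) - 1)
               if row[k] > 0 and row[k] > row[k - 1] and row[k] >= row[k + 1]]
        idx.sort(key=lambda k: -row[k])              # strongest peaks first
        peaks.append(sorted(idx[:M]))
    return peaks

# Toy TFD: two constant-frequency ridges at bins 10 and 30.
f = np.arange(50)
row = np.exp(-0.5 * ((f - 10) / 2.0) ** 2) + np.exp(-0.5 * ((f - 30) / 2.0) ** 2)
T = np.tile(row, (4, 1))
```

On this toy TFD, every slice yields the two expected peak bins; the distance-based matching and crossing detection of Table 12.2 would then operate on these per-slice peak lists.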
Then, we use a simple numerical permutation of the indices of the 2 components involved in the crossing. The details of the proposed separation technique are outlined in Table 12.2.

Time-Frequency based Component Separation Algorithm
1. Assign an index to each of the M components in an orderly manner.
2. For each time instant t (starting from t = 1), find the component frequencies as the peak positions of the TFD slice T(t, f).
3. Assign a peak to a particular component based on the smallest distance to the peaks of the previous slice T(t − 1, f) (the IFs are continuous functions of time). For the special case of a crossing point (see Step 4 for how to detect it and its corresponding components), assign the peak to both crossing components.
4. If at a time instant t a crossing point exists (i.e., the number of peaks is smaller than the number of components), identify the crossing components using the smallest-distance criterion by comparing the distances of the actual peaks to those of the previous slice.
5. Permute the indices of the corresponding crossing components.

Tab. 12.2 – Component separation procedure for the proposed algorithm.

12.6 Performance Evaluation & Comparison

The estimation performance is measured by the normalized MSE defined by
\[
NMSE = \frac{1}{N_r} \sum_{r=1}^{N_r} \frac{\|\hat{\theta}_r - \theta\|^2}{\|\theta\|^2}
\]
where θ is the considered parameter, \(\hat{\theta}_r\) is the estimate of θ in the r-th experiment, and N_r is the number of Monte-Carlo runs, chosen here equal to 500.

[A]- First experiment To check the validity and superiority of the proposed algorithm, we consider the time-frequency representation of a three-component FM signal corrupted by an impulsive noise modeled as a generalized Gaussian distribution with α = 1.5. The standard MBD, displayed in Fig. 12.3,
yields a poor representation, while the R-MBD, displayed in Fig. 12.4, clearly reveals the features of the noisy signal. The superiority of the R-MBD over the standard MBD is obvious in this example.

Fig. 12.3: The standard MBD of the multi-component test signal (Fs = 1 Hz, N = 512, time resolution = 1; time in seconds vs. frequency in Hz).
Fig. 12.4: The Robust-MBD of the multi-component test signal (same axes and parameters).

[B]- Second experiment In this experiment, we consider a discrete-time multicomponent FM signal consisting of two linear FM components embedded in additive impulsive noise,
\[
x(n) = s_1(n) + s_2(n) + z(n), \qquad n = 0, 1, \dots, N-1,
\]
where \(s_1(n) = \exp\{j2\pi(a_1 n + b_1 n^2)\}\) and \(s_2(n) = \exp\{j2\pi(a_2 n + b_2 n^2)\}\). The noise z(n) is chosen to be α-stable with zero location parameter, characteristic exponent α = 1 and dispersion γ = 1. The signals' IF coefficients are given by a1 = 0.2, b1 = 0.1×10⁻³, a2 = 0.45 and b2 = −1.5×10⁻³. In the first step, we pre-process the noisy signal to mitigate the impulsive noise, using the exponential compressor filter (exp-TFD algorithm, with parameter β = 0.1) and the Huber-TFD algorithm. In the second step, we put the pre-processed signal x̃(n) through the proposed algorithm (we chose σ = 0.01 for the MB-distribution kernel) in order to extract the two respective components. The peaks of the extracted components (in the time-frequency domain) are then used to estimate the IFs of the chirps. We use a simple polynomial fit to obtain estimates of (a1, b1) from IF1(n) and estimates of (a2, b2) from IF2(n).
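This second experiment can be sketched end-to-end for a single chirp as follows (a hedged illustration, not the thesis code: the α = 1 symmetric stable noise is drawn as Cauchy variates, the TFD is replaced by a plain sliding-window periodogram, and the window/hop sizes are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
N, a1, b1, gamma = 512, 0.2, 0.1e-3, 0.1
n = np.arange(N)
x = np.exp(2j * np.pi * (a1 * n + b1 * n**2))            # IF1(n) = a1 + 2*b1*n
z = gamma * (rng.standard_cauchy(N) + 1j * rng.standard_cauchy(N))  # SaS, alpha = 1
xn = x + z

# exp-TFD style pre-processing: compress the amplitude, keep the phase
beta = 0.1
xt = np.abs(xn) ** beta * np.exp(1j * np.angle(xn))

# crude IF track: periodogram peak of each windowed segment
win, hop, nfft = 64, 8, 1024
centers = np.arange(win // 2, N - win // 2, hop)
f_hat = []
for c in centers:
    seg = xt[c - win // 2 : c + win // 2] * np.hanning(win)
    spec = np.abs(np.fft.fft(seg, nfft))[: nfft // 2]    # the IF stays in [0, 0.5)
    f_hat.append(np.argmax(spec) / nfft)

# polynomial fit of the IF law: slope = 2*b1, intercept = a1
slope, intercept = np.polyfit(centers, f_hat, 1)
a1_hat, b1_hat = intercept, slope / 2
```

A Monte-Carlo loop over noise realizations, accumulating \(\|\hat\theta_r - \theta\|^2 / \|\theta\|^2\), then yields NMSE curves of the kind shown in Fig. 12.5.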
The same noisy signal x(n) is also put through the R-MBD algorithm (with β = 1) developed in this work, to validate this method and to compare it with the preprocessing-based methods. In Fig. 12.5, we display the NMSE of the R-MBD, exp-TFD and Huber-TFD versus the sample size.

Fig. 12.5: The NMSE versus sample size: a comparative study (R-MBD, exp-TFD and Huber-TFD; panels show the NMSE of a1, b1 for chirp 1 and a2, b2 for chirp 2).

These simulations confirm the effectiveness of the proposed algorithms; at least in this simulation context, the best results in terms of estimation accuracy are obtained by the R-MBD algorithm (which is, on the other hand, the most expensive one), followed by the exp-TFD method.

[C]- Third experiment Here, we assess the statistical performance of the R-MBD-based IF estimator for multi-component FM signals. To that end, let us consider two linear FM components embedded in additive impulsive α-stable noise z(t), modeled as
\[
x(t) = s_1(t) + s_2(t) + z(t)
\]
where \(s_1(t) = \exp\{j2\pi(a_1 t + b_1 t^2)\}\) and \(s_2(t) = \exp\{j2\pi(a_2 t + b_2 t^2)\}\). The noise z(t) is chosen with zero location parameter, characteristic exponent α = 1 and dispersion γ. The signals' IF coefficients are given by a1 = 0.2, b1 = 0.1×10⁻³, a2 = 0.45 and b2 = −1.5×10⁻³. To validate the proposed method and to compare it with some existing methods, we implement the following procedure:
1.
Compute the TFD of the two-chirp signal in α-stable noise x(t) using the r-PWVD [Barkat et Stankovic(2004)] and the proposed R-MBD. For that, we choose σ = 0.01 for the MBD kernel and p = α/3 for the fractional Lp-norm loss function used to design the R-MBD. In the experiments, we fix the signal length to N = 501 and the window length, used in the r-PWVD implementation, to 101 samples.
2. Put the computed TFD matrix through the component separation algorithm in order to extract the two respective components. The peaks of the extracted components (in the time-frequency domain) are then used to estimate the IFs of the chirps.
3. Put the same noisy signal through one of the widely used IF estimation methods, namely the High-order Ambiguity Function (HAF) algorithm, to estimate the four chirp parameters a1, b1, a2 and b2 [Peleg et Friedlander(1996)].
4. For the HAF algorithm, use a simple polynomial fit to obtain estimates of IF1(t) from (a1, b1) and estimates of IF2(t) from (a2, b2).

In Fig. 12.6, we display the NMSE of the IF estimates versus the noise dispersion γ for HAF, r-PWVD and R-MBD. The accuracy and superiority of the R-MBD over both the r-PWVD and HAF algorithms is evident.

Fig. 12.6: NMSE of IF estimates (in dB, versus −10 log10(γ)), corresponding to the HAF, r-PWVD and R-MBD, for a noisy two-component chirp signal.

[D]- Fourth experiment In this experiment, a comparative study of the previous IF estimation methods for multicomponent chirp signals is addressed.
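The HAF idea used in Step 3 can be illustrated on a single noiseless chirp (a hedged sketch only: the actual [Peleg et Friedlander(1996)] multicomponent algorithm is more elaborate, and the lag τ and lengths here are arbitrary choices of ours):

```python
import numpy as np

N, tau = 1024, 64
n = np.arange(N)
a, b = 0.2, 0.1e-3
x = np.exp(2j * np.pi * (a * n + b * n**2))

# order-2 ambiguity product: x(n + 2*tau) conj(x(n)) is a pure tone
# whose frequency 4*b*tau reveals the chirp rate b
y = x[2 * tau:] * np.conj(x[: -2 * tau])
L = y.size
f0 = np.fft.fftfreq(L)[np.argmax(np.abs(np.fft.fft(y)))]
b_hat = f0 / (4 * tau)

# de-chirp with b_hat: the residual is (nearly) a tone at frequency a
d = x * np.exp(-2j * np.pi * b_hat * n**2)
a_hat = np.fft.fftfreq(N)[np.argmax(np.abs(np.fft.fft(d)))]
```

The two FFT peak positions directly give the chirp-rate and centre-frequency estimates; with impulsive noise present, the same product is applied to the robustly pre-processed signal.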
For this purpose, we consider a mixture of two chirp components of the same amplitude a1 = a2 = 1, with f1 = 0.05, f2 = 0.3, δ1 = 0.0001 and δ2 = 0.0003, embedded in impulsive α-stable noise with characteristic exponent α = 1. For the non-parametric TFD-based method, we use the compressing technique with parameter β = 0.1 (we chose σ = 0.01 for the MB-distribution kernel). Figures 12.7 and 12.8 represent the NMSE of the phase parameters versus the sample size and the noise dispersion, respectively. In this simulation context, the best results are obtained by the time-frequency based method, followed by the parametric method based on robust covariance estimation (ROCOV-MUSIC).

Fig. 12.7: Normalized MSE of the various phase parameters (TFD, ROCOV-MUSIC, TRUNC-MUSIC, FLOS-MUSIC) versus sample size, γ = 0.1.
Fig. 12.8: Normalized MSE of the various phase parameters versus noise dispersion in dB, N = 1000.

12.7 Concluding Remarks

In this chapter, we proposed a new approach, based on robust statistics theory, to the analysis of multicomponent nonstationary FM signals corrupted by additive heavy-tailed noise.
Two different procedures were proposed:
• Robust preprocessing approach: a preprocessing stage, based on the M-estimation idea, has been proposed to clean the time-frequency image. This first step allows us to obtain a good time-frequency representation, which is essential for the second, IF-estimation step.
• Robust time-frequency distribution approach: the fractional Lp-norm (0 < p < α) loss function has been used in the M-estimation framework to design a new robust TFD, referred to as the R-MBD. The proposed R-MBD is robust to the effect of heavy-tailed α-stable noise.
Computer simulations confirm the effectiveness of the proposed algorithms and show that the best results in terms of estimation accuracy are obtained by the R-MBD-based algorithm (which is, on the other hand, the most expensive one), followed by the r-PWVD-based method. In the considered simulation context, the comparative study shows the superiority of the non-parametric (TFD-based) method and of the parametric method using the robust covariance estimation technique (ROCOV-MUSIC).

Chapter 13 Conclusions and Perspectives

From infinite-variance statistics to source separation and the processing of non-stationary signals, we have sought to explore areas less familiar to practitioners, but which can still reveal interesting theoretical specificities and find new applications.

13.1 General Conclusion

By way of general conclusion, we attempt here a global synthesis of the work carried out in this thesis. Using probabilistic and statistical mathematical tools, we have tried to add new stones to two of the most important edifices of signal processing: the separation of non-Gaussian sources and the estimation of non-stationary signals.
We have thus answered the questions raised at the beginning of this thesis work, even if only partially, since many promising avenues have merely been opened. Let us admit it: an exhaustive study of the various uses of α-stable distributions in signal processing is a long-term undertaking, and the open questions always seem to grow in number. This search for robustness in signal processing led us to examine, in detail, the problems of source separation, especially for sources of an impulsive, infinite-variance nature, and the estimation of multi-component non-stationary signals in impulsive noise. These problems amount to answering the following questions:
1. Which existing methods, based on the assumption that second- and higher-order statistics exist, can still work in practice in the case of α-stable distributions?
2. How can this be justified mathematically, once demonstrated by simulations?
3. How can the methods that no longer apply in the case of impulsive α-stable sources be adapted and made robust?
4. How can fractional lower-order moments be exploited to separate this kind of source, and how can their use be generalized to sources of unknown nature (impulsive or not)?
5. How can the effect of impulsive noise on the time-frequency representation of non-stationary signals be reduced?
6. Is it possible to define time-frequency distributions robust to the effect of impulsive noise?
7. Can we content ourselves with parametric estimation methods for multi-component non-stationary signals and make them robust?
The methods developed in this thesis, and in particular the use of fractional lower-order moments, seem to provide an interesting and promising direction. Let us add that the developments carried out relied only on the statistical properties of α-stable probability laws. We presented these in detail, together with other properties that were not used but could prove very useful, in a chapter devoted solely to stable laws and their use in signal processing.

[A]- Separation of impulsive sources

We attempted a complete study of the class of FLOS-based methods, covering several aspects usually addressed in classical source separation, including whitening, separation and the optimization of a contrast function; only the asymptotic performance analysis was left aside, for lack of time. We first proposed a separation criterion based on minimizing the sum of the dispersions of the observations. We showed that this minimum-dispersion (MD) criterion is a contrast function that separates sources with α-stable distributions, and we realized a Jacobi-type implementation of the proposed algorithm. More precisely, in order to optimize this cost function, expressed as a function of the separation matrix B under an orthogonality constraint, we decomposed this matrix into a product of Givens-Jacobi matrices, reducing the matrix optimization problem to the optimization of a function of a real variable θ (the rotation angle). To evaluate the performance of the MD method, we defined a performance index generalizing the signal-to-interference ratio usually used in the separation of finite-variance signals. We then conducted a series of simulation experiments to compare the proposed method with the classical methods JADE and EASI and with a quasi-maximum-likelihood method proposed specifically for α-stable sources, named RQLM [Shereshevski et al.(2001)]. The MD method achieves the best performance in all the cases considered (with noise, without noise, small and large sample sizes, ...). Let us point out, in this respect, that the MD method shows a surprising robustness against estimation errors of the characteristic exponent α of the α-stable distributions. The same behaviour is found in the following Lp-norm approach, and is explained by the fact that replacing the power α in the dispersion expression by another value α1 (in the same interval (0, 1] or [1, 2) as α) also defines an MD contrast function, and therefore still separates the sources correctly [Sahmoudi et al.(2005)]. Exploiting the proportionality between the dispersion of an α-stable random variable and its p-th moment, we extended and generalized the minimum-dispersion criterion to separate linear mixtures of sources with unknown distributions (α-stable or not). This approach naturally connects with the sparse representation of sources via the Lp norm used in the literature. Under an orthogonality constraint, we showed that the criterion consisting in minimizing the sum of the Lp norms of the observations is a contrast function that correctly separates linear mixtures [Sahmoudi(2005)].
Still examining the existing source separation methods, we observed that the existing approaches can be grouped into two classes: a class of methods based on the tensorial algebraic structure of the mixture (such as the JADE algorithm), which remains valid for separating heavy-tailed sources, in particular those with α-stable distributions, and a second class of methods, based on various independence-measure criteria, which are unable to separate the sources considered in this work. The question we then naturally asked is the following: how can one justify, mathematically, the use of algorithms with a robust algebraic structure, given that they are based on statistics that are 'in principle' infinite, and how can those that are not robust be made so? To answer this question, we proposed an appropriate normalization of the second-order statistics (the covariance) and of the fourth-order cumulants. This normalization makes these new normalized statistics converge asymptotically to tensors with the structure desired for source separation. These statistics, constructed to validate the classical algebraic source separation approaches [Sahmoudi et al.(2004a)], rely on the heavy-tail property of stable laws. They also allowed us to introduce suitable normalizations in the source separation criteria based on nonlinear decorrelation. These new contrast functions then become robust and valid in the case of heavy-tailed signals [Sahmoudi et Abed-Meraim(2004b)]. Nevertheless, if the independent components do not have this heavy-tail characterization, the normalization only amounts to multiplication by a constant, which can only be beneficial for the convergence of certain algorithms, such as EASI.
These normalized statistics in fact define an entire class of techniques that is far from being fully exploited in this thesis. Another fundamental approach in statistical estimation that attracted our attention, for estimating the independent components of a linear mixture, is the maximum-likelihood (ML) principle, which, as always, opens the door to several developments. Indeed, we proposed a semi-parametric structure for the ML approach, combining a stochastic version of the EM algorithm with a technique for approximating the source densities by log-spline functions [M. Sahmoudi et al.(2005)]. The advantages of this method are appreciable in that no model of the source densities is needed, and in terms of robustness to possible source-modeling errors, since we approximate and estimate the densities directly from the observations.

[B]- Processing of multicomponent non-stationary signals

In the second part of this thesis, we addressed certain aspects of the analysis of non-stationary FM signals, considering mainly the case of a multi-component signal affected by impulsive noise. We treated the problem of instantaneous frequency estimation. For this, we proposed to use Huber's M-estimation method, robust to the effect of outliers (or impulsive noise) in the data, whose objective is to provide estimators whose performance does not deteriorate too much in the presence of heavy-tailed non-Gaussian noise.
This environment led us to model the noise by an α-stable distribution, in conjunction with the M-estimation approach, first within a parametric procedure and then within a time-frequency analysis procedure.
– The first objective was the search for new robust parametric approaches in this particular case of non-Gaussian noise. We begin by reducing the problem to that of estimating harmonic signals embedded in impulsive noise, thanks to a polynomial transform of the signal. The high-resolution MUSIC method is then applied to the transformed signal to estimate the parameters. Three cases are considered, leading to three algorithms: (i) direct application of the MUSIC algorithm to the truncated harmonic signal, named TRUNC-MUSIC; (ii) application of the MUSIC algorithm to a robust estimate of the covariance function of the harmonic signal, named ROCOV-MUSIC; and (iii) application of MUSIC to the generalized covariation of the signal, named FLOS-MUSIC since it is based on fractional lower-order statistics (FLOS). The comparison results showed a certain superiority of the ROCOV-MUSIC algorithm.
– The second objective of this part was the study of the influence of additive impulsive noise on non-parametric time-frequency estimation methods.
• Impulsive-noise pre-processing procedure: in a first approach, we applied Huber's minimax robustness procedure against the effect of impulsive noise, in the form of a pre-processing step, using two different techniques, namely:
1. amplitude compression by a nonlinear filter of the type |x|^β, 0 < β < 1, and
2. amplitude truncation of the signal.
We then represent the signal in the time-frequency plane, using quadratic transforms suited to the multicomponent case together with a component extraction algorithm, in order to estimate the instantaneous frequencies of the components.
• Procedure based on a robust time-frequency distribution: in the second approach, we combined the M-estimation robustness approach with quadratic time-frequency transforms to define a class of transforms robust to the effect of impulsive noise and to the cross-terms of a multicomponent signal.
A comparative simulation study shows the advantage of the time-frequency methods over the preceding parametric methods for estimating multi-component FM signals in the presence of impulsive noise. Let us also emphasize that robust time-frequency representations can serve other applications beyond the FM-signal estimation framework treated in this work.

13.2 Perspectives

Many questions remain open:

[A]- Separation of impulsive sources
• How can the proposed methods be improved by exploiting the performance-analysis techniques available in the literature?
• A gradient-type implementation is entirely possible to optimize the proposed minimum-dispersion criterion for the separation of α-stable sources.
• How do the proposed algorithms converge?
• How should the nonlinearities be chosen, with respect to the α-stable source distributions, in the approach based on nonlinear decorrelation?
• How can the study carried out here for linear mixtures be extended to nonlinear or convolutive mixtures?
• One problem not treated in this thesis is that of testing the variance. Such a test would make it possible, before applying any algorithm, to know whether one is dealing with a heavy-tailed, infinite-variance mixture or not. On this question, note that some work has already been developed in the probability and statistics literature and could be exploited in our source separation context.
• How can the three classes of statistics — second-order, higher-order and lower-order — be combined to define general source separation criteria? On this point, we envisage adding a lower-order term to the separation criteria based on nonlinear decorrelation, in order to reduce the effect of possible source 'impulsiveness'.
• Generalization of the approaches based on normalized statistics to the case of α-stable sources with different characteristic exponents α. To do so, we envisage using a deflation procedure.
• Also exploit the logarithmic statistical moments, defined from the characteristic function of the second kind.
• Dig deeper into the underdetermined case — more sources than sensors — by exploiting the sparse nature of impulsive sources.
• To address this last question, we are interested in source separation in the wavelet-transform domain. The aspects of particular interest to us are the impulsive character of the wavelet coefficients and the sparsity of the wavelet representation. This sparsity property has recently been used to separate more sources than sensors.
[B]- Processing of multicomponent non-stationary signals
• Theoretical performance analysis of the proposed approaches.
• Validation of the proposed methods through their application to real signals of radar, sonar or biomedical type.
• Theoretical study of the influence of random multiplicative noise.
• Exploration of statistical testing methods in the time-frequency plane for extracting the components of a non-stationary FM signal.
• Real-time implementation of the time-frequency algorithms to solve practical communication problems. This consists in developing new methods for managing services simultaneous in time and in frequency in communication networks.
• Deeper analysis of existing classification methods in the time-frequency plane, since this could solve several estimation and detection problems for non-stationary signals.
• Better exploitation of the probability distribution of the instantaneous frequency to improve time-frequency analysis.

In the end is my beginning! T.S. Eliot

♦ merci, chokran, thank you, chokran, merci, thank you, chokran, merci, thank you, chokran, merci, thank you, chokran, merci, thank you, chokran, merci, thank you, chokran, merci, thank you, chokran, merci, thank you, chokran, merci, thank you chokran, merci, thank you chokran, merci, thank you, chokran, merci, thank you ♦

Bibliography

[Abed-Meraim et Hua(1997)] Abed-Meraim, K. et Hua, Y. (1997).
Joint Schur decomposition : Algorithms and applications. In Proceeding of First International Conference on Information, Communications and Signal Processing (supplement proceedings ; ICICS'97), Singapore.
[Abed-Meraim et al.(1996)] Abed-Meraim, K., Belouchrani, A., et Hua, Y. (1996). Blind identification of a linear-quadratic mixture of independent component based on joint diagonalization procedure. In Proc. of ICASSP'1996, Atlanta, USA.
[Abed-Meraim et al.(1997a)] Abed-Meraim, K., Qiu, W., et Hua, Y. (1997a). Blind system identification. Proceedings of the IEEE, 85(8), 1310–1322.
[Abed-Meraim et al.(1997b)] Abed-Meraim, K., Loubaton, P., et Moulin, E. (1997b). A subspace algorithm for certain blind identification problems. IEEE Trans. on Information Theory, 43(2), 499–511.
[Abed-Meraim et al.(2000)] Abed-Meraim, K., Hua, Y., et Ikram, M. Z. (2000). A fast algorithm for conditional maximum likelihood blind identification of SIMO/MIMO FIR systems. In Proc. EUSIPCO (invited paper).
[Abed-Meraim et al.(2001)] Abed-Meraim, K., Xiang, Y., Manton, J., et Hua, Y. (2001). Blind source separation using second order cyclostationary statistics. IEEE Transactions on Signal Processing, 49(4), 694–701.
[Abed-Meraim et al.(2003)] Abed-Meraim, K., Nguyen, L., Sucic, V., Tupin, F., et Boashash, B. (2003). An image processing approach for underdetermined blind separation of nonstationary sources. In Proceeding of Int. Symp. on Sig. and Image Proc. and Analysis, Rome.
[Adib et al.(2002)] Adib, A., Moreau, E., et Aboutajdine, D. (2002). A combined contrast and reference signal based blind source separation by a deflation approach. In Proceedings of the 2nd IEEE International Symposium on Signal Processing and Information Technology (ISSPIT'2002), Marrakesh, Morocco.
[Adjrad et al.(2003)] Adjrad, M., Belouchrani, A., et Abed-Meraim, K. (2003). Parameter estimation of multicomponent polynomial phase signals impinging on a multi-sensor array using extended Kalman filter.
In Proceeding of (ISSPIT'2003), Darmstadt, Germany.
[Adler et al.(1998)] Adler, R., Feldman, R. E., et Taqqu, M. (1998). A Practical Guide to Heavy Tails : Statistical Techniques and Applications. Birkhauser, Boston.
[Akay et Erözden(2004)] Akay, O. et Erözden, E. (2004). Use of fractional autocorrelation in efficient detection of pulse compression radar signals. In IEEE First International Symposium on Control, Communications and Signal Processing, pages 33–36.
[Akgiray et Lamoureux(1989)] Akgiray, V. et Lamoureux, C. (1989). Estimation of stable-law parameters : a comparative study. Journal of Business & Economic Statistics, 7, 85–93.
[Altes(1980)] Altes, R. A. (1980). Detection, estimation, and classification with spectrograms. The Journal of the Acoustical Society of America, 67(4), 1232–1246.
[Altinkaya et al.(2002)] Altinkaya, M. A., Delic, H., Sankur, B., et Anarim, E. (2002). Subspace-based frequency estimation of sinusoidal signals in alpha-stable noise. Signal Processing, 82, 1807–1827.
[Amari(1998)] Amari, S.-I. (1998). Natural gradient works efficiently in learning. Neural Computation, 10(2), 251–276.
[Amari et Cardoso(1997)] Amari, S.-I. et Cardoso, J.-F. (1997). Blind source separation—semiparametric statistical approach. IEEE Trans. on Signal Processing, 45(11), 2692–2700.
[Amari et al.(1996)] Amari, S.-I., Cichocki, A., et Yang, H. (1996). A new learning algorithm for blind source separation. In Advances in Neural Information Processing Systems 8, pages 757–763. MIT Press.
[Ambike et Hatzinakos(1995)] Ambike, S. et Hatzinakos, D. (1995). A new filter for highly impulsive α-stable noise. In IEEE Workshop on Nonlinear Signal and Image Processing, Halkidiki, Greece.
[Amin(1992)] Amin, M. (1992). Time-Frequency Signal Analysis : Methods and Applications. Longman-Chesire.
[Amin(1997)] Amin, M. G. (1997).
Interference mitigation in spread spectrum communication systems using time-frequency distributions. IEEE Transactions on Signal Processing, 45(1), 90–101. [Amin et Zhang(2000)] Amin, M. G. et Zhang, Y. (2000). Effects of cross-terms on the performance of time–frequency MUSIC. In Proceedings of the 2000 IEEE Sensor Array and Multichannel Signal Processing Workshop, pages 479–483. [Amin et al.(1999)] Amin, M. G., Wang, C., et Lindsey, A. R. (1999). Optimum interference excision in spread spectrum communications using open-loop adaptive filters. IEEE Transactions on Signal Processing, 47(7), 1966–1976. [Amin et al.(2000)] Amin, M. G., Belouchrani, A., et Zhang, Y. (2000). The spatial ambiguity function and its applications. IEEE Signal Processing Letters, 7(6), 138–140. [Andrews(1974)] Andrews, D. F. (1974). Scale mixtures of normal distributions. Journal of the Royal Statistical Society, B 36, 99–102. [Babaie-Zadeh et al.(2004)] Babaie-Zadeh, M., Mansour, A., Jutten, C., et Marvasti, F. (2004). A geometric approach for separating several signals. In Fifth International Symposium on Independent Component Analysis and Blind Signal Separation, pages 798–806, Granada, Spain. [Babaie-Zadeh(2002)] Babaie-Zadeh, M. (2002). On Blind Source Separation in Convolutive and Nonlinear Mixtures. Ph.D. thesis, INPG, Grenoble. [Barbarossa(1995)] Barbarossa, S. (1995). Analysis of multicomponent LFM signals by a combined Wigner-Hough transform. IEEE Transactions on Signal Processing, 43, 1511–1515. [Barbarossa et Petrone(1997)] Barbarossa, S. et Petrone (1997). Analysis of polynomial phase signals by an integrated generalized ambiguity function. IEEE Transactions on Signal Processing, 45(2), 316–327. [Barbarossa et Scaglione(1999a)] Barbarossa, S. et Scaglione, A. (1999a). Adaptive time-varying cancellation of wideband interferences in spread-spectrum communications based on time-frequency distributions. IEEE Transactions on Signal Processing, 47(4), 957–965. [Barbarossa et Scaglione(1999b)] Barbarossa, S. et Scaglione, A. (1999b). Optimal precoding for transmissions over linear time-varying channels. In Seamless Interconnection for Universal Services. GLOBECOM'99, volume 5, pages 2545–2549, Piscataway, NJ. [Barbarossa et Scaglione(2000)] Barbarossa, S. et Scaglione, A. (2000). Theoretical bounds on the estimation and prediction of multipath time-varying channels. In International Conference on Acoustics, Speech, and Signal Processing, ICASSP'2000, volume 5, pages 2545–2548, Istanbul, Turkey. [Barbarossa et al.(1997)] Barbarossa, S., Scaglione, A., Spalletta, S., et Votini, S. (1997). Adaptive suppression of wideband interferences in spread-spectrum communications using the Wigner-Hough transform. In International Conference on Acoustics, Speech, and Signal Processing, ICASSP'97, volume 5, pages 3861–3864, California. [Barkat(2000)] Barkat, B. (2000). Design, estimation, and performance of time–frequency distributions. Ph.D. thesis, Queensland University of Technology, Brisbane, Australia. [Barkat(2001)] Barkat, B. (2001). Instantaneous frequency estimation of nonlinear frequency–modulated signals in the presence of multiplicative and additive noise. IEEE Transactions on Signal Processing, 49(10), 2214–2222. [Barkat et Abed-Meraim(2003a)] Barkat, B. et Abed-Meraim, K. (2003a). Detection of known FM signals in known heavy-tailed noise. In Proceeding of ISSPIT'2003, Darmstadt, Germany. [Barkat et Abed-Meraim(2003b)] Barkat, B. et Abed-Meraim, K. (2003b). An effective technique for the IF estimation of FM signals in heavy-tailed noise. In Proceeding of ISSPIT'2003, Germany. [Barkat et Abed-Meraim(2004)] Barkat, B. et Abed-Meraim, K. (2004). Algorithms for blind components separation and extraction from the time-frequency distribution of their mixture.
To appear in Journal of Applied Signal Processing. [Barkat et Boashash(2001)] Barkat, B. et Boashash, B. (Oct. 2001). A high-resolution quadratic time-frequency distribution for multicomponent signals analysis. IEEE Transactions on Signal Processing, 49. [Barkat et Stankovic(2004)] Barkat, B. et Stankovic, L. (2004). Analysis of polynomial FM signals corrupted by heavy-tailed noise. Signal Processing, 84, 69–75. [Barndorff(1998)] Barndorff-Nielsen, O. E. (1998). Processes of normal inverse Gaussian type. Finance and Stochastics, 2, 41–68. [Barndorff-Nielsen(1997)] Barndorff-Nielsen, O. E. (1997). Normal inverse Gaussian distribution and stochastic volatility modelling. Scandinavian Journal of Statistics, 24, 1–13. [Barros(2000)] Barros, A. K. (2000). The independence assumption : Dependent component analysis. In M. Girolami, editor, Advances in Independent Component Analysis, pages 63–71. Springer-Verlag. [Bassi et al.(1998)] Bassi, F., Embrechts, P., et Kafetzaki, M. (1998). Risk management and quantile estimation. In R. Adler, R. Feldman, et M. Taqqu, editors, A practical guide to heavy tails, pages 111–130. Birkhauser, Boston. [Bell et Sejnowski(1995)] Bell, A. et Sejnowski, T. (1995). An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7, 1129–1159. [Bell(2000)] Bell, A. J. (2000). Information theory, independent component analysis, and applications. In S. Haykin, editor, Unsupervised Adaptive Filtering, Vol. I, pages 237–264. Wiley. [Belouchrani(2001)] Belouchrani, A. (2001). Blind source separation : Concepts, approaches and applications. In ISSPA'2001 Tutorial, Kuala Lumpur, Malaysia. [Belouchrani et Amin(2000)] Belouchrani, A. et Amin, M. (2000). Jammer mitigation in spread spectrum communications using blind source separation. Signal Processing, 80, 724–729.
[Belouchrani et Amin(1996)] Belouchrani, A. et Amin, M. G. (1996). A new approach for blind source separation using time-frequency distributions. In Proceedings SPIE conference on Advanced algorithms and Architectures for Signal Processing, Denver, Colorado. [Belouchrani et Amin(1997)] Belouchrani, A. et Amin, M. G. (1997). Blind source separation using time–frequency distributions : Algorithm and asymptotic performance. In IEEE Proc. ICASSP'97, pages 3469–3472, Germany. [Belouchrani et Amin(1998)] Belouchrani, A. et Amin, M. G. (1998). Blind source separation based on time-frequency signal representations. IEEE Transactions on Signal Processing, 46(11), 2888–2897. [Belouchrani et Amin(1999a)] Belouchrani, A. et Amin, M. G. (1999a). Time–frequency MUSIC. IEEE Signal Processing Letters, 6(5), 109–110. [Belouchrani et Amin(1999b)] Belouchrani, A. et Amin, M. G. (1999b). A two–sensor array beamformer for direct sequence spread spectrum communications. IEEE Transactions on Signal Processing, 47(8), 2191–2199. [Belouchrani et Cardoso(1994)] Belouchrani, A. et Cardoso, J.-F. (1994). Maximum likelihood source separation for discrete sources. In Proceedings EUSIPCO. [Belouchrani et Cardoso(1995)] Belouchrani, A. et Cardoso, J.-F. (1995). Maximum likelihood source separation by the expectation-maximization technique : deterministic and stochastic implementation. In Proceeding of NOLTA, pages 49–53. [Belouchrani et al.(1997a)] Belouchrani, A., Abed-Meraim, K., Cardoso, J.-F., et Moulines, E. (1997a). A blind source separation technique using second order statistics. IEEE Trans. on Sig. Proc., pages 434–444. [Belouchrani et al.(1997b)] Belouchrani, A., Abed-Meraim, K., et Cardoso, J.-F. (1997b). An iterative blind source separation technique : Implementation and performance. In Proceeding of International Conference on Information, Communication and Signal Processing (ICICS'1997), Singapore. [Belouchrani et al.(2001)] Belouchrani, A., Abed-Meraim, K., Amin, M.
G., et Zoubir, A. M. (2001). Joint anti-diagonalization for blind source separation. In International Conference on Acoustics, Speech, and Signal Processing, ICASSP'2001, Salt Lake City, Utah. [Benidir(1994)] Benidir, M. (1994). Higher-Order Statistical Signal Processing, chapter Theoretical foundations of higher-order statistical signal processing and polyspectra. Longman Cheshire, Australia. [Benidir(1997)] Benidir, M. (1997). Characterization of polynomial functions and application to time-frequency analysis. IEEE Trans. on Signal Processing, 45(5), 1351–1354. [Benidir(2002)] Benidir, M. (2002). Traitement du Signal, Tome 1. Dunod. [Benidir(2003)] Benidir, M. (2003). Traitement du Signal, Tome 2. Dunod. [Benidir et al.(2002)] Benidir, M., Ouldali, A., et Sahmoudi, M. (2002). Performance analysis of the HAF estimator for time-varying amplitude phase-modulated signals. In The International IASTED Conference on Control and Applications (CA'2002), Cancun, Mexico. [Bergstrom(1952)] Bergstrom, H. (1952). On some expansions of stable distribution functions. Arkiv för Matematik, 2, 375–378. [Berlekamp(1968)] Berlekamp, E. R. (1968). Algebraic Coding Theory. McGraw-Hill, New York. [Bermond(2000)] Bermond, O. (2000). Statistical Methods for Blind Source Separation (Méthodes statistiques pour la séparation de sources). Ph.D. thesis, ENST, Paris, France. [Besson et Castanié(1993)] Besson, O. et Castanié, F. (1993). On estimating the frequency of a sinusoid in autoregressive multiplicative noise. Signal Processing, 30(1), 65–83. [Besson et al.(1999)] Besson, O., Ghogho, M., et Swami, A. (1999). Parameter estimation for random amplitude chirp signals. IEEE Transactions on Signal Processing, 47(12), 3208–3219. [Besson et al.(2000a)] Besson, O., Vincent, F., Stoica, P., et Gershman, A. B. (2000a).
Approximate maximum likelihood estimators for array processing in multiplicative noise environments. IEEE Transactions on Signal Processing, 48(9), 2506–2518. [Besson et al.(2000b)] Besson, O., Gini, F., Griffiths, H. D., et Lombardini, F. (2000b). Estimating ocean surface velocity and coherence time using multichannel ATI-SAR systems. Proceedings of the IEE : F, 147(6), 299–308. [Bestavros et al.(1998)] Bestavros, A., Crovella, M., et Taqqu, M. (1998). Heavy-tailed distributions in the world wide web. In R. Adler, R. Feldman, et M. Taqqu, editors, A practical guide to heavy tails, pages 3–25. Birkhauser, Boston. [Bhashyam et al.(2000)] Bhashyam, S., Sayeed, A. M., et Aazhang, B. (2000). Time-selective signaling and reception for communication over multipath fading channels. IEEE Transactions on Communications, 48(1), 83–94. [Bircan et al.(1998)] Bircan, A., Tekinay, S., et Akansu, A. N. (1998). Time-frequency and time-scale representation of wireless communication channels. In Proceedings of the IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis, pages 373–376, Pittsburgh, Pennsylvania, USA. [Blachman(1965)] Blachman, N. M. (1965). The convolution inequality for entropy powers. IEEE Transactions on Information Theory, 11, 267–271. [Boashash(1991)] Boashash, B. (1991). Time-frequency signal analysis. In S. Haykin, editor, Advances in Spectrum Analysis and Array Processing, volume I, chapter 9, pages 418–517. Prentice-Hall, Englewood Cliffs, New Jersey. [Boashash(1992a)] Boashash, B. (1992a). Estimating and interpreting the instantaneous frequency of a signal - Part 1 : Fundamentals. Proceedings of the IEEE, 80(4), 519–538. [Boashash(1992b)] Boashash, B. (1992b). Estimating and interpreting the instantaneous frequency of a signal - Part 2 : Algorithms and applications. Proceedings of the IEEE, 80(4), 539–569. [Boashash(1992c)] Boashash, B., editor (1992c). Time-Frequency Signal Analysis : Methods and Applications.
Longman Cheshire, Melbourne, Australia. [Boashash(1993)] Boashash, B. (1993). Recent advances in non-stationary signal analysis : time-varying higher order spectra and multilinear time-frequency signal analysis. In Proceedings of the SPIE - The International Society for Optical Engineering, volume 2027, pages 2–26. [Boashash(1996)] Boashash, B. (1996). Time frequency signal analysis : Past, present and future trends. In C. T. Leondes, editor, Control and Dynamic Systems, volume 48, pages 1–69. Academic Press, San Diego. [Boashash(2002)] Boashash, B. (2002). Time Frequency Signal Analysis and Processing. Prentice–Hall. [Boashash et Jones(1992)] Boashash, B. et Jones, G. (1992). Instantaneous frequency and time-frequency distributions. In B. Boashash, editor, Time-Frequency Signal Analysis, chapter 2, pages 43–73. Longman Cheshire, Melbourne, Australia. [Boashash et O'Shea(1993)] Boashash, B. et O'Shea, P. (1993). Use of the cross Wigner-Ville distribution for estimation of instantaneous frequency. IEEE Transactions on Signal Processing, 41(3), 1439–1445. [Boashash et O'Shea(1994)] Boashash, B. et O'Shea, P. (1994). Polynomial Wigner-Ville distributions and their relationship to time-varying higher-order spectra. IEEE Transactions on Signal Processing, 42, 216–220. [Boashash et Ristic(1992)] Boashash, B. et Ristic, B. (1992). Robust radar algorithms. Technical report, Signal Processing Research Centre, Queensland University of Technology, Brisbane, Australia. [Boashash et Ristic(1993a)] Boashash, B. et Ristic, B. (1993a). Analysis of FM signals affected by Gaussian AM using reduced Wigner–Ville trispectrum. In International Conference on Acoustics, Speech, and Signal Processing, ICASSP'93, volume IV, pages 408–411, Minneapolis. [Boashash et Ristic(1993b)] Boashash, B. et Ristic, B. (1993b).
Application of cumulant TVHOS to the analysis of composite FM signals in multiplicative and additive noise. In F. T. Luk, editor, Proceedings of SPIE, Advanced Signal Processing Algorithms, Architectures and Implementations, volume 2027, pages 245–255, San Diego. [Boashash et Ristic(1993c)] Boashash, B. et Ristic, B. (1993c). Polynomial time-frequency distributions and time-varying polyspectra. Technical report, Signal Processing Research Centre, Queensland University of Technology, Brisbane, Australia. [Boashash et Ristic(1995)] Boashash, B. et Ristic, B. (1995). A time-frequency perspective of higher-order spectra as a tool for non-stationary signal analysis. In B. Boashash, E. J. Powers, et A. M. Zoubir, editors, Higher Order Statistical Signal Processing, chapter 4, pages 111–149. Longman, Australia. [Boashash et Ristic(1998)] Boashash, B. et Ristic, B. (1998). Polynomial time-frequency distributions and time-varying higher order spectra : Application to the analysis of multicomponent FM signals and to the treatment of multiplicative noise. Signal Processing, 67, 1–23. [Boashash et Rodriguez(1984)] Boashash, B. et Rodriguez, F. (1984). Recognition of time-varying signals in the time-frequency domain by means of the Wigner distribution. In Proc. of ICASSP'1984, San Diego, USA. [Boashash et Sucic(2002)] Boashash, B. et Sucic, V. (2002). High performance time–frequency distributions for practical applications. In L. Debnath, editor, Wavelets and Signal Processing. Birkhauser, Boston, New York : Springer–Verlag. [Boashash et Sucic(2003)] Boashash, B. et Sucic, V. (2003). Resolution measure criteria for the objective assessment of the performance of quadratic time-frequency distributions. IEEE Trans. on Signal Processing, 51(5), 1253–1263. [Boashash et al.(1995)] Boashash, B., Powers, E. J., et Zoubir, A. M., editors (1995). Higher Order Statistical Signal Processing. Longman, Australia. [Bodenschatz et Nikias(1999)] Bodenschatz, J. S. et Nikias, C. L. (1999).
Maximum likelihood symmetric α-stable parameter estimation. IEEE Trans. on Signal Processing, 47(5). [Boscolo et al.(2004)] Boscolo, R., Pan, H., et Roychowdhury, V. P. (2004). Independent Component Analysis Based on Nonparametric Density Estimation. IEEE Transaction on Neural Networks, 15(1). [Boudreaux-Bartels et Marks(1986)] Boudreaux-Bartels, G. F. et Marks, T. W. (1986). Time-varying filtering and signal estimation using Wigner distributions. IEEE Transactions on Acoustics, Speech, and Signal Processing, 34, 422–430. [Box(1953)] Box, G. E. P. (1953). Non-normality and tests on variances. Biometrika, 40, 318–335. [Brcich et Zoubir(2002)] Brcich, R. F. et Zoubir, A. M. (2002). Robust estimation with parametric score function estimation. In Proceedings of the ICASSP'2002 IEEE Conference, pages 1149–1152. [Cambanis et Miller(1981)] Cambanis, S. et Miller, G. (1981). Linear problems in pth order and stable processes. SIAM J. Appl. Math., 41, 43–49. [Cao et Murata(1999)] Cao, J. et Murata, N. (1999). A Stable and Robust ICA Algorithm Based on T-Distribution and Generalized Gaussian Distribution Model. In Proceedings of the IEEE Signal Processing Society Workshop on Neural Networks for Signal Processing IX, pages 283–292. [Cappé et al.(2002)] Cappé, O., Moulines, E., Pesquet, J.-C., Petropulu, A., et Yang, X. (2002). Long-range dependence and heavy-tail modeling for teletraffic data. IEEE Signal Processing Magazine, pages 14–27. [Cardoso(1989a)] Cardoso, J.-F. (1989a). Blind identification of independent signals. In Proc. Workshop on Higher-Order Spectral Analysis, Vail, Colorado. [Cardoso(1989b)] Cardoso, J.-F. (1989b). Source separation using higher order moments. In Proceeding of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'89), pages 2109–2112, Glasgow, UK.
[Cardoso(1991)] Cardoso, J.-F. (1991). Super-symmetric decomposition of the fourth-order cumulant tensor. blind identification of more sources than sensors. In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP'91), pages 3109–3112. [Cardoso(1998)] Cardoso, J. F. (1998). Blind signal separation : statistical principles. Proc. of the IEEE, 86(10), 2009–2025. [Cardoso(1999)] Cardoso, J.-F. (1999). High-order contrasts for independent component analysis. Neural Computation, 11(1), 157–192. [Cardoso et Comon(1996)] Cardoso, J.-F. et Comon, P. (1996). Independent component analysis, a survey of some algebraic methods. In Proc. ISCAS'96, volume 2, pages 93–96. [Cardoso et Laheld(1996)] Cardoso, J. F. et Laheld, B. (1996). Equivariant adaptive source separation. IEEE Transactions on Signal Processing, 44, 3017–3030. [Cardoso et Souloumiac(1993)] Cardoso, J. F. et Souloumiac, A. (1993). Blind beamforming for non-Gaussian signals. Radar and Signal Processing, IEE Proceedings F. [Castella et Pesquet(2004)] Castella, M. et Pesquet, J. C. (2004). An iterative source separation method for convolutive mixtures of images. In Proceedings of the International Conference on Independent Component Analysis (ICA'2004), pages 922–929. [Castella et al.(2004)] Castella, M., Bianchi, P., Chevreuil, A., et Pesquet, J.-C. (2004). Blind MIMO detection of convolutively mixed CPM sources. In Proceeding of EUSIPCO'2004, Vienna, Austria. [Celka et al.(2001)] Celka, P., Boashash, B., et Colditz, P. (2001). Preprocessing and time-frequency analysis of newborn EEG seizures. IEEE Engineering in Medicine & Biology Magazine, 20, 30–39. [Chabert et al.(2003)] Chabert, M., Tourneret, J.-Y., et Coulon, M. (2003). Joint detection of variance changes using hierarchical Bayesian analysis.
In Proceeding of the IEEE International workshop on Statistical Signal Processing, Saint-Louis, Missouri, USA. [Chambers et al.(1976)] Chambers, J. M., Mallows, C. L., et Stuck, B. W. (1976). A method for simulating stable random variables. Journal of the American Statistical Association, 71(354), 340–344. [Chen et Bickel(2003)] Chen, A. et Bickel, P. J. (2003). Efficient Independent Component Analysis. Department of Statistics, University of California, Berkeley, Technical Report 634. [Chen et Bickel(2004)] Chen, A. et Bickel, P. J. (2004). Robustness of prewhitening against heavy-tailed sources. In Proceeding of the Fifth International Conference on Independent Component Analysis and Blind Signal Separation (ICA'2004), Granada, Spain. [Choi et Williams(1989)] Choi, H. et Williams, W. (1989). Improved time–frequency representation of multicomponent signals using exponential kernels. IEEE Transactions on Signal Processing, 37(6), 862–871. [Cichocki et Amari(2002)] Cichocki, A. et Amari, S. (2002). Adaptive Blind Signal and Image Processing. John Wiley & Sons, Singapore. [Cichocki et Unbehauen(1996)] Cichocki, A. et Unbehauen, R. (1996). Robust neural networks with on-line learning for blind identification and blind separation of sources. IEEE Trans. on Circuits and Systems, 43(11), 894–906. [Cichocki et al.(1994)] Cichocki, A., Unbehauen, R., et Rummert, E. (1994). Robust learning algorithm for blind separation of signals. Electronics Letters, 30(17), 1386–1387. [Cichocki et al.(2004)] Cichocki, A., Li, Y., Georgiev, P. G., et Amari, S. I. (2004). Beyond ICA : Robust sparse signal representation. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS'04), volume 5, pages 684–687. [Classen et Mecklenbrauker(1980)] Classen, T. et Mecklenbrauker, W. (1980). The Wigner distribution – Part 1. Philips Journal of Research, 35, 217–250. [Cline et Brockwell(1985)] Cline, D. B. et Brockwell, P. (1985).
Linear prediction of ARMA processes with infinite variance. Stoch. Processes & Applications, 19, 281–296. [Cohen(1966)] Cohen, L. (1966). Generalized phase-space distribution functions. Journal of Mathematical Physics, 7(5), 781–786. [Cohen(1992)] Cohen, L. (1992). What is a Multicomponent Signal ? [Cohen(1995)] Cohen, L. (1995). Time-frequency Analysis. Prentice-Hall. [Comon(1989)] Comon, P. (1989). Separation of stochastic processes. In Proc. Workshop on Higher-Order Spectral Analysis, pages 174–179, Vail, Colorado. [Comon(1994)] Comon, P. (1994). Independent component analysis, a new concept. Signal Processing, 36, 287–314. [Cook et Bernfeld(1993)] Cook, C. E. et Bernfeld, M. (1993). Radar Signals : An Introduction to Theory and Application. Artech House, Norwood, MA. [Cook et al.(1993)] Cook, D., Buja, A., et Cabrera, J. (1993). Projection pursuit indexes based on orthonormal function expansions. J. of Computational and Graphical Statistics, 2(3), 225–250. [Coulon et Tourneret(1999)] Coulon, M. et Tourneret, J. (1999). Multiple frequency estimation in additive and multiplicative colored noises. In Proceeding of ICASSP'1999, pages 1573–1576, Phoenix, USA. [Cover et Thomas(1991)] Cover, T. M. et Thomas, J. A. (1991). Elements of Information Theory. Wiley Series in Telecommunications. [Crespo et al.(1995)] Crespo, P. M., Honig, M. L., et Salehi, J. A. (1995). Spread-time code-division multiple access. IEEE Transactions on Communications, 43(6), 2139–2147. [Davy et al.(2002)] Davy, M., Doncarli, C., et Tourneret, J.-Y. (2002). Classification of chirp signals using hierarchical Bayesian learning and MCMC methods. IEEE Trans. on Signal Proc., 50(2), 377–388. [de Boor(1978)] de Boor, C. (1978). A practical guide to splines. Springer-Verlag, New York, applied mathematical sciences edition. [Delmas(2004)] Delmas, J.
(2004). Asymptotically optimal estimation of DOA for non-circular sources from second-order moments. IEEE Trans. on Signal Processing, pages 1235–1245. [Delmas(1997)] Delmas, J. P. (1997). An extension to the EM algorithm for exponential family. IEEE Trans. on Signal Processing, 45(10), 2613–2615. [Delmas et al.(2000)] Delmas, J. P., Gazzah, H., Liavas, A. P., et Regalia, P. A. (2000). Statistical analysis of some second order methods for blind channel identification/equalization with respect to channel undermodeling. IEEE Trans. on Signal Processing, 48(7), 1984–1998. [Delyon et al.(1999)] Delyon, B., Lavielle, M., et Moulines, E. (1999). Convergence of a stochastic approximation version of the EM algorithm. Ann. Statist., 27(1), 94–128. [Dempster(1977)] Dempster, A. P. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B 39, 1–38. [d'Estamps(2003)] d'Estamps, L. (Oct. 2003). Traitement Statistique des Processus Alpha-Stables : mesure de dépendance et identification des AR Stables. Ph.D. thesis, Institut National Polytechnique de Toulouse, Toulouse, France. [Diebolt et Celeux(1993)] Diebolt, J. et Celeux, G. (1993). Asymptotic properties of a stochastic EM algorithm for estimating mixing proportions. Comm. Statist. Stochastic Models, 9(4), 599–613. [Djafari(1999)] Djafari, A. M. (1999). A Bayesian approach to source separation. In AIP Conference Proceedings 567, Maximum Entropy and Bayesian Methods, pages 221–244, Boise, Idaho, USA. [Djeddi et Benidir(2004)] Djeddi, M. et Benidir, M. (2004). Robust Polynomial Wigner-Ville Distribution For The Analysis of Polynomial Phase Signals in α-Stable Noise. In Proceedings of the IEEE Conference ICASSP'2004. [Djuric et Kay(1990)] Djuric, P. M. et Kay, S. M. (1990). Parameter estimation of chirp signals. IEEE Trans. Acoust., Speech, Signal Processing, 38(12), 2118–2126. [DuMouchel(1973)] DuMouchel, W. H. (1973).
On the asymptotic normality of the maximum likelihood estimate when sampling from a stable distribution. Annals of Statistics, 1, 948–957. [Moreau et Pesquet(1997)] Moreau, E. et Pesquet, J.-C. (1997). Independence/decorrelation measures with applications to optimized orthonormal representations. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'97), volume 5. [El-Hassouni et Cherifi(2003)] El-Hassouni, M. et Cherifi, H. (2003). A 2-D Adaptive Least lp-Norm Filter For Impulsive Noise Cancellation in Still Images. In Proceeding of ISPA'2003, Paris, France. [Elliott(1938)] Elliott, R. (1938). The wave principle. Collins, New York. [Erdogmus et al.(2002)] Erdogmus, D., Rao, Y. N., Principe, J. C., Zaohao, J., et Hild-II, K. E. (2002). Simultaneous extraction of principal components using Givens rotations and output variances. In ICASSP'2002, pages 1069–1072. [Eriksson et Koivunen(2003)] Eriksson, J. et Koivunen, V. (2003). Characteristic-function based independent component analysis. Signal Processing, 83, 2195–2208. [Even(2003)] Even, J. (Déc. 2003). Contributions à la Séparation de Sources à l'aide de Statistiques d'Ordre. Ph.D. thesis, Université Joseph Fourier Grenoble, Grenoble, France. [Fama(1965)] Fama, E. F. (1965). The behavior of stock-market prices. Journal of Business, 38, 34–105. [Fama et Roll(1968)] Fama, E. F. et Roll, R. (1968). Some properties of symmetric stable distributions. Journal of the American Statistical Association, 63, 817–836. [Fama et Roll(1971)] Fama, E. F. et Roll, R. (1971). Parameter estimates for symmetric stable distributions. Journal of the American Statistical Association, 66, 817–836. [FastICA(1998)] FastICA (1998).
The FastICA package for MATLAB. Available at http://www.cis.hut.fi/projects/ica/fastica/. [Feller(1966)] Feller, W. (1966). An Introduction to Probability Theory and its Applications, volume 1. John Wiley. [Feller(1971)] Feller, W. (1971). An introduction to probability theory and its applications, Vol. II. John Wiley & Sons, 2nd edition. [Fevotte et Doncarli(2004)] Fevotte, C. et Doncarli, C. (2004). Two contributions to blind source separation using time-frequency distributions. IEEE Signal Processing Letters, 11. [Flandrin(1988a)] Flandrin, P. (1988a). A time-frequency formulation of optimum detection. In International Conference on Acoustics, Speech, and Signal Processing, ICASSP'88. [Flandrin(1988b)] Flandrin, P. (1988b). A time-frequency formulation of optimum detection. IEEE Transactions on Acoustics, Speech, and Signal Processing, 36(9), 1377–1384. [Flandrin(1993)] Flandrin, P. (1993). Temps-fréquence. Hermes, Paris. [Flandrin(1998)] Flandrin, P. (1998). Time-Frequency / Time-Scale Analysis, Volume 10. Academic Press. [Fonollosa et Nikias(1994)] Fonollosa, J. R. et Nikias, C. L. (1994). Analysis of finite-energy signals using higher-order moments- and spectra-based time-frequency distributions. Signal Processing, 36, 315–328. [Francos et Porat(1999)] Francos, A. et Porat, M. (1999). Analysis and synthesis of multicomponent signals using positive time-frequency distributions. IEEE Transactions on Signal Processing, 47(2), 493–504. [Francos et Friedlander(1995)] Francos, J. et Friedlander, B. (1995). Bounds for estimation of multicomponent signals with random amplitude and deterministic phase. IEEE Transactions on Signal Processing, 43(5), 1161–1172. [Frechet(1924)] Fréchet, M. (1924). Sur la loi des erreurs d'observation. Matematicheskii Sbornik, (32), 1–8. [Freedman et Diaconis(1982)] Freedman, D.
A. et Diaconis, P. (1982). On inconsistent M-estimators. The Annals of Statistics, 10(2), 454–461. [Friedlander et Francos(1995)] Friedlander, B. et Francos, J. (1995). Estimation of amplitude and phase parameters of multicomponent signals. IEEE Transactions on Signal Processing, 43(4), 917–926. [Friedmann et al.(2000)] Friedmann, J., Messer, H., et Cardoso, J.-F. (2000). Robust parameter estimation of a deterministic signal in impulsive noise. IEEE Trans. on Signal Processing, 48(4). [Gaarder(1968)] Gaarder, N. T. (1968). Scattering function estimation. [Gallagher(2000)] Gallagher, C. M. (2000). Estimating the autocovariation from stationary heavy-tailed data, with applications to time series. Rapport technique, Clemson University. [Gallagher(2001)] Gallagher, C. M. (2001). A method for fitting stable autoregressive models using the autocovariation function. Statistics & Probability Letters, 53(4), 381–390. [Gallagher(2002)] Gallagher, C. M. (2002). Testing for linear dependence in heavy-tailed data. Communication in Statistics, Theory and Methods, 31(4), 611–623. [Gauss(1963)] Gauss, C. F. (1963). Theory of Motion of the Heavenly Bodies. Dover, New York. [Gazzah et Abed-Meraim(2003)] Gazzah, H. et Abed-Meraim, K. (2003). Blind SOS-based ZF equalization with controlled delay robust to order overestimation. Journal of Applied Signal Processing (IEE JASP). [Georgiadis(2000)] Georgiadis, A. (Sept. 2000). Adaptive Equalisation for Impulsive Noise Environments. Ph.D. thesis, The University of Edinburgh, Edinburgh, UK. [Ghogho et al.(1999)] Ghogho, M., Nandi, A. K., et Swami, A. (1999). Cramer-Rao bounds and maximum likelihood estimation for random amplitude phase–modulated signals. IEEE Transactions on Signal Processing, 47(11), 2905–2916. [Ghogho et al.(2001)] Ghogho, M., Swami, A., et Durrani, T. S. (2001). Frequency estimation in the presence of Doppler spread : performance analysis. IEEE Transactions on Signal Processing, 49(4), 777–789.
[Gnedenko et Kolmogorov(1954)] Gnedenko, B. V. et Kolmogorov, A. N. (1954). Limit Distributions for Sums of Independent Random Variables. Addison-Wesley. [Godsill(1999)] Godsill, S. (1999). MCMC and EM-based methods for inference in heavy-tailed processes with α-stable innovations. In Proceedings of the IEEE Statistical Signal Processing Workshop. [Gonin et Money(1985)] Gonin, R. et Money, A. H. (1985). Nonlinear lp-norm estimation : Part 1. On the choice of the exponent, p, where the errors are additive. Commun. Stat. Theory Methods A, 14, 827–840. [Grenier(1984)] Grenier, Y. (1984). Modélisation de Signaux non Stationnaires. Ph.D. thesis, Université Paris Sud. [Griffith(1997)] Griffith, D. W. (1997). Robust-Time Frequency Representations for Signals in Alpha-Stable Noise : Methods and Applications. Ph.D. thesis, Department of Electrical Engineering, University of Delaware, Newark. [Grigoriu(1995)] Grigoriu, M. (1995). Applied Non-Gaussian Processes. Prentice-Hall. [Hassanpour et al.(2003)] Hassanpour, H., Mesbah, M., et Boashash, B. (2003). Comparative performance of time-frequency based newborn EEG seizure detection using spike signature. In ICASSP'2003, volume 2, pages 389–392. [Haas et Belfiore(1997)] Haas, R. et Belfiore, J.-C. (1997). A time-frequency well-localized pulse for multiple carrier transmission. Wireless Personal Communications, 5, 1–18. [Hall(1966)] Hall, H. M. (1966). A new model for impulsive phenomena : Application to atmospheric-noise communication channels. Technical Report 3412-8, 7050-7, Stanford Electronics Laboratories, Stanford University, Stanford, California. This report introduces the Student-t distribution. [Hampel et al.(1986)] Hampel, F. R., Ronchetti, E., Rousseeuw, P. J., et Stahel, W. A. (1986). Robust Statistics : The Approach Based on Influence Functions. Wiley.
[Hanssen et Oigard(2001)] Hanssen, A. et Oigard, T. A. (2001). The normal inverse Gaussian distribution as a flexible model for heavy-tailed processes. In Proceedings of NSIP.
[Hérault et Ans(1984)] Hérault, J. et Ans, B. (1984). Circuits neuronaux à synapses modifiables : décodage de messages composites par apprentissage non supervisé. C.-R. de l’Académie des Sciences, 299(III-13), 525–528.
[Hérault et al.(1985)] Hérault, J., Jutten, C., et Ans, B. (1985). Détection de grandeurs primitives dans un message composite par une architecture de calcul neuromimétique en apprentissage non supervisé. In Actes du Xème colloque GRETSI, pages 1017–1022, Nice, France.
[Hlawatsch(1998)] Hlawatsch, F. (1998). Time-Frequency Analysis and Synthesis of Linear Signal Spaces : Time-Frequency Filters, Signal Detection and Estimation, and Range-Doppler Estimation. Kluwer Academic Publishers, USA.
[Hlawatsch et Boudreaux-Bartels(1992)] Hlawatsch, F. et Boudreaux-Bartels, G. F. (1992). Linear and quadratic time-frequency signal representations. IEEE Signal Processing Magazine, 9(2), 21–67.
[Hlawatsch et Krattenthaler(1997)] Hlawatsch, F. et Krattenthaler, W. (1997). Signal synthesis algorithms for bilinear time-frequency signal representations. In W. Mecklenbräuker et F. Hlawatsch, editors, The Wigner Distribution – Theory and Applications in Signal Processing, pages 135–209. Elsevier, Amsterdam, Netherlands.
[Hlawatsch et Matz(1998)] Hlawatsch, F. et Matz, G. (1998). Time-frequency signal processing : A statistical perspective. In Proc. IEEE Workshop on Circuits, Systems and Signal Processing, pages 207–219, Mierlo, The Netherlands.
[Hlawatsch et Matz(2000)] Hlawatsch, F. et Matz, G. (2000). Quadratic time-frequency analysis of linear time-varying systems. In L. Debnath, editor, Wavelet Transforms and Time-Frequency Signal Analysis, chapter 9. Birkhäuser, Boston (MA).
[Hlawatsch et al.(2000)] Hlawatsch, F., Matz, G., Kirchauer, H., et Kozek, W. (2000). Time-frequency formulation, design, and implementation of time-varying optimal filters for signal estimation. IEEE Transactions on Signal Processing, 48.
[Huber(1972)] Huber, P. J. (1972). Robust statistics : A review. Ann. Math. Statist., 43, 1041–1067.
[Huber(1985)] Huber, P. (1985). Projection pursuit. The Annals of Statistics, 13(2), 435–475.
[Huber(1981)] Huber, P. J. (1981). Robust Statistics. Wiley, New York.
[Hussain(2002)] Hussain, Z. M. (2002). Adaptive Instantaneous Frequency Estimation : Techniques and Algorithms. Ph.D. thesis, Queensland University of Technology, Brisbane, Australia.
[Hussain et Boashash(2002)] Hussain, Z. M. et Boashash, B. (2002). Adaptive instantaneous frequency estimation of multicomponent FM signals using quadratic time-frequency distributions. IEEE Trans. on Signal Proc., pages 1866–1876.
[Hyvärinen(1997)] Hyvärinen, A. (1997). One-unit contrast functions for independent component analysis : A statistical analysis. In Neural Networks for Signal Processing VII (Proc. IEEE Workshop on Neural Networks for Signal Processing), pages 388–397, Amelia Island, Florida.
[Hyvarinen(1998)] Hyvärinen, A. (1998). New approximations of differential entropy for independent component analysis and projection pursuit. In Advances in Neural Information Processing Systems, 10, 273–279.
[Hyvarinen(1999)] Hyvärinen, A. (1999). Fast and robust fixed-point algorithms for independent component analysis. IEEE Trans. on Neural Networks, 10(3), 626–634.
[Hyvarinen et al.(2001)] Hyvärinen, A., Karhunen, J., et Oja, E. (2001). Independent Component Analysis. Wiley.
[Ichir et M.-Djafari(2003)] Ichir, M. et M.-Djafari, A. (2003). Bayesian wavelet based signal and image separation. In AIP Conference Proceedings of MaxEnt23 ; Maximum Entropy and Bayesian Inference Methods, pages 417–428, American Institute of Physics, Jackson Hole, Wyoming, USA.
[Ikram et Zhou(2001)] Ikram, M. Z. et Zhou, G. T. (2001). Estimation of multicomponent polynomial phase signals of mixed orders. Signal Processing, 81, 2293–2308.
[Ikram et al.(1996a)] Ikram, M. Z., Abed-Meraim, K., et Hua, Y. (1996a). Estimating Doppler parameters in SAR imaging for moving targets. In Proceedings of the IEEE Nordic Signal Processing Symposium (NORSIG), pages 207–210, Espoo, Finlande.
[Ikram et al.(1996b)] Ikram, M. Z., Abed-Meraim, K., et Hua, Y. (1996b). Fast discrete quadratic phase transform for estimating the parameters of chirp signals. In Proc. of the 30th Asilomar Conference, CA, volume 1, pages 798–801.
[Ikram et al.(1996c)] Ikram, M. Z., Abed-Meraim, K., et Hua, Y. (1996c). An iterative approach to the parametric estimation of chirp signals. In IEEE Region Ten Conference, Perth, Australia, volume 2, pages 681–685.
[Ikram et al.(1997)] Ikram, M. Z., Abed-Meraim, K., et Hua, Y. (1997). Fast quadratic phase transform for estimating the parameters of multicomponent chirp signals. DSP Review Journal, pages 127–135.
[Ikram et al.(1998)] Ikram, M. Z., Belouchrani, A., Abed-Meraim, K., et Gesbert, D. (1998). Parametric estimation and suppression of non-stationary interference in spread spectrum communications. In Proc. of 32nd Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, pages 1401–1405.
[Ilow(1995)] Ilow, J. (1995). Signal Processing in α-stable Noise Environments : Noise Modeling, Detection and Estimation. Ph.D. thesis, Dept. of Electrical and Computer Engineering, University of Toronto, Toronto, Canada.
[Jain(1989)] Jain, A. K. (1989). Fundamentals of Digital Image Processing. Prentice-Hall, Englewood Cliffs, New Jersey.
[Jakes(1974)] Jakes, W., editor (1974). Microwave Mobile Communications. IEEE Press.
[Janicki et Weron(1994)] Janicki, A. et Weron, A. (1994). Simulation and Chaotic Behavior of α-Stable Stochastic Processes. Marcel Dekker, New York.
[Jayant et Noll(1984)] Jayant, N. et Noll, P. (1984). Digital Coding of Waveforms : Principles and Applications to Speech and Video. Prentice-Hall.
[Jones et Sibson(1987)] Jones, M. C. et Sibson, R. (1987). What is projection pursuit ? Journal of the Royal Statistical Society, Series A, 150, 1–36.
[Joshi et Morris(1998)] Joshi, S. M. et Morris, J. M. (1998). Multiple access based on Gabor transform. In Proceedings of the IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis, pages 217–220, Pittsburgh, Pennsylvania, USA. IEEE.
[Jutten(2000)] Jutten, C. (2000). Source separation : from dusk till dawn. In Proc. 2nd Int. Workshop on Independent Component Analysis and Blind Source Separation (ICA’2000), pages 15–26, Helsinki, Finland.
[Kagan et al.(1973)] Kagan, A., Linnik, Y., et Rao, C. (1973). Characterization Problems in Mathematical Statistics. John Wiley & Sons, USA.
[Kalluri(1998)] Kalluri, S. (1998). Nonlinear Adaptive Optimization Algorithms for Robust Signal Processing in Non-Gaussian Environments. Ph.D. thesis, Dept. of Electrical Engineering, University of Delaware, Newark.
[Kaluri et Arce(2000)] Kalluri, S. et Arce, G. R. (2000). Fast algorithms for weighted myriad computation by fixed-point search. IEEE Trans. on Signal Proc.
[Karol et al.(1997)] Karol, M. J., Haas, Z. J., Woodworth, C. B., et Gitlin, R. D. (1997). Time-frequency-code slicing : efficiently allocating the communications spectrum to multirate users. IEEE Transactions on Vehicular Technology, 46(4), 818–826.
[Karvanen et Cichocki(2003)] Karvanen, J. et Cichocki, A. (2003). Measuring sparseness of noisy signals. In Proc. of the Conference ICA’2003, Japan.
[Kassam(1995)] Kassam, S. A. (1995). Signal Detection in Non-Gaussian Noise. John Wiley & Sons, New York.
[Kassam et Poor(1985)] Kassam, S. A. et Poor, V. (1985). Robust techniques for signal processing : A survey. Proceedings of the IEEE, 73(3), 433–481.
[Katkovnik(1998)] Katkovnik, V. (1998). Robust M-periodogram. IEEE Transactions on Signal Processing, 46(11), 3104–3109.
[Katkovnik et Stankovic(1998)] Katkovnik, V. et Stankovic, L. J. (1998). Instantaneous frequency estimation using the Wigner distribution with varying and data driven window length. IEEE Transactions on Signal Processing, 46(9), 2315–2325.
[Katkovnik et al.(2002)] Katkovnik, V., Djurovic, I., et Stankovic, L. (2002). Time-Frequency Signal Analysis, chapter Robust time-frequency representations. Prentice-Hall.
[Katkovnik et al.(2003)] Katkovnik, V., Djurovic, I., et Stankovic, L. (2003). Robust time-frequency representation. Elsevier, Oxford.
[Kay(1998a)] Kay, S. (1998a). Fundamentals of Statistical Signal Processing : Detection Theory. Prentice-Hall, Englewood Cliffs.
[Kay(1998b)] Kay, S. (1998b). Fundamentals of Statistical Signal Processing : Estimation Theory. Prentice-Hall, Englewood Cliffs.
[Kay(1993)] Kay, S. M. (1993). Fundamentals of Statistical Signal Processing : Estimation Theory. A.V. Oppenheim, series editor, Prentice-Hall Signal Processing Series. Prentice-Hall, Englewood Cliffs, New Jersey.
[Kay(1998c)] Kay, S. M. (1998c). Fundamentals of Statistical Signal Processing, Volume II : Detection Theory. A.V. Oppenheim, series editor, Prentice-Hall Signal Processing Series. Prentice-Hall.
[Kay et Boudreaux-Bartels(1985)] Kay, S. M. et Boudreaux-Bartels, G. F. (1985). On the optimality of the Wigner distribution for detection. In International Conference on Acoustics, Speech, and Signal Processing, ICASSP’85, pages 1017–1019.
[Khawarizmi(IXe siècle)] Khawarizmi, M. I. M. (IXe siècle). The Algebra of Mohammed ben Musa. Edited and translated by Frederic Rosen. Georg Olms Verlag.
[Kidmose(2001)] Kidmose, P. (2001). Blind Separation of Heavy Tail Signals. Ph.D. thesis, Technical University of Denmark, Lyngby, Denmark.
[Knuth(1999)] Knuth, K. H. (1999). A Bayesian approach to source separation. In Proceedings of the First International Workshop on Independent Component Analysis and Signal Separation (ICA’1999), pages 283–288, Aussois, France.
[Kootsookos et al.(1992)] Kootsookos, P., Lovell, B., et Boashash, B. (1992). A unified approach to the STFT, TFDs, and instantaneous frequency. IEEE Transactions on Signal Processing, 40, 1971–1982.
[Koutrouvelis(1980)] Koutrouvelis, I. A. (1980). Regression-type estimation of the parameters of stable laws. Journal of the American Statistical Association, 75(372), 918–928.
[Krim et Viberg(1996)] Krim, H. et Viberg, M. (1996). Two decades of array signal processing research : the parametric approach. IEEE Signal Processing Magazine, 13(4), 67–94.
[Krob et Benidir(1993)] Krob, M. et Benidir, M. (1993). Blind identification of a linear-quadratic model using higher-order statistics. Minneapolis, USA.
[Kuelbs(1973)] Kuelbs, J. (1973). A representation theorem for symmetric stable processes and stable measures. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 26.
[Kuhn et Lavielle(2004)] Kuhn, E. et Lavielle, M. (2004). Coupling a stochastic approximation version of EM with an MCMC procedure. ESAIM Probab. & Stat., 8, 115–131.
[Kuruoglu(1998)] Kuruoglu, E. (1998). Signal Processing in α-stable Noise Environments : A Least lp-Norm Approach. Ph.D. thesis, University of Cambridge, UK.
[Kuruoglu(2001)] Kuruoglu, E. E. (2001). Density parameter estimation of skewed α-stable distributions. IEEE Transactions on Signal Processing, 49(10).
[Kuruoglu(2002)] Kuruoglu, E. E. (2002). Nonlinear least lp-norm filters for nonlinear autoregressive α-stable processes. Digital Signal Processing, 12, 119–142.
[Kuruoglu(2003)] Kuruoglu, E. E. (2003). Analytical representation for positive α-stable densities. In Proceedings of ICASSP 2003 ; IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 6, pages 729–732.
[Lacoume et Ruiz(1988)] Lacoume, J.-L. et Ruiz, P. (1988). Sources identification : a solution based on cumulants. In Proc. IEEE ASSP Workshop, Minneapolis, Minnesota.
[Launer et Wilkinson(1979)] Launer, R. L. et Wilkinson, G. N., editors (1979). Robustness in Statistics. Academic Press, The Army Research Office, Research Triangle Park, North Carolina, USA. This book contains the proceedings of a workshop.
[Lecoutre et Tassi(1980)] Lecoutre, J.-P. et Tassi, P. (1980). Statistique non paramétrique et robustesse. Statistica, Paris.
[Lee(1998a)] Lee, T.-W. (1998a). Independent Component Analysis : Theory and Applications. Kluwer Academic, Boston/Dordrecht/London.
[Lee(2001)] Lee, T.-W. (2001). Independent Component Analysis : Theory and Applications. Kluwer Academic Publishers, Boston.
[Lee et al.(1999)] Lee, T.-W., Lewicki, M. S., et Girolami, M. (1999). Blind source separation of more sources than mixtures using overcomplete representations. IEEE Signal Processing Letters, 6(4).
[Lee(1998b)] Lee, W. Y. (1998b). Mobile Communications Engineering. McGraw-Hill, 2nd edition.
[Leroy(1987)] Rousseeuw, P. J. et Leroy, A. M. (1987). Robust Regression & Outlier Detection. John Wiley & Sons.
[Lévy(1925)] Lévy, P. (1925). Calcul des Probabilités. Gauthier-Villars, Paris.
[Leyman et al.(2000)] Leyman, A. R., Kamran, Z. M., et Abed-Meraim, K. (2000). Higher order time frequency based blind source separation technique. IEEE Signal Processing Letters.
[Linh-Trung(2002)] Linh-Trung, N. (2002). Estimation and separation of LFM signals in wireless communication using time-frequency signal processing. Ph.D. thesis, Queensland University of Technology, Brisbane, Australia.
[Luengo et al.(2003)] Luengo, D., Santamaria, I., Vielva, L., et Pantaleon, C. (2003). Underdetermined blind separation of sparse sources with instantaneous and convolutive mixtures. In Proceedings of the IEEE XIII-th Workshop on Neural Networks for Signal Processing.
[Luigi et Moreau(2002a)] Luigi, C. D. et Moreau, E. (2002a). An iterative algorithm for the estimation of linear frequency modulated signal parameters. IEEE Signal Processing Letters, 9(4), 127–129.
[Luigi et Moreau(2002b)] Luigi, C. D. et Moreau, E. (2002b). Wigner-Ville and polynomial Wigner-Ville transforms in the estimation of nonlinear FM signal parameters. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, volume 2, pages 1433–1436, Orlando, Florida.
[Luo et al.(2004)] Luo, Y., Lambotharan, S., et Chambers, J. (2004). A new block based time-frequency approach for underdetermined blind source separation. In Proceedings of ICASSP ’04 ; IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 5, pages 537–540.
[M. Castella et Pesquet(2004)] Castella, M., Moreau, E., et Pesquet, J.-C. (2004). A quadratic MISO contrast for blind equalization. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP’2004, Montréal, Canada.
[M. Sahmoudi et al.(2005)] Sahmoudi, M., Abed-Meraim, K., Lavielle, M., Kuhn, E., et Ciblat, P. (2005). Blind source separation using a semi-parametric approach with application to heavy-tailed signals. Submitted to EUSIPCO’2005.
[Ma et Nikias(1995a)] Ma, X. et Nikias, C. L. (1995a). On blind channel identification for impulsive signal environments. In Proc. of the Conference ICASSP’1995.
[Ma et Nikias(1995b)] Ma, X. et Nikias, C. L. (1995b). Parameter estimation and blind channel identification in impulsive signal environments. IEEE Transactions on Signal Processing, 43(12).
[Mandelbrot(1962)] Mandelbrot, B. (1962). Sur certains prix spéculatifs : faits empiriques et modèle basé sur les processus stables additifs non gaussiens de Paul Lévy. Comptes rendus à l’Académie des Sciences, 254, 3968–3970.
[Mandelbrot(1963)] Mandelbrot, B. (1963). The variation of certain speculative prices. Journal of Business, 36, 394–419.
[Mansour et Ohnishi(2000)] Mansour, A. et Ohnishi, N. (2000). Discussion of simple algorithms and methods to separate non-stationary signals. In Fourth IASTED International Conference On Signal Processing and Communications (SPC 2000), pages 78–85, Marbella, Spain.
[Mansour et al.(2000a)] Mansour, A., Jutten, C., et Loubaton, P. (2000a). Adaptive subspace algorithm for blind separation of independent sources in convolutive mixture. IEEE Trans. on Signal Processing, 48(2), 583–586.
[Mansour et al.(2000b)] Mansour, A., Barros, A. K., et Ohnishi, N. (2000b). Blind separation of sources : Methods, assumptions and applications. Special Issue on Digital Signal Processing in IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E83-A(8), 1498–1512.
[Mansour et al.(2001)] Mansour, A., Puntonet, C. G., et Ohnishi, N. (2001). A simple ICA algorithm based on geometrical approach. In Sixth International Symposium on Signal Processing and its Applications (ISSPA 2001), pages 9–12, Kuala Lumpur, Malaysia.
[Mansour et al.(2002a)] Mansour, A., Ohnishi, N., et Puntonet, C. G. (2002a). Blind multiuser separation of instantaneous mixture algorithm based on geometrical concepts. Signal Processing, 82(8), 1155–1175.
[Mansour et al.(2002b)] Mansour, A., Kawamoto, M., et Ohnishi, N. (2002b). A survey of the performance indexes of ICA algorithms. In 21st IASTED International Conference on Modelling, Identification and Control (MIC 2002), pages 660–666, Innsbruck, Austria.
[Marinovic(1984)] Marinovic, N. (1984). Time-Frequency Analysis. Ph.D. thesis.
[Marple S.L.(2001)] Marple, S. L., Jr. (2001). Large dynamic range time-frequency signal analysis with application to helicopter Doppler radar data. In Sixth International Symposium on Signal Processing and its Applications, volume 1, pages 260–263.
[Martin(1982)] Martin, W. (1982). Time-frequency analysis of random signals. In Proceedings of ICASSP’1982, pages 1325–1328, Paris, France.
[Martin et Flandrin(1985)] Martin, W. et Flandrin, P. (1985). Wigner–Ville spectral analysis of non-stationary signals. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(6), 1461–1470.
[Masry et Cambanis(1984)] Masry, E. et Cambanis, S. (1984). Spectral density estimation for stationary stable processes. Stochastic Processes and their Applications, 18, 1–31.
[Matz et Hlawatsch(1998a)] Matz, G. et Hlawatsch, F. (1998a). Extending the transfer function calculus of time-varying linear systems : A generalized underspread theory. In International Conference on Acoustics, Speech, and Signal Processing, ICASSP’98, pages 2189–2192, Seattle, WA, USA. IEEE.
[Matz et Hlawatsch(1998b)] Matz, G. et Hlawatsch, F. (1998b). Time-frequency transfer function calculus (symbolic calculus) of linear time-varying systems (linear operators) based on a generalized underspread theory. Journal of Mathematical Physics, 39(8), 4041–4070.
[Matz et Hlawatsch(1999)] Matz, G. et Hlawatsch, F. (1999). Time-frequency subspace detectors and application to knock detection. Int. J. Electron. Commun. (AEÜ), 53(6), 379–385.
[Matz et Hlawatsch(2003)] Matz, G. et Hlawatsch, F. (2003). Wigner distribution (nearly) everywhere : time-frequency analysis of signals, systems, random processes, signal spaces, and frames. Signal Processing (Elsevier), 83, 1355–1378.
[Matz et al.(1999)] Matz, G., Molisch, A. F., Steinbauer, M., Hlawatsch, F., Gaspard, I., et Artés, H. (1999). Bounds on the systematic measurement errors of channel sounders for time-varying mobile radio channels. In Proc. IEEE VTC-99 Fall, pages 1465–1470, Amsterdam, Netherlands.
[Maymon et al.(2000)] Maymon, S., Friedmann, J., et Messer, H. (2000). A new method for estimating parameters of a skewed alpha-stable distribution. In IEEE Conference.
[McCullagh(1987)] McCullagh, P. (1987). Tensor Methods in Statistics. Monographs on Statistics and Probability, Chapman and Hall.
[McGillem et Cooper(1984)] McGillem, C. et Cooper, G. (1984). Continuous and Discrete Signal and System Analysis. HRW Series in Electrical and Computer Engineering. CBS Publishing Japan Ltd., 2nd edition.
[McGillem et Cooper(1991)] McGillem, C. et Cooper, G. (1991). Continuous and Discrete Signal and System Analysis. HRW Series in Electrical and Computer Engineering. Saunders College Publishing, 3rd edition.
[McHale et Boudreaux-Bartels(1993)] McHale, T. J. et Boudreaux-Bartels, G. F. (1993). An algorithm for synthesizing signals from partial time-frequency models using the cross Wigner distribution. IEEE Transactions on Signal Processing, 41(5), 1986–1990.
[Mecklenbräuker et Hlawatsch(1997)] Mecklenbräuker, W. et Hlawatsch, F., editors (1997). The Wigner Distribution – Theory and Applications in Signal Processing. Elsevier, Amsterdam, Netherlands.
[Meng et Rubin(1993)] Meng, X. L. et Rubin, D. B. (1993). Maximum likelihood estimation via the ECM algorithm : a general framework. Biometrika, 80(2), 267–278.
[Michael(1983)] Michael, J. R. (1983). The stabilized probability plot. Biometrika, 70, 11–17.
[Middleton(1977)] Middleton, D. (1977). Statistical-physical models of electromagnetic interference. IEEE Trans. on Electromagnetic Compatibility, EMC-19(3), 106–127.
[Miller(1978)] Miller, G. (1978). Properties of certain symmetric stable distributions. Journal of Multivariate Analysis, 8(3), 346–360.
[Milstein(1988)] Milstein, L. B. (1988). Interference rejection techniques in spread spectrum communications. Proceedings of the IEEE, pages 657–671.
[Mirza et Boyer(1993)] Mirza, M. J. et Boyer, K. L. (1993). Performance evaluation of a class of M-estimators for surface parameter estimation in noisy range data. IEEE Trans. on Robotics and Automation, 9(1), 75–85.
[Miskin(2000)] Miskin, J. (2000). Ensemble Learning for Independent Component Analysis. Ph.D. thesis, University of Cambridge, http://www.inference.phy.cam.ac.uk/jwm1003/.
[Molgedey et Schuster(1994)] Molgedey, L. et Schuster, H. G. (1994). Separation of a mixture of independent signals using time delayed correlations. Physical Review Letters, 72, 3634–3636.
[Moreau(2000)] Moreau, E. (2000). Joint-diagonalization of cumulant tensors and source separation. In Proceedings of the 10th IEEE Signal Processing Workshop on Statistical Signal and Array Processing (SSAP 2000), pages 339–343, Pocono Manor, Pennsylvanie, USA.
[Moreau(2001)] Moreau, E. (2001). A generalization of joint-diagonalization criteria for source separation. IEEE Transactions on Signal Processing, 49(3), 530–541.
[Moreau et Macchi(1996)] Moreau, E. et Macchi, O. (1996). High order contrasts for self-adaptive source separation. International Journal of Adaptive Control and Signal Processing, 10(1), 19–46.
[Moreau et Pesquet(1997)] Moreau, E. et Pesquet, J.-C. (1997). Generalized contrasts for multichannel blind deconvolution of linear systems. IEEE Signal Processing Letters, 4, 182–183.
[Moreau et Stoll(1999)] Moreau, E. et Stoll, B. (1999). An iterative block procedure for the optimization of constrained contrast functions. In Proceedings of the International Conference on Independent Component Analysis (ICA’99), pages 59–64, Aussois, France.
[Morelande et Zoubir(2002)] Morelande, M. R. et Zoubir, A. M. (2002). Model selection of random amplitude polynomial phase signals. IEEE Transactions on Signal Processing, 50(3), 578–589.
[Morelande et al.(2000)] Morelande, M. R., Barkat, B., et Zoubir, A. M. (2000). Statistical performance comparison of a parametric and a non-parametric method for IF estimation of random amplitude linear FM signals in additive noise. In Proceedings of the Tenth IEEE Workshop on Statistical Signal and Array Processing, pages 262–266.
[Moussaoui et al.(2004)] Moussaoui, S., Brie, D., Caspary, O., et M.-Djafari, A. (2004). A Bayesian method for positive source separation. In Proceedings of ICASSP’2004, volume 5.
[Nandi(1999)] Nandi, A. K., editor (1999). Blind Estimation Using Higher-Order Statistics. Kluwer Academic Publishers, Boston.
[Nguyen et al.(2001a)] Nguyen, L., Belouchrani, A., Abed-Meraim, K., et Boashash, B. (2001a). Separating more sources than sensors using time-frequency distributions. In Proc. of Int. Symposium on Signal Processing and its Applications (ISSPA’2001), pages 583–586, Malaysia.
[Nguyen et al.(2001b)] Nguyen, L.-T., Senadji, B., et Boashash, B. (2001b). Scattering function and time-frequency signal processing. In International Conference on Acoustics, Speech, and Signal Processing, ICASSP’2001, volume VI, pages 3597–3600, Salt Lake City, Utah, USA.
[Nikias et Petropulu(1993)] Nikias, C. et Petropulu, A. (1993). Higher-Order Spectra Analysis : A Nonlinear Signal Processing Framework. Prentice-Hall.
[Nikias et Petropulu(1994)] Nikias, C. L. et Petropulu, A. P. (1994). Higher-order Spectra Analysis : A Nonlinear Signal Processing Framework. Prentice Hall, New York.
[Nikias et Shao(1995)] Nikias, C. L. et Shao, M. (1995). Signal Processing with Alpha-Stable Distributions and Applications. John Wiley & Sons, New York.
[Nolan(2004)] Nolan, J. P. (2004). Stable Distributions – Models for Heavy Tailed Data. Birkhäuser, Boston.
[Nowicka(1997)] Nowicka, J. (1997). Asymptotic behavior of the covariation and the codifference for ARMA models with stable innovations. Communications in Statistics. Stochastic Models, 13(4), 673–685.
[Ouldali(1999)] Ouldali, A. (1999). Modélisation statistique et identification des signaux FM à phase polynomiale. Ph.D. thesis, LSS, Supélec–Univ Paris XI, France.
[Ouldali et Benidir(1999)] Ouldali, A. et Benidir, M. (1999). Statistical analysis of polynomial phase signals affected by multiplicative and additive noise. Signal Processing, 42(19).
[P. Bickel(1998)] Bickel, P. J., Klaassen, C. A. J., Ritov, Y., et Wellner, J. A. (1998). Efficient and Adaptive Estimation for Semiparametric Models. Springer.
[P.-Y. Arquès(2000)] Arquès, P.-Y., Thirion-Moreau, N., et Moreau, E. (2000). Techniques de l’ingénieur, Traité Mesure et Contrôle, volume RAB, chapter Les représentations temps-fréquence linéaires et quadratiques en traitement du signal, pages 1–22. Techniques de l’ingénieur.
[Papoulis(1991)] Papoulis, A. (1991). Probability, Random Variables, and Stochastic Processes. McGraw-Hill.
[Peleg et Friedlander(1995)] Peleg, S. et Friedlander, B. (1995). The discrete polynomial-phase transform. IEEE Transactions on Signal Processing, 43(8), 1901–1914.
[Peleg et Friedlander(1996)] Peleg, S. et Friedlander, B. (1996). Multicomponent signal analysis using the polynomial-phase transform. IEEE Trans. on AES.
[Pesquet et Moreau(2001)] Pesquet, J.-C. et Moreau, E. (2001). Cumulant based independence measures for linear mixtures. IEEE Transactions on Information Theory, 47(5), 1947–1956.
[Pham(1999)] Pham, D. T. (1999). Mutual information approach to blind separation of stationary sources.
[Pham(2000)] Pham, D. T. (2000). Blind separation of instantaneous mixture of sources via order statistics. IEEE Transactions on Signal Processing, 48(2), 363–375.
[Pham et Cardoso(2001)] Pham, D. T. et Cardoso, J.-F. (2001). Blind separation of instantaneous mixtures of nonstationary sources. IEEE Transactions on Signal Processing, 49(9), 1837–1848.
[Pham et Garrat(1997)] Pham, D.-T. et Garrat, P. (1997). Blind separation of a mixture of independent sources through a quasi-maximum likelihood approach. IEEE Transactions on Signal Processing, 45(7), 1712–1725.
[Piasco et al.(1995)] Piasco, J. M., Elkarkour, W., et Guglielmi, M. (1995). Identification paramétrique de différents modèles d’un signal M.L.F. multicomposantes. In Quinzième colloque GRETSI, pages 193–196, Juan-les-Pins.
[Poor et Tanda(2002)] Poor, H. et Tanda, M. (2002). Multiuser detection in flat fading non-Gaussian channels. IEEE Transactions on Communications, 50(11), 1769–1777.
[Poor et Wornell(1998)] Poor, H. V. et Wornell, G. W., editors (1998). Wireless Communications : Signal Processing Perspectives. Prentice-Hall, New Jersey.
[Proakis(1995)] Proakis, J. G. (1995). Digital Communications. McGraw-Hill, 3rd edition.
[Rachev(2003)] Rachev, S. T. (2003). Handbook of Heavy Tailed Distributions in Finance. Elsevier, Amsterdam.
[Rai et Singh(2004)] Rai, C. S. et Singh, Y. (2004). Source distribution models for blind source separation. Neurocomputing, 57, 501–505.
[Rappaport(1996)] Rappaport, T. S. (1996). Wireless Communications : Principles and Practice. Prentice-Hall, New Jersey.
[Rihaczek(1985)] Rihaczek, A. (1985). Principles of High-Resolution Radar. Peninsula Publishing.
[Ristic(1995)] Ristic, B. (1995). Some aspects of signal dependent and higher-order time-frequency and time-scale analysis of non-stationary signals. Ph.D. thesis, Signal Processing Research Centre, Queensland University of Technology, Brisbane, Australia.
[Rupi et al.(2004)] Rupi, M., Tsakalides, P., Re, E. D., et Nikias, C. L. (2004). Constant modulus blind equalization based on fractional lower-order statistics. Signal Processing, 84, 881–894.
[Sahmoudi(2005)] Sahmoudi, M. (2005). Generalized contrast functions for blind source separation with unknown number of sources. In IEEE Statistical Signal Processing Workshop (SSP’2005) (submitted), Bordeaux, France.
[Sahmoudi et Abed-Meraim(2004a)] Sahmoudi, M. et Abed-Meraim, K. (2004a). Multicomponent chirp interference estimation for communication systems in impulsive alpha-stable noise environment. In Proceedings of the IEEE International Symposium on Control, Communications and Signal Processing (ISCCSP’04), Hammamet, Tunisia.
[Sahmoudi et Abed-Meraim(2004b)] Sahmoudi, M. et Abed-Meraim, K. (2004b). Robust blind separation algorithms for heavy-tailed sources. In Proceedings of the IEEE International Symposium on Signal Processing and Information Theory, Rome, Italy.
[Sahmoudi et al.(2002)] Sahmoudi, M., Abed-Meraim, K., et Benidir, M. (2002). Blind separation of alpha-stable sources : A new fractional lower-order moments (FLOM) approach. In Proceedings of the IEEE International Symposium on Signal Processing and Information Theory (ISSPIT’2002).
[Sahmoudi et al.(2003a)] Sahmoudi, M., Abed-Meraim, K., et Benidir, M. (2003a). Blind separation of instantaneous mixtures of impulsive α-stable sources. In Proceedings of the IEEE International Symposium on Signal and Image Processing (ISPA’2003).
[Sahmoudi et al.(2003b)] Sahmoudi, M., Abed-Meraim, K., et Benidir, M. (2003b). Estimation des signaux chirp multi-composantes affectés par un bruit impulsif α-stable. In Proceedings of GRETSI’2003.
[Sahmoudi et al.(2004a)] Sahmoudi, M., Abed-Meraim, K., et Benidir, M. (2004a). Blind separation of heavy-tailed signals using normalized statistics. In Proceedings of ICA’2004, Granada, Spain.
[Sahmoudi et al.(2004b)] Sahmoudi, M., Abed-Meraim, K., et Barkat, B. (2004b). IF estimation of multicomponent chirp signals in impulsive α-stable noise environment using parametric and non-parametric approaches. In Proceedings of EUSIPCO’2004, Austria.
[Sahmoudi et al.(2005)] Sahmoudi, M., Abed-Meraim, K., et Benidir, M. (2005). Blind separation of impulsive alpha-stable sources using minimum dispersion criterion. IEEE Signal Processing Letters.
[Samorodnitsky et Taqqu(1994)] Samorodnitsky, G. et Taqqu, M. (1994). Stable Non-Gaussian Random Processes : Stochastic Models with Infinite Variance. Chapman & Hall, New York.
[Sarni et al.(2001)] Sarni, Y., Sadoun, R., et Belouchrani, A. (2001). On the application of chirp modulation in spread spectrum communication systems. In Proceedings of ISSPA’2001 ; Sixth International Symposium on Signal Processing and its Applications, volume 2, pages 501–504.
[Sayeed(1998)] Sayeed, A. M. (1998). Canonical time-frequency processing for broadband signaling over dispersive channels. In Proceedings of the IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis, pages 369–372, New York, USA. IEEE.
[Sayeed et al.(1998)] Sayeed, A. M., Sendonaris, A., et Aazhang, B. (1998). Multiuser detection in fast-fading multipath environments. IEEE Journal on Selected Areas in Communications, 16(9), 1691–1701.
[Schilder(1970)] Schilder, M. (1970). Some structure theorems for the symmetric stable laws. Ann. Math. Statist., 41(2), 412–421.
[Senecal(2002)] Senecal, S. (2002). Méthodes de simulation Monte-Carlo par chaînes de Markov pour l’estimation de modèle. Application en séparation de sources et en égalisation. Ph.D. thesis, INPG, Grenoble.
[Sengupta et Burman(2003)] Sengupta, K. et Burman, P. (2003). Non-parametric approach to ICA using Kernel Density Estimation. In Proceedings of IEEE International Conference on Multimedia and Expo. ICME’03, volume 1, pages 749–752.
[Serfling(1980)] Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. Wiley.
[Shamsunder et al.(1995)] Shamsunder, S., Giannakis, G., et Friedlander, B. (1995). Estimating random amplitude polynomial phase signals : a cyclostationary approach. IEEE Trans. on Signal Processing, 43(2), 492–505.
[Shannon(1948a)] Shannon, C. E. (1948a). A mathematical theory of communication. The Bell System Technical Journal, 27(3), 379–423.
[Shannon(1948b)] Shannon, C. E. (1948b). A mathematical theory of communication. The Bell System Technical Journal, 27, 623–657.
[Shereshevski(2002)] Shereshevski, Y. (2002). Blind signal separation of heavy tail sources. M.Sc. thesis, Tel Aviv University, Israel.
[Shereshevski et al.(2001)] Shereshevski, Y., Yeredor, A., et Messer, H. (2001). Super-efficiency in blind signal separation of symmetric heavy-tailed sources. In Proceedings of the 11th IEEE Workshop on Statistical Signal Processing, pages 78–81.
[Shi et al.(2004)] Shi, Z., Tang, H., Liu, W., et Tang, Y. (2004). Blind source separation of more sources than mixtures using generalized exponential mixture models. Neurocomputing, 61, 461–469.
[Shiryayev(1984)] Shiryayev, A. N. (1984). Probability. In Graduate Texts in Mathematics, volume 95. Springer-Verlag.
[Snoussi(2003)] Snoussi, H. (2003). Approche Bayésienne en Séparation de Sources. Applications en Imagerie. Ph.D. thesis, Université Paris-Sud Orsay, Paris.
[Snoussi et M.-Djafari(2000)] Snoussi, H. et M.-Djafari, A. (2000). Bayesian source separation with mixture of Gaussians prior for sources and Gaussian prior for mixture coefficients. In Proc. of MaxEnt : Bayesian Inference and Maximum Entropy Methods, pages 388–406, Gif-sur-Yvette, France.
[Snoussi et M.-Djafari(2004)] Snoussi, H. et M.-Djafari, A. (2004). Fast joint separation and segmentation of mixed images. Journal of Electronic Imaging, 13(2), 349–361.
[Stankovic(1997)] Stankovic, L. (1997).
S-class of time-frequency distributions. IEE Proc. Vision, Image and Signal Processing, 144(2), 57–64.
[Stankovic et Stankovic(1993)] Stankovic, L. et Stankovic, S. (1993). Wigner distribution of noisy signals. IEEE Transactions on Signal Processing, 41(2), 956–960.
[Stankovic et Katkovnik(1998)] Stankovic, L. J. et Katkovnik, V. (1998). Algorithm for the instantaneous frequency estimation using the time-frequency distributions with adaptive window length. IEEE Signal Processing Letters, 5(9).
[Stoll et Moreau(2000)] Stoll, B. et Moreau, E. (2000). A generalized ICA algorithm. IEEE Signal Processing Letters, 7(4), 90–92.
[Stone(1990)] Stone, C. J. (1990). Large-sample inference for log-spline models. Ann. Statist., 18(2), 717–741.
[Stuck(1977)] Stuck, B. W. (1977). Minimum error dispersion linear filtering of scalar symmetric stable processes. IEEE Trans. on Automatic Control, (23), 507–509.
[Stuck et Kleiner(1974)] Stuck, B. W. et Kleiner, B. (1974). A statistical analysis of telephone noise. Bell System Technical Journal, (53), 1263–1320.
[Subbotin(1923)] Subbotin, M. T. (1923). On the law of frequency of errors. Matematicheskii Sbornik, 31, 296–301.
[Sucic et al.(1999)] Sucic, V., Barkat, B., et Boashash, B. (1999). Performance evaluation of the B distribution. In Proceedings of the Fifth International Symposium on Signal Processing and its Applications (ISSPA'99), volume 1, pages 267–270, Brisbane, Queensland, Australia.
[Suppappola(2003)] Suppappola, A. P., editor (2003). Applications in Time-Frequency Signal Processing. CRC Press.
[Swami et Sadler(1998)] Swami, A. et Sadler, B. (1998). Parameter estimation for linear alpha-stable processes. IEEE Signal Processing Letters, 5(2).
[Swarts et al.(1999)] Swarts, F., van Rooyan, P., Oppermann, I., et Lotter, M. P., editors (1999). CDMA Techniques for Third Generation Mobile Systems. Kluwer Academic Publishers, Boston.
[Takada(2001)] Takada, T. (2001). Nonparametric density estimation : A comparative study.
Economics Bulletin, 3(16), 1–10.
[Taleb(1999)] Taleb, A. (1999). Séparation de Sources dans des Mélanges Non Linéaires. Ph.D. thesis, INPG, Grenoble, France.
[Thirion-Moreau et al.(2004)] Thirion-Moreau, N., Fadili, E., et Moreau, E. (2004). A sufficient condition for separation of deterministic signals based on spatial time-frequency representation. In Proceedings of the International Conference on Independent Component Analysis (ICA'2004), pages 366–373.
[Tong et al.(1991)] Tong, L., Liu, R.-W., Soon, V., et Huang, Y.-F. (1991). Indeterminacy and identifiability of blind identification. IEEE Trans. on Circuits and Systems, 38, 499–509.
[Tourneret(1998)] Tourneret, J. (1998). Detection and estimation of abrupt changes contaminated by multiplicative Gaussian noise. Signal Processing, 68, 259–270.
[Tourneret et al.(2003a)] Tourneret, J.-Y., Doisy, M., et Lavielle, M. (2003a). Bayesian retrospective detection of multiple change-points corrupted by multiplicative noise : application to SAR image edge detection. Signal Processing, 83, 1871–1887.
[Tourneret et al.(2003b)] Tourneret, J.-Y., Suparman, S., et Doisy, M. (2003b). Hierarchical Bayesian segmentation of signals corrupted by multiplicative noise. In Proceedings of ICASSP'2003, pages 165–168, Hong Kong, China.
[Tsakalides et Nikias(1996)] Tsakalides, P. et Nikias, C. (1996). The robust covariation-based MUSIC (ROC-MUSIC) algorithm for bearing estimation in impulsive noise environments. IEEE Trans. on Signal Processing, 44(7), 1623–1633.
[Tsihrintzis et Nikias(1996)] Tsihrintzis, G. et Nikias, C. (1996). Fast estimation of the parameters of alpha-stable impulsive interference. IEEE Trans. on Signal Processing, 44(6).
[VanTrees(1968)] VanTrees, H. L. (1968). Detection, Estimation, and Modulation Theory : Part I. John Wiley & Sons.
[VanTrees(1992)] VanTrees, H. L. (1992).
Detection, Estimation, and Modulation Theory : Radar-Sonar Signal Processing and Gaussian Signals in Noise. Krieger Pub. Co., Malabar, Florida.
[Ville(1948)] Ville, J. (1948). Théorie et applications de la notion de signal analytique. Câbles et Transmissions, 2A(1), 61–74.
[Vincent(1995)] Vincent, I. (1995). Classification de Signaux non Stationnaires. Ph.D. thesis, Université de Nantes/Ecole Centrale de Nantes.
[Walter(1994)] Walter, C. (1994). Les structures du hasard en économie : efficience des marchés, lois stables et processus fractals. Ph.D. thesis, IEP Paris.
[Wang et al.(2002)] Wang, Y., Gao, L., Zhao, M., Chen, J., Zhang, Z., et Yao, Y. (2002). Time-frequency code for multicarrier DS-CDMA systems. In Proceedings of the IEEE 55th Vehicular Technology Conference, volume 3, pages 1224–1227.
[Wegman et al.(1989)] Wegman, E. J., Schwartz, S. G., et Thomas, J. (1989). Topics in Non-Gaussian Signal Processing. Academic Press, New York.
[White et Boashash(1988)] White, L. V. et Boashash, B. (1988). On estimating the instantaneous frequency of a Gaussian random signal by use of the Wigner-Ville distribution. IEEE Transactions on Acoustics, Speech, and Signal Processing, 36(3), 417–420.
[Wood et Barry(1994)] Wood, J. C. et Barry, D. T. (1994). Linear signal synthesis using the Radon-Wigner transform. IEEE Transactions on Signal Processing, 42(8), 2105–2111.
[Xueshi Yang et Pesquet(2001)] Yang, X., Petropulu, A. P., et Pesquet, J. C. (2001). Estimating long-range dependence in impulsive traffic flows. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'01), pages 3413–3416.
[Zhang et Amin(2000)] Zhang, Y. et Amin, M. G. (2000). Blind separation of sources based on their time-frequency signatures. In Proceedings of the
2000 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'00), volume 5, Istanbul, Turkey.
[Zhang et Kassam(2004)] Zhang, Y. et Kassam, S. A. (2004). Robust rank-EASI algorithm for blind source separation. IEE Proc. Commun., 151(1), 15–19.
[Zhang et al.(2001)] Zhang, Y., Ma, W., et Amin, M. G. (2001). Subspace analysis of spatial time-frequency distribution matrices. IEEE Transactions on Signal Processing, 49(4), 747–759.
[Zhao et al.(1990)] Zhao, Y., Atlas, L. E., et Marks, R. J. (1990). The use of cone-shaped kernels for generalized time-frequency representations of nonstationary signals. IEEE Trans. on Acoustics, Speech, and Signal Processing, 38(7), 1084–1091.
[Zhong et al.(2004a)] Zhong, M., Tang, H., et Tang, Y. (2004a). Expectation-Maximization approaches to independent component analysis. Neurocomputing, 61, 503–512.
[Zhong et al.(2004b)] Zhong, M.-J., Tang, H.-W., Chen, H.-J., et Tang, Y.-Y. (2004b). An EM algorithm for learning sparse and overcomplete representations. Neurocomputing, 57, 469–476.
[Zhou et Giannakis(1994a)] Zhou, G. et Giannakis, G. (1994a). Self coupled harmonics : stationary and cyclostationary approaches. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP'94), volume 4, pages IV/153–156, Adelaide, SA, Australia. IEEE.
[Zhou et Giannakis(1995)] Zhou, G. et Giannakis, G. (1995). Harmonics in Gaussian multiplicative and additive noise : Cramer-Rao bounds. IEEE Trans. on Signal Proc., 43(5), 1217–1231.
[Zhou et Giannakis(1996)] Zhou, G. et Giannakis, G. (1996). Polyspectral analysis of mixed processes and coupled harmonics. IEEE Transactions on Information Theory, 42(3), 943–958.
[Zhou et Giannakis(1993)] Zhou, G. et Giannakis, G. B. (1993). Comparison of higher-order and cyclic approaches for estimating random amplitude modulated harmonics. In IEEE Signal Processing Workshop on Higher-Order Statistics, pages 225–229, South Lake Tahoe, CA, USA.
[Zhou et Giannakis(1994b)] Zhou, G.
et Giannakis, G. B. (1994b). On estimating random amplitude-modulated harmonics using higher order spectra. IEEE Journal of Oceanic Engineering, 19(4), 529–539.
[Zhou et al.(1996)] Zhou, G., Giannakis, G., et Swami, A. (1996). On polynomial phase signals with time-varying amplitudes. IEEE Trans. on Signal Proc., 44(4), 848–861.
[Ziehe et Müller(1998)] Ziehe, A. et Müller, K.-R. (1998). TDSEP—an efficient algorithm for blind separation using time structure. In Proc. Int. Conf. on Artificial Neural Networks (ICANN'98), pages 675–680, Skövde, Sweden.
[Zolotarev(1966)] Zolotarev, V. (1966). On representation of stable laws by integrals. In Selected Translations in Mathematical Statistics and Probability, volume 6, pages 84–88. American Mathematical Society.
[Zolotarev(1986)] Zolotarev, V. M. (1986). One-dimensional stable distributions. In Translations of Mathematical Monographs, volume 65. American Mathematical Society.
[Zoubir et Brcich(2002)] Zoubir, A. et Brcich, R. (2002). Multiuser detection in non-Gaussian channels. Digital Signal Processing, 12, 262–273.
[Zoubir et Arnold(1996)] Zoubir, A. M. et Arnold, M. J. (1996). Testing Gaussianity with the characteristic function : the i.i.d. case. Signal Processing, 53(2), 110–120.