Wi-Fi Activity in Open Environments

PhD Thesis
University Pierre and Marie Curie
Doctoral school: Computer Science, Telecommunications, and Electronics
presented by
Thomas Claveirole
submitted for the degree of
Doctor of Science of the University Pierre and Marie Curie
Wi-Fi Activity in Open Environments:
Tools, Measurements, and Analyses
(French title: Activité Wi-Fi en environnement ouvert : outils, mesures et analyses)
To be defended on 26 February 2010 before a committee composed of:
Ana Cavalli, Reviewer, Prof. TELECOM & Management SudParis
Thierry Turletti, Reviewer, INRIA researcher
Khaldoun Al Agha, Examiner, Prof. University of Paris-Sud 11
Guillaume Chelius, Examiner, INRIA researcher
Marcelo Dias de Amorim, Co-advisor, CNRS researcher
Serge Fdida, Co-advisor, Prof. University Pierre and Marie Curie
Acknowledgments
First of all, I would like to thank my reviewers and my committee. Established researchers giving me their time despite busy schedules, when nothing obliges them to: this has always struck me as slightly absurd. I can therefore only express my gratitude. Thank you to Ana Cavalli, Thierry Turletti, and Guillaume Chelius. I take the liberty of thanking Khaldoun Al Agha separately, and especially, because he is the one who threw me into networking, five years ago. Likewise, it is impossible to forget Serge Fdida and Marcelo Dias de Amorim. Thank you for the supervision, for the welcome, for everything you have given me.
Marcelo, in particular, I have something to tell you. Several times I came to see you with my morale at rock bottom. I am at a dead end, nothing will work as planned, that much is certain. And every time, you talk, and there I am, fired up and motivated as never before. I do not know how you do it. I have not figured out your secret. But I hope to only ever work with people as motivating as you.
I also thank Mathias Boc. Without our collaboration, this thesis would be a little less complete. And, let us admit it, I owe him half of a trip to San Francisco...
There are also some rather special people whom I absolutely want to thank. They are my friends from my former school, ÉPITA. Among others, the research and development laboratory (Akim, Théo, Raph., and so many others). But also Claire, Max., Alexandre, and all the (former) student assistants. Thank you. Long before my thesis I shared a lot with you. Not only work, but a great deal of work all the same. And this changed the way I approach computer science. At the risk of sounding mad, I will say it: I think your personalities seep out of my research, my lines of code, and my articles.
But I have many other friends to thank. It took me time to fully measure what my thesis owes them. It also took a misunderstanding on the subject with one of them. By mistake, almost by chance, at the very moment I am writing these lines. So thank you Yosra, Brice, Salim, Amélie, Mathias, Cédric. Thank you P.-E., Thomas, Clémence, Matthieu, Anneli. Thank you Magali, thank you Sophia, thank you Élodie. Thank you Pierre. That makes a lot of thanks, and a lot of people, it is true. And yet I am sure I am forgetting some. Plenty. And yet they all contributed indirectly to my thesis, not always consciously. Thank you.
Finally, I end with a thought for my family, and in particular for my grandparents: they have already told me how proud they will be to have a grandson who is a doctor.
Résumé
For about ten years, Wi-Fi has enjoyed enormous success. Consequently, a significant part of networking research consists of measuring its underlying protocol, IEEE 802.11, in order to understand it better. Sniffing is one of the techniques useful for this understanding. It consists of deploying monitors within a measurement area, which record all the traffic that can be heard. It is a passive technique, and each monitor produces packet traces. It is also a fundamental step for a number of operations, including network troubleshooting, security improvement, and the analysis of the behavior of certain protocols.
Existing work based on sniffing raises a number of questions. Although this technique essentially relies on manipulating IEEE 802.11 packet traces, there is no generic software toolbox to perform these manipulations. As a result, efforts are duplicated, some tools are too specific, interoperability is sometimes poor, and performance is not always satisfactory. This is particularly true for trace merging. Although it is a step common to several studies, few tools exist, and their usability is limited. Beyond these prosaic problems there are also higher-level questions. First, there is uncertainty about the accuracy one can expect from monitors. Second, most studies concentrate on the low-level characteristics of IEEE 802.11. Since this protocol is now present on new categories of devices, notably mobile devices, it would also be interesting to study usage habits rather than protocol problems. Finally, most experiments concentrate on academic environments (universities, laboratories, conferences). It is reasonable to expect that other environments exhibit different characteristics.
In this thesis, we propose WiPal, a software suite for processing IEEE 802.11 packet traces, and we use it to address the problems described above. WiPal includes a generic library for manipulating packet traces and IEEE 802.11 frames. It also provides a set of programs built on top of this library. These make it possible to perform various operations (for example concatenation or comparison), to extract statistics, to anonymize traces, and to merge traces. In order to make WiPal generic and efficient, we developed several specific algorithms, as well as optimizations for processing large datasets efficiently.
Using WiPal, we carry out several analyses in different environments. By analyzing two short-duration datasets we improve our understanding of sniffing accuracy. Then, by analyzing three long-duration datasets (several days) in different environments, we obtain a better understanding of users’ daily behaviors with respect to wireless networks. These environments have different social characteristics: an office space, a suburban housing area, and a dense urban residential area. Our results reveal new and unexpected properties. For example, we show that the usual techniques for measuring trace accuracy are not as reliable as expected, and that long-duration traces contain a very small proportion of regular users.
Keywords
Measurement, Wi-Fi, IEEE 802.11, sniffing, packet trace, trace merging.
Abstract
For about a dozen years, Wi-Fi has enjoyed tremendous success. Consequently, a large part of networking research has focused on measuring and understanding its underlying protocol, IEEE 802.11. Among the techniques that have proved useful to this research is wireless sniffing. This passive measurement technique consists of deploying, within some target area, a number of monitors that capture all the wireless traffic they hear and produce packet traces. It is a fundamental step in a number of network operations, including network diagnosis, security enhancement, and behavioral analysis of protocols.
Existing work based on wireless sniffing nevertheless raises a number of issues. Although IEEE 802.11 packet trace manipulations are fundamental to this technique, no generic framework exists to carry them out. This results in duplicated efforts among scientists, overspecialized tools, poor interoperability, and sometimes sub-optimal performance. This is especially true for trace merging. Although merging is a common step in many studies, only a few tools exist to merge Wi-Fi traces, and they have limited usability. Beyond these “prosaic” problems there are also more challenging questions. First, there is a lack of insight into the accuracy one can expect from wireless sniffers. Second, most studies focus on low-level characteristics of IEEE 802.11. As Wi-Fi now equips new categories of mobile devices, studying usage patterns rather than protocol issues also becomes interesting. Finally, most experiments collect traces in “academic” environments (university campuses, laboratories, or conference venues). Other environments are likely to display different properties.
In this thesis we propose WiPal, a framework to process IEEE 802.11 packet traces, and
use it to tackle the aforementioned issues. WiPal includes a generic library to handle packet
traces and IEEE 802.11 frames. It also provides a set of programs atop this library. These
programs feature miscellaneous operations such as concatenation or comparison, statistics
extraction, trace anonymization, and, most notably, trace merging. We developed a number
of specific algorithms and optimizations in order to make WiPal a generic tool able to cope
efficiently with large datasets.
Using WiPal, we perform a number of analyses on traces we collected in various environments. Through the analysis of two short-lived datasets using up to eight monitors, we extend our understanding of the accuracy of Wi-Fi traces. Then, through the analysis of three long-lived datasets (several days), we obtain a better understanding of people’s daily behaviors with respect to the underlying wireless network. These environments have different sociological characteristics: an office area, a sparse residential area, and a dense residential area. Our results reveal previously unseen and unexpected properties. For instance, traditional techniques to estimate trace accuracy are much less reliable than previously thought, and regular users account for a very small portion of the total population in long-lived traces.
Keywords
Measurement, Wi-Fi, IEEE 802.11, sniffing, packet trace, trace merging.
Contents
Acknowledgments (Remerciements)
Résumé
Abstract
Contents
1 Introduction
  1.1 Context: Wi-Fi measurements
  1.2 Issues with Wi-Fi sniffing and related techniques
  1.3 Contributions of this thesis
    1.3.1 WiPal: manipulating IEEE 802.11 traces
    1.3.2 Applying WiPal: empirical analyses
  1.4 Outline
I The WiPal trace manipulation framework
2 WiPal: overview and design
  2.1 Trace manipulation tools: related work
  2.2 Overview of WiPal
    2.2.1 Features
    2.2.2 Overall architecture
  2.3 Packet parsing
    2.3.1 PHY headers
    2.3.2 IEEE 802.11 parsing
  2.4 Filters
    2.4.1 Filter sources: pcap abstractions
    2.4.2 Processing filters
  2.5 Performance evaluation
    2.5.1 Methodology
    2.5.2 Results
  2.6 Conclusion
3 WiPal: IEEE 802.11 trace merging
  3.1 Trace merging: state of the art
  3.2 WiPal’s basics
  3.3 Detailed operation of WiPal’s trace merging
    3.3.1 Identifying reference frames
    3.3.2 Extraction of unique frames
    3.3.3 Intersection
    3.3.4 Synchronization
    3.3.5 Merging
  3.4 Evaluation
    3.4.1 Correctness
    3.4.2 Efficiency
  3.5 Conclusion
II Applying WiPal: empirical analyses
4 Accuracy of wireless packet sniffing
  4.1 Completeness evaluation: state of the art
  4.2 Datasets
    4.2.1 Overview
    4.2.2 Preliminary analysis
  4.3 Completeness evaluation: shortcomings
  4.4 Completeness and number of sniffers
    4.4.1 Methodology
    4.4.2 Results
  4.5 Conclusion
5 Empirical analysis of Wi-Fi activity in three urban scenarios
  5.1 Setup
  5.2 Device diversity
    5.2.1 Cumulated activity durations
    5.2.2 Growth of the number of devices
  5.3 Activity/Mobility Behaviors
    5.3.1 Inter-activity patterns
    5.3.2 Predominant activity pattern
  5.4 Conclusion
6 Conclusion and future work
  6.1 WiPal
  6.2 Wi-Fi sniffing accuracy
  6.3 Wi-Fi activity
  6.4 Perspectives
Appendices
A Summary of the thesis in French
  A.1 Context
    A.1.1 Passive Wi-Fi measurements and sniffing
    A.1.2 Open questions
  A.2 Contributions of this thesis
    A.2.1 WiPal: manipulating IEEE 802.11 traces
    A.2.2 Applications of WiPal: empirical analyses
  A.3 Conclusion
B WiPal manual
  B.1 The programs
    B.1.1 Invocation
    B.1.2 Concatenation (and Prism noise filtering)
    B.1.3 Comparisons
    B.1.4 Sub-traces
    B.1.5 Merging
    B.1.6 Synchronization
    B.1.7 Unique frames
    B.1.8 Duplicate data frames
    B.1.9 Statistics
    B.1.10 Anonymization
    B.1.11 Miscellaneous programs
    B.1.12 Undocumented programs
  B.2 The library
  B.3 FAQ
C List of publications
  C.1 Journals
  C.2 Conferences
  C.3 Demos and posters
  C.4 Software
  C.5 Under review
Bibliography
List of Figures
List of Tables
Listings
Chapter 1
Introduction
The IEEE 802.11 standard [37] defines base layers for wireless communications. It appeared about a dozen years ago, under the Wi-Fi trademark, and is widely used today. Personal computers featuring IP communications over wireless links rely almost exclusively on this protocol. Furthermore, Wi-Fi also plays a major role on other wireless-capable mobile devices: it is available on most PDAs, smartphones, portable music players, and even some digital cameras.
As a consequence, Wi-Fi is part of the landscape of ubiquitous computing [58]. Along with other wireless protocols, such as Bluetooth or GSM, it is involved in creating a transparent digital environment for everyday life. For instance, Wi-Fi access points provide Internet access (hotspots) in households, hotels, conferences, and many other places. Understanding how IEEE 802.11 implementations behave in the wild, and what the usage patterns of their users are, is therefore essential. This insight is necessary for developing new applications and protocols, or improving existing ones.
1.1 Context: Wi-Fi measurements
IEEE 802.11 specifies a physical layer (PHY) and a medium access control scheme (MAC)
for wireless networks. The PHY is in charge of encoding and decoding digital information
(bit sequences) to and from radio wave signals. The MAC, on the other hand, schedules
transmissions so devices can share the medium and do not interfere with each other.
Although it is mostly an industry-driven standard, IEEE 802.11 has been the subject of a large body of research by computer scientists. This includes specialized topics such as its PHY [30;45], its MAC [46], and other features such as security [12;14]. More generic research topics also involve this protocol: ad hoc and mesh networking [10;27], sensor networks [60], or pervasive computing [58]. A proper understanding of IEEE 802.11 can therefore benefit all these topics. To achieve this understanding one needs both theoretical analyses and practical experiments. This thesis focuses on experiments, and more specifically on measurements of wireless networks in the wild.
Figure 1.1: Wireless sniffing: passive monitors listen to the wireless activity inside the measurement area.
Every network measurement is either active or passive. Active measurements alter network traffic in order to evaluate various parameters. Classic active techniques include saturating a link to measure its maximum throughput, or sending probes back and forth to evaluate round-trip delays. Passive measurements, by contrast, do not interfere with network traffic. This is the case, for instance, when tapping a network link to analyze packet flows. Note that passive techniques might still interfere with the infrastructure: they might require users to install specific software, or administrators to plug in specific tapping equipment.
A common passive technique to measure wireless networks is sniffing. Wireless sniffing consists of deploying within some target area a number of monitors (or sniffers) that capture all the wireless traffic they hear (see Figure 1.1). Sniffers produce traces composed of a succession of MAC frames. Wireless sniffing is a fundamental step in a number of network operations, including network diagnosis [23;34], security enhancement [12;48], and behavioral analysis of protocols [22;39;43;59]. Although it is not mandatory, wireless sniffing can also support some location systems [20;21;61]. It comes in a variety of flavors: there may be one or several sniffers, sniffers may be commodity hardware or specialized devices, and they can operate offline or be part of a wired infrastructure (among other parameters). In any case, the sniffing operation is passive and does not interfere with the network’s normal operation.
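To make the notion of a sniffer trace concrete, the following is a minimal sketch of reading such a trace, assuming it is stored in the classic pcap file format with microsecond timestamps. It is not WiPal code: the function name `read_pcap` and the `LINKTYPES` table are illustrative, and only the on-disk layout of the pcap headers and the link-type numbers come from the pcap format itself.

```python
import io
import struct

# Link-layer types commonly found in Wi-Fi traces (pcap link-type numbers).
LINKTYPES = {
    105: "IEEE 802.11 (no PHY header)",
    119: "Prism header + IEEE 802.11",
    127: "Radiotap header + IEEE 802.11",
    163: "AVS header + IEEE 802.11",
}

def read_pcap(stream):
    """Yield (timestamp_seconds, captured_bytes, link_type) for each record."""
    header = stream.read(24)
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    # The magic number tells us the byte order used by the capturing host.
    magic = struct.unpack("<I", header[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"
    elif magic == 0xD4C3B2A1:
        endian = ">"
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    # version major/minor, time zone, sigfigs, snaplen, link type
    _, _, _, _, _, link_type = struct.unpack(endian + "HHiIII", header[4:])
    while True:
        record = stream.read(16)
        if len(record) < 16:
            return  # clean end of file (or truncated record header)
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            return  # truncated last record
        yield ts_sec + ts_usec / 1e6, data, link_type
```

Because the reader is a generator, a caller can iterate over a multi-gigabyte trace without loading it into memory; for example, `for ts, frame, link in read_pcap(open(path, "rb")): ...` with a hypothetical `path`.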
Wireless sniffing often involves a centralized process that is responsible for merging the traces [22;43;59]. The objective is to obtain a global view of the wireless activity from multiple local measurements. By providing overlapping coverage zones, it is also possible to compensate for frame losses with data from different sniffers. Merging is however a difficult task: it requires precise synchronization among traces (to within a few microseconds) and must cope with the unreliable nature of the medium (frame loss is unavoidable).
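The two difficulties named above, synchronization and frame loss, can be illustrated with a deliberately simplified sketch. It assumes that frames whose content is unique within each trace and present in both can serve as common time references, that the two sniffer clocks differ by a linear function, and that both traces are sorted by timestamp. All names (`reference_pairs`, `fit_clock`, `merge`) and the `tolerance` parameter are hypothetical; this is not the algorithm used by WiPal or by any merger cited in this chapter.

```python
import heapq
from collections import Counter

def reference_pairs(trace_a, trace_b):
    """Pair timestamps of frames whose content is unique in both traces."""
    count_a = Counter(frame for _, frame in trace_a)
    count_b = Counter(frame for _, frame in trace_b)
    time_b = {frame: t for t, frame in trace_b if count_b[frame] == 1}
    return [(t, time_b[frame]) for t, frame in trace_a
            if count_a[frame] == 1 and frame in time_b]

def fit_clock(pairs):
    """Least-squares fit of t_a = alpha * t_b + beta from (t_a, t_b) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two reference frames")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_b == 0:
        raise ValueError("reference frames must have distinct timestamps")
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    alpha = cov / var_b
    return alpha, mean_a - alpha * mean_b

def merge(trace_a, trace_b, tolerance=1e-4):
    """Merge two sorted traces of (timestamp, frame_bytes) onto trace_a's clock."""
    alpha, beta = fit_clock(reference_pairs(trace_a, trace_b))
    shifted_b = [(alpha * t + beta, frame) for t, frame in trace_b]
    merged = []
    for t, frame in heapq.merge(trace_a, shifted_b):
        # Identical content within `tolerance` seconds is taken to be the
        # same transmission heard by both sniffers, and is kept only once.
        duplicate = False
        for u, other in reversed(merged):
            if t - u > tolerance:
                break
            if other == frame:
                duplicate = True
                break
        if not duplicate:
            merged.append((t, frame))
    return merged
```

A frame missed by one sniffer but heard by the other still appears in the merged output, which is how overlapping coverage compensates for loss in this sketch.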
1.2 Issues with Wi-Fi sniffing and related techniques
A number of issues with Wi-Fi sniffing nevertheless remain. This thesis focuses on technical issues.¹ We categorize them into two classes: issues with the technique itself and issues with the tools. This thesis addresses both, in an effort to collect new datasets and produce original analyses.
Issues with the technique relate to the relevance of the produced traces. This includes sniffer accuracy. Even in good radio conditions, sniffers may miss successfully transmitted frames. In this context, natural questions arise: since each sniffer trace is incomplete (i.e., lacks some frames), a merged trace is likely to be incomplete as well. What accuracy can one expect from a single sniffer? From multiple sniffers? What results can be drawn from incomplete traces?
Another issue regarding the relevance of traces concerns the available datasets. Although Wi-Fi is almost ubiquitous, most of the datasets made available by the research community come from university campuses, laboratories, or conference venues [2]. This is partly due to current practices, which focus on environments that are easy for researchers to access, but also to the fact that existing monitoring techniques only fit specific scenarios. Most of the techniques available in the literature either focus on one single network, require setting up a whole infrastructure, or need intrusive access to one’s network. Once down the street or inside an individual’s house, such techniques are therefore difficult to implement. Wireless sniffing, however, has a strong potential for monitoring all kinds of environments: it is passive, it does not interfere with the infrastructure of the monitored networks, and in some cases it does not even need to rely on any infrastructure at all. But this potential has remained unexploited so far. As a consequence, most researchers restrict sniffing to studying protocol quirks [22;39;43]. We think however that sniffing could be a great tool to study wireless network usage in environments that are usually hard to reach (e.g., individual houses, streets, or parks).
Issues with tools mostly relate to the manipulation of packet traces. Many network operations involve these traces: administrators use them for monitoring or troubleshooting, and researchers use them for measurements, simulations, or validations. Wireless sniffers produce packet traces consisting of lists of MAC frames. Many tools exist for their creation and manipulation, but most of these are designed for a very specific goal and carry their own packet-processing code. For instance, tcpdump [8] understands many network protocols, but its parsing code cannot be reused for any purpose other than displaying packets on a terminal. As another example, Wireshark [6] is more generic, but it is still mostly visualization-oriented and suffers from similar issues. Most packet processing programs have a good design and are very efficient at what they focus on, but each time one creates another tool to handle packet traces, it is impractical to rely on previous code. Furthermore, some tools suffer from performance issues (for instance, Scapy [5] is a powerful tool to analyze packet traces, but it is not tractable on large traces of 1 GB or more). All of this makes carrying out custom analyses on sniffer traces a tedious process. It often requires developing new tools from scratch.
¹ Some non-technical issues also exist, for instance legal and ethical issues [11;55].
For the same reasons, merging IEEE 802.11 traces is also an issue. The literature has provided the community with a few merging tools, but most of them require a wired infrastructure [22;28]. The others are too specific to the experiments conducted in the corresponding papers [43;44]. In order to make Wi-Fi sniffing generalizable to any environment, one needs tools that are both generic and independent of any wired infrastructure.
1.3 Contributions of this thesis
The contributions of this thesis are twofold. First, we develop a framework, called WiPal, to help process IEEE 802.11 packet traces. This framework includes a generic library to help develop new tools, and several ready-to-use utilities that perform predefined operations on trace files. These utilities include a trace merger with innovative features. Second, we perform two analyses using these tools. In order to carry out these analyses we collect several datasets in various environments, including day-long traces from a residential area and an uptown location. The first analysis focuses on the accuracy of Wi-Fi sniffing, while the second studies Wi-Fi usage patterns.
1.3.1 WiPal: manipulating IEEE 802.11 traces
The first part of this thesis presents WiPal, our packet trace manipulation framework. WiPal
was designed for performance, without any specific application in mind, but rather in the
hope that others could rely on it to develop custom trace processing software. Though it
focuses on handling the IEEE 802.11 protocol, it provides several protocol-agnostic features.
What makes WiPal interesting is its original design and some of its novel features. In this thesis:
• We present generic patterns for handling various types of packet traces. For instance, using a pipe and filter mechanism to process packet traces, or using a static callback mechanism to generate frame parsers that are both efficient and generic.
• We present how some novel features might benefit packet processing programs, and how to implement them. For instance, random access to a packet trace, or the ability to consider the aggregation of several files as one unique packet stream.
• We raise a number of issues a program designer might encounter when writing packet trace processing software. We discuss existing practices to solve them and the specific solutions adopted by WiPal (how and why).
• We evaluate the performance of WiPal and compare it with other tools. The results show that WiPal’s generic design does not impact its execution speed: it can compete with specialized code. Also, some new features do not impact performance, and others, which are optional, only imply a limited overhead.
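The pipe and filter pattern, and the idea of treating several files as one packet stream, can be illustrated with a short sketch. Python generators stand in here for the filters; the function names are hypothetical and this does not reflect WiPal’s actual implementation or its static callback mechanism. The only protocol fact used is the layout of the first byte of the IEEE 802.11 Frame Control field (bits 2-3 hold the frame type, bits 4-7 the subtype).

```python
import itertools

def concat(*traces):
    """Source filter: present several traces as one packet stream."""
    return itertools.chain.from_iterable(traces)

def only_type(frames, frame_type):
    """Processing filter: keep frames of a given 802.11 type.

    Types: 0 = management, 1 = control, 2 = data.
    """
    for timestamp, frame in frames:
        if len(frame) >= 2 and (frame[0] >> 2) & 0x3 == frame_type:
            yield timestamp, frame

def count_by_subtype(frames):
    """Sink: count frames per 802.11 subtype (e.g. 8 = beacon for management)."""
    counts = {}
    for _, frame in frames:
        subtype = (frame[0] >> 4) & 0xF
        counts[subtype] = counts.get(subtype, 0) + 1
    return counts

# Two hypothetical capture files, already decoded into (timestamp, bytes).
day1 = [(0.0, b"\x80\x00"), (0.1, b"\x08\x00")]   # beacon, data
day2 = [(1.0, b"\x40\x00"), (1.1, b"\x80\x00")]   # probe request, beacon

management_counts = count_by_subtype(only_type(concat(day1, day2), 0))
```

Each stage pulls frames lazily from the previous one, so stages can be reordered or reused in other tools without copying the trace.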
A distinctive component of WiPal is its merging tool. This tool works offline and is able to merge IEEE 802.11 packet traces. Its key features are performance, ease of use, and flexibility. As a consequence, its design does not assume trace features that would require monitors to access a network infrastructure (e.g., some tools require network synchronization [22]). It also supports most of the existing input formats (e.g., raw IEEE 802.11 frames, Prism, Radiotap, and AVS headers). Finally, it is usable in a straightforward fashion by just calling the adequate programs on trace files (other mergers require more complex setups, generally involving various servers [22;28;43]). This thesis motivates and describes the design of WiPal’s trace merger:
• It proposes new algorithms for various stages of the merging process. In particular,
the synchronization algorithm is a generalization of previous algorithms from the literature.
• It provides an analysis of the synchronization algorithm; we show that our algorithm
is more accurate than previous algorithms.
• It provides a performance study that shows WiPal’s merger is an order of magnitude
faster than the other publicly available offline merger, namely Wit [43] .
Our analyses rely on sixteen real traces from four distinct datasets (CRAWDAD’s uw/sigcomm2004 [50], recorded during the SIGCOMM 2004 conference, and three private datasets we collected in various conditions). They allow us to calibrate various parameters of WiPal, validate its trace merger’s operation, and show its efficiency. We believe that WiPal will be of great utility for the research community working on wireless network measurements.
1.3.2 Applying WiPal: empirical analyses
WiPal enables us to carry out analyses on datasets we collect using Wi-Fi sniffing. The second part of this thesis presents two of these analyses. The first one focuses on the accuracy of Wi-Fi sniffing. The second one studies Wi-Fi usage patterns in environments with different sociological characteristics.
First, we collect short-lived (up to two hours) datasets using up to eight monitors sharing the same location. Analyzing these traces reveals that existing techniques to evaluate trace completeness are inaccurate. Among other issues, we observe that a single buggy device can be responsible for misleading the whole system. We then investigate how the number of sniffers impacts trace completeness. We show that even though individual sniffers may provide good accuracy, sometimes using eight sniffers is still not enough to capture all frames. Furthermore, the sniffing process exhibits a high level of randomness with variable accuracy.
Second, we record and analyze long-lived traces (three-day long and ten-day long) obtained in three environments: an office, a dense uptown residential area, and a sparse suburban residential area. We focus on the behavior of the devices rather than on the traffic
characteristics. We are interested in observations like the total duration a device is active,
the frequency of appearance of new devices, and activity that can be extracted from traces.
Among a number of results, we show that: (i) independently of the trace, most devices are
inactive most of the time, (ii) due to mobility, two traces have a constant discovery rate of
new users, even after days of measurements, and (iii) as the environments are part of users’
life along a typical day, activity intensity alternates between residential and office areas.
1.4 Outline
For the sake of clarity, we decided to split the description of related works: each chapter
includes the part related to its concerns. There are four chapters divided into two parts: the first
part focuses on WiPal while the second part presents empirical studies.
The first part features chapters 2 and 3. Chapter 2 gives a general overview of WiPal.
It presents its original features and design, and also includes a performance evaluation of
this design. Chapter 3 focuses on WiPal’s trace merging process. It presents some algorithms
and optimizations that WiPal uses for this process. This includes an evaluation of WiPal’s
performance with regard to trace merging.
The second part presents the empirical analyses we performed using WiPal. It features
Chapters 4 and 5. Chapter 4 is a study of Wi-Fi sniffing accuracy. Chapter 5 studies Wi-Fi
usage patterns in various environments.
Finally, appendices include WiPal’s manual and a list of references.
Part I
The WiPal trace manipulation framework
Chapter 2
WiPal: overview and design
WiPal emerged from our need to manipulate Wi-Fi packet traces. At its creation,
existing tools lacked features regarding some operations (e.g., merging or extracting statistics).
We therefore developed WiPal to fulfill these needs. WiPal has a number of
features, but its most significant one is certainly trace merging, and it includes some original algorithms in this regard.1 This chapter reports on our experience designing the WiPal
framework: we draw attention to WiPal’s design, and some of its features, which are original and might benefit other software developers faced with similar issues. WiPal is free
software, available at http://wipal.lip6.fr/. Appendix B includes WiPal’s manual.
In the following, Section 2.1 first gives a short overview of existing software for packet
trace manipulation. Then Section 2.2 gives an overview of WiPal’s design and features.
The two subsequent sections focus on WiPal’s modules: Section 2.3 addresses packet parsing, while Section 2.4 describes WiPal’s pipe and filter mechanisms. Finally, Section 2.5
evaluates WiPal’s performance.
2.1 Trace manipulation tools: related work
Packet traces are lists of network packets, either synthetic or, more commonly, acquired by
tapping a network medium. They are involved in a significant part of network operations:
administrators use them for monitoring and troubleshooting, researchers use them in measurements, simulations, or validations. As a consequence, many tools exist for their creation
and manipulation (this includes, for instance, visualization or filtering) [29] . A common format is also prevalent for packet trace operations: pcap (packet capture) [7] .
However, most packet trace processing tools are designed for a very specific goal, and
carry their own packet-processing code. For instance, tcpdump understands many network
protocols, but its parsing code cannot be used for any purpose other than displaying packets
on a terminal [8] . Many tools rely on libpcap, but this library focuses on capturing packets
1 Chapter 3 details WiPal’s merging algorithms.
and does not export parsing or processing capabilities [7] . Wireshark is more modular, but
still mostly visualization-oriented [6] . All these programs have a good design, and they are
very efficient regarding their focus, but each time one creates another tool to handle packet
traces, it is impractical to rely on previous code. Scapy is a notable exception [5] . It is an
interactive packet manipulation program written in Python, able to parse many network
protocols, providing features to read and write trace files, or interact with the network.
Nevertheless, the scripting nature of Scapy makes it dedicated to prototyping, experiments,
short programs, or at least programs where performance is not an issue. Scapy’s features
also stop at the packet level; they do not provide trace-processing algorithms. Another
interesting piece of software is binpac [47] . binpac is a parser generator similar to Yacc [40] but it
focuses on network protocols. binpac is efficient but only handles unicast streams. It is
therefore not suited for sniffer traces.
We designed WiPal to solve these issues regarding reusability and performance. Although WiPal focuses on Wi-Fi traces, it provides several protocol-agnostic features.
2.2 Overview of WiPal
At the very beginning, WiPal started as a limited C++ library to parse IEEE 802.11 frames.
The library then grew with the applications using it. Due to our focus on reusability, we designed these applications as shells around WiPal’s features. Each new feature was therefore
part of WiPal rather than of a specific program. Eventually, applications relied so much on
WiPal and had so few features of their own that they were merged into WiPal. Our need for
solutions regarding all aspects of packet trace processing made WiPal a consistent generic
library rather than a patchwork of specific features.
Another critical aspect of WiPal is performance. In the literature, some libraries exist
to help users parse various network protocols but they are only available for scripting languages (e.g., Scapy [5] ). While this is especially well-suited for quick prototyping and experimentation, this is intractable for handling large traces (especially for heavy computations).
WiPal’s most complex utilities can process gigabyte-long traces in minutes. WiPal uses C++
because of this combined importance of reusability, performance, and genericity.
2.2.1 Features
WiPal comes both as a library and a set of binaries (programs). Binaries provide a quick
and simple interface for high-level features, but these features are also available as library
services. Low-level features though are only available through the library. As an example, one does not need to write a program for merging several trace files. The following
command does the job:
$ wipal-merge t1.pcap t2.pcap [t3.pcap...]
    #include <iostream>
    #include <wipal/pcap/stream.hh>
    #include <wipal/wifi/frame.hh>

    using namespace wpl;

    int main()
    {
      pcap::file<> f ("file.pcap");

      // Print the type name of every IEEE 802.11 frame in the trace.
      for (pcap::file<>::iterator i = f.begin(); i != f.end(); ++i)
        std::cout << wifi::type::names[wifi::type_of(i->bytes())] << std::endl;
    }
Listing 2.1: A sample program using WiPal. This program prints the type of every IEEE 802.11 frame included in file.pcap.
High-level features include trace synchronization (using the wipal-synchronize binary),
trace merging (using wipal-merge), statistics extraction (wipal-stats), trace anonymization
(wipal-anonymize), and various minor utilities such as comparison, concatenation, or hexadecimal dumping (wipal-cmp or wipal-cat, to name a few). The most important low-level
features are pcap file I/O, IEEE 802.11 parsing, and support for other IEEE 802.11 related
protocols.
Note that wipal-merge’s code is just a shell around the library features. As of this writing, WiPal binaries have an average length of 122 lines of C++ (the whole WiPal, including
the library, is about 20k lines of code). The smallest binary is 44 lines of code, and the biggest
267. Most of this code is boilerplate due to specific C++ programming techniques.
On the other hand, performing a specific task using WiPal’s parser, or combining several
treatments in one executable file, requires users to write their own programs using the WiPal library. Listing 2.1 shows a sample program using this library. This program has few
features, but other snippets will extend it in the following sections.
2.2.2 Overall architecture
Figure 2.1 presents a simplification of WiPal’s structure. Binaries (on top) rely on the library,
and the library itself relies on other external libraries. The WiPal library is also composed of
several modules. We classify each module into one of base, protocols and file formats, or filters.
Base. These modules provide simple and common features, unrelated to WiPal’s specific domain. For instance, they include various exceptions for error handling, generic abstract classes, and static programming helpers. We kept this layer as thin as possible thanks
to external libraries such as Boost [1] or GNU MP [3] .
Figure 2.1: WiPal’s structure and modules.
Protocols and file formats. These modules are domain-specific and provide the basis for
high-level processing. They feature abstractions such as IEEE 802 addresses, pcap traces,
and protocol headers.
Filters. One may view a packet trace as a packet stream. Most algorithms just read this
stream linearly, one packet after another, from its beginning to its end. This mode of operation is particularly well suited to a pipe and filter pattern [17] . Therefore, we base WiPal’s
high-level modules on this design. WiPal provides pipe input and output through iterators [32] . The instantiation of a filter object requires one or several iterators as input, and
each object provides an output iterator. For instance, an anonymizer filter reads packets and
outputs anonymized packets. A merge filter requires two packet streams as input but produces one output stream. Some processing tasks need to be adapted to this pattern. For instance,
simultaneously synchronizing and merging two IEEE 802.11 packet traces is a complex operation [22;44;59] . Implementing it in WiPal means decomposing it into several filters and then
using a specific wiring for these filters (see Figure 2.2). Every algorithm that needs to access
a packet trace non-linearly needs such an adaptation.
The base modules, as well as some of the protocols and file formats modules form the
lower modules of the WiPal library. On the other hand, filters form the higher modules.
2.3 Packet parsing
Although network packets often use a binary format, parsing them is not always straightforward. This is the case even when considering only Wi-Fi packet traces. Furthermore,
distinct traces may involve distinct formats in addition to IEEE 802.11 (see Section 2.3.1 below). IEEE 802.11 packets may have several types and subtypes, and each type/subtype
pair yields a distinct format (although all formats share some similarities). A well-crafted
program should handle as many formats as possible, and handle each field properly according to its type. Implementing a new format should not require modifying existing code. It
should be possible to perform various processing tasks on the same frame without modifying the
frame parser. In the following, we describe WiPal’s mechanisms that enable users to write
such programs.
Figure 2.2: A complex filter example. This figure shows how WiPal uses filters to synchronize and merge two IEEE 802.11 traces. Each box represents a filter and arrows show pipes.
Pipes convey different types of data.
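To make the type/subtype distinction concrete, here is a minimal sketch of how a frame's type and subtype can be read from the first octet of the IEEE 802.11 frame control field (protocol version in bits 0-1, type in bits 2-3, subtype in bits 4-7). The names frame_type, frame_kind, and kind_of are hypothetical and chosen for illustration only; they are not WiPal's API.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names for illustration; WiPal's own API may differ.
enum class frame_type { management = 0, control = 1, data = 2, reserved = 3, truncated = 4 };

struct frame_kind {
    frame_type   type;
    std::uint8_t subtype;  // 0..15, meaningful only when type != truncated
};

// Decodes the first octet of the frame control field.
// `caplen` is the number of captured bytes, so empty captures are reported
// as truncated instead of being read out of bounds.
inline frame_kind kind_of(const std::uint8_t* frame, std::size_t caplen)
{
    if (caplen < 1)
        return { frame_type::truncated, 0 };
    const std::uint8_t fc0 = frame[0];
    return { static_cast<frame_type>((fc0 >> 2) & 0x3),   // bits 2-3
             static_cast<std::uint8_t>(fc0 >> 4) };        // bits 4-7
}
```

For instance, a first octet of 0x80 decodes as a management frame of subtype 8 (a beacon), and 0xd4 as a control frame of subtype 13 (an acknowledgment).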
2.3.1 PHY headers
IEEE 802.11 packet traces often include extra information about the physical parameters
of transmissions (e.g., frequency, signal-to-interference ratio, or precise timestamp). This
information is available as an extra packet header inserted by the operating system for each
frame. We call this header a PHY header. A pcap file is a succession of chunks, each chunk
containing a pcap header and a byte sequence corresponding to a packet. PHY headers are
located at the beginning of this byte sequence, between the pcap header and the IEEE 802.11
header. Inside packet traces that do not include PHY headers, an IEEE 802.11 header appears
directly after each pcap header.
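The layout described above can be sketched as follows. This is an illustration under stated assumptions, not WiPal code: it uses the classic 16-byte pcap per-packet header, assumes header fields are in host byte order, and assumes a PHY header of fixed length phy_len (0 when the trace has none). The names pcap_record_header and ieee80211_offsets are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Per-packet header of the classic pcap file format (16 bytes).
struct pcap_record_header {
    std::uint32_t ts_sec;    // capture time, seconds
    std::uint32_t ts_usec;   // capture time, microseconds
    std::uint32_t incl_len;  // number of bytes stored in the file
    std::uint32_t orig_len;  // original length of the packet on the medium
};

// Returns, for each record, the offset of its IEEE 802.11 header inside
// `records` (the file contents located after the pcap global header).
// Assumptions: host byte order, fixed-length PHY header of `phy_len` bytes.
// Records too short to hold the PHY header are skipped.
std::vector<std::size_t>
ieee80211_offsets(const std::vector<std::uint8_t>& records, std::size_t phy_len)
{
    std::vector<std::size_t> offsets;
    std::size_t pos = 0;
    while (records.size() - pos >= sizeof(pcap_record_header)) {
        pcap_record_header h;
        std::memcpy(&h, &records[pos], sizeof h);
        const std::size_t payload = pos + sizeof h;
        if (h.incl_len > records.size() - payload)
            break;                                // truncated file: stop here
        if (h.incl_len >= phy_len)
            offsets.push_back(payload + phy_len); // 802.11 header follows the PHY header
        pos = payload + h.incl_len;               // next chunk
    }
    return offsets;
}
```

With phy_len set to 0, each returned offset points directly after a pcap header, which matches the case of traces without PHY headers.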
There is no reference format for PHY headers: hardware vendors introduced them independently of any standardization process. Open source developers push the Radiotap
format [4] as a de facto standard, but many traces are already available in other formats (e.g.,
AVS or Prism), and some network drivers do not support Radiotap. Furthermore, some
developers are reluctant to use it due to its variable-length headers. As a consequence, interoperability between IEEE 802.11 tools and packet traces is problematic. Most researchers
develop their tools for the specific PHY headers they use, and different tools might not be
able to process the same traces. Sometimes the features provided by two PHY formats are
not even compatible!
     6  template <class PHY>
     7  void
     8  print(pcap::file<>& f)
     9  {
    10    for (pcap::file<>::iterator i = f.begin(); i != f.end(); ++i)
    11      {
    12        const PHY*  phy = static_cast<const PHY*> (i->bytes());
    13        const void* ieee80211 = phy->decapsulate(i->meta().caplen, i->swapped());
    14
    15        std::cout << wifi::type::names[wifi::type_of(ieee80211)] << std::endl;
    16      }
    17  }
    18
    19  int main()
    20  {
    21    pcap::file<> f ("file.pcap");
    22
    23    switch (f.type())
    24      {
    25      case pkt::IEEE802_11:           print<phy::empty_header<> >(f); break;
    26      case pkt::IEEE802_11_RADIO:     print<rtap::header>(f);         break;
    27      case pkt::IEEE802_11_RADIO_AVS: print<avs::header>(f);          break;
    28      case pkt::PRISM_HEADER:         print<prism::header>(f);        break;
    29      }
    30  }
Listing 2.2: The program of listing 2.1 with support for multiple PHY headers.
WiPal solves this issue using a proper abstraction for PHY headers (see listing 2.2). WiPal
users can handle any PHY header using the same consistent API (Application Programming
Interface), as shown in lines 12–13 of listing 2.2. Note that users need to test the format of the
trace file to set up the proper PHY header type. This is the purpose of the switch statement
on line 23. The reason is that each PHY header’s C++ type is part of a static class hierarchy [16] ,
thus no dynamic method resolution is possible. We wanted to avoid dynamic resolution
because a trace file may contain several hundred million packets, and we wanted to minimize the number of dynamic method calls for each packet (for the sake of performance).
WiPal binaries factorize case statements and avoid this redundancy using the Boost preprocessor library [1] .
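The trade-off described above (one runtime test per trace, then statically resolved calls for every packet) can be sketched as follows. This is a simplified illustration, not WiPal's class hierarchy: the types empty_phy, fixed_phy, and radiotap_phy and the functions below are hypothetical. The only format-specific fact used is that a Radiotap header stores its total length as a 16-bit little-endian integer at byte offset 2.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sentinel returned when the captured bytes end before the PHY header does.
const std::size_t truncated = static_cast<std::size_t>(-1);

// Traces without PHY headers: the IEEE 802.11 frame starts at offset 0.
struct empty_phy {
    static std::size_t length(const std::uint8_t*, std::size_t) { return 0; }
};

// Hypothetical fixed-size PHY header of N bytes.
template <std::size_t N>
struct fixed_phy {
    static std::size_t length(const std::uint8_t*, std::size_t) { return N; }
};

// Radiotap-style header: total length at byte offset 2, little-endian.
struct radiotap_phy {
    static std::size_t length(const std::uint8_t* p, std::size_t caplen) {
        if (caplen < 4) return truncated;
        return static_cast<std::size_t>(p[2]) | (static_cast<std::size_t>(p[3]) << 8);
    }
};

// Same call for every header type; PHY::length is resolved at compile time,
// so there is no virtual call per packet.
template <class PHY>
std::size_t ieee80211_offset(const std::uint8_t* bytes, std::size_t caplen)
{
    const std::size_t n = PHY::length(bytes, caplen);
    return n <= caplen ? n : truncated;
}

// Counts the packets whose IEEE 802.11 header is reachable.
template <class PHY>
std::size_t count_decodable(const std::vector<std::vector<std::uint8_t> >& packets)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < packets.size(); ++i)
        if (ieee80211_offset<PHY>(packets[i].data(), packets[i].size()) != truncated)
            ++count;
    return count;
}

enum class link_type { bare_ieee80211, radiotap };

// The only runtime decision: one switch per trace, not per packet.
std::size_t count_for_link_type(link_type type,
                                const std::vector<std::vector<std::uint8_t> >& packets)
{
    switch (type) {
    case link_type::bare_ieee80211: return count_decodable<empty_phy>(packets);
    case link_type::radiotap:       return count_decodable<radiotap_phy>(packets);
    }
    return 0;
}
```

Supporting a new header format in this sketch means adding one struct with a length() function and one case label; the per-packet loop is left untouched.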
    const uint8_t* offset    = static_cast<const uint8_t*>(frame) + 30;
    uint16_t       eth_type  = tool::extract_big_endian_short_u(offset);
    uint8_t        ip6nxthdr = offset[8];
    uint8_t        icmp6type = offset[42];
    uint16_t       udp6port  = tool::extract_big_endian_short_u(offset + 42);

    if(eth_type == 0x86dd and ip6nxthdr == 0x11 and udp6port == 698)
      // ...
    else if(eth_type == 0x86dd and ip6nxthdr == 0x3a and icmp6type == 0x86)
      // ...
Listing 2.3: A typical example of packet processing code. The code is error-prone, depends
on the whole protocol stack, and does not handle truncated frames.
2.3.2 IEEE 802.11 parsing
Several practices exist regarding IEEE 802.11 frame parsing. A common malpractice among
researchers is to feed traces to a program such as Wireshark or tcpdump, which parses frames and
outputs human-readable text, and then use a scripting language such as Perl to re-process
this output. This should be avoided for three reasons:
1. It processes each frame twice. One of the processing passes is done by a scripting language
and involves regular expressions. This is inefficient for parsing a binary format.
2. Script code that involves regular expressions is error-prone and more difficult to maintain. In this case, the code also depends on the specific version of Wireshark or tcpdump used, as their outputs may change between versions. This imposes an extra
maintenance burden.
3. This often results in overspecialized code. A change in the sequence of protocols each
packet uses might break the whole program.
Another practice consists in focusing on the specific bytes we are interested in for each
frame. This often produces error-prone and hard-to-maintain code. Listing 2.3 is a typical
example of such code. Who could tell that it checks for pseudo-Ethernet frames that include
either OLSR messages in IPv6 UDP packets, or ICMPv6 router advertisements2? Even with
proper comments and constants, one would need to be very careful about the offsets and
the protocols under test (and we do not even mention handling truncated frames). Another
problem with this technique is that it is specific to a given problem per se: change only one
layer of the protocol stack, and the whole code must be rewritten.
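For comparison, here is a sketch of the same checks as listing 2.3 written with named constants and bounds-checked reads, so that truncated frames are reported instead of being read past their end. It keeps the offsets of listing 2.3 (relative to the EtherType field) and therefore still depends on the whole protocol stack; it only illustrates the truncation and readability issues. The names byte_view, payload_kind, and classify are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Read-only view on captured bytes with bounds-checked accessors.
class byte_view {
public:
    byte_view(const std::uint8_t* data, std::size_t size) : data_(data), size_(size) {}

    // Reads one byte at `off`; returns false if the capture is too short.
    bool u8(std::size_t off, std::uint8_t& out) const {
        if (off >= size_) return false;
        out = data_[off];
        return true;
    }
    // Reads a big-endian 16-bit value at `off`; false if the capture is too short.
    bool be16(std::size_t off, std::uint16_t& out) const {
        if (size_ < 2 || off > size_ - 2) return false;
        out = static_cast<std::uint16_t>((data_[off] << 8) | data_[off + 1]);
        return true;
    }
private:
    const std::uint8_t* data_;
    std::size_t         size_;
};

enum class payload_kind { other, olsr_over_udp6, router_advertisement, truncated };

// Offsets relative to the EtherType field, as in listing 2.3.
const std::size_t   ethertype_off       = 0;
const std::size_t   ip6_next_hdr_off    = 2 + 6;   // next-header field of the IPv6 header
const std::size_t   ip6_payload_off     = 2 + 40;  // after the fixed 40-byte IPv6 header
const std::uint16_t ethertype_ipv6      = 0x86dd;
const std::uint8_t  next_hdr_udp        = 0x11;
const std::uint8_t  next_hdr_icmp6      = 0x3a;
const std::uint8_t  icmp6_router_advert = 0x86;
const std::uint16_t olsr_port           = 698;

payload_kind classify(const byte_view& v)
{
    std::uint16_t ethertype;
    if (!v.be16(ethertype_off, ethertype)) return payload_kind::truncated;
    if (ethertype != ethertype_ipv6) return payload_kind::other;

    std::uint8_t next_hdr;
    if (!v.u8(ip6_next_hdr_off, next_hdr)) return payload_kind::truncated;

    if (next_hdr == next_hdr_udp) {
        std::uint16_t port;  // first UDP field, the one listing 2.3 tests
        if (!v.be16(ip6_payload_off, port)) return payload_kind::truncated;
        return port == olsr_port ? payload_kind::olsr_over_udp6 : payload_kind::other;
    }
    if (next_hdr == next_hdr_icmp6) {
        std::uint8_t type;
        if (!v.u8(ip6_payload_off, type)) return payload_kind::truncated;
        return type == icmp6_router_advert ? payload_kind::router_advertisement
                                           : payload_kind::other;
    }
    return payload_kind::other;
}
```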
Wireshark adopts a valid approach. Its frame parsing component generates a syntax
tree and one can access each of the frame’s fields using a consistent API. We believe that this
approach is however overkill for WiPal. Many algorithms only focus on a few fields inside
each frame; in this way, there is no need to spend resources on allocating and constructing a
whole structure. Furthermore, handling each frame element would be a waste of time, e.g.,
when one needs only two of them.
2 OLSR (Optimized Link State Routing) is a routing protocol. UDP (User Datagram Protocol) is a transport-level protocol. ICMP (Internet Control Message Protocol) is a protocol from the Internet protocol suite. IPv6 and
ICMPv6 belong to the latest version of the Internet protocol suite, which succeeds IPv4 and ICMPv4.
     6  struct hooks: public wifi::dissector_default_hooks
     7  {
     8    void seq_ctl_hook(const void* ieee80211,
     9                      size_t      ieee80211_len,
    10                      unsigned    fragno,
    11                      unsigned    seqno)
    12    {
    13      std::cout << seqno << '/' << fragno << std::endl;
    14    }
    15  };
    16
    17  template <class PHY>
    18  void
    19  print(pcap::file<>& f)
    20  {
    21    for (pcap::file<>::iterator i = f.begin(); i != f.end(); ++i)
    22      wifi::dissect<PHY, hooks>(*i);
    23  }
Listing 2.4: A program using WiPal’s IEEE 802.11 parser. It uses the same main() function as
listing 2.2.
Instead of generating a syntax tree, WiPal’s parser calls user-given callback functions at
various stages of its processing (for instance, for each address field or each time a sequence
control field is encountered). When retrieving a specific field is unnecessary, the user just
provides an empty callback (WiPal actually provides empty callbacks by default). Callbacks
are static parameters to the parser, therefore compiler optimizations (function inlining and
dead code elimination) ensure efficiency. Listing 2.4 shows an example. The print() function of listing 2.2 now parses the frame using callbacks (hooks) defined in line 6. The parser
calls the seq_ctl_hook() function each time it parses a sequence number field. Note that,
although some frames may be truncated or may not include sequence numbers, this is transparent to the user. Also note that the user does not need to care about bit manipulations
(inside IEEE 802.11 frames, sequence numbers and fragment numbers are respectively 12-bit-long and 4-bit-long fields embedded into a 16-bit-long word, transmitted least significant
octet first). Finally, it would also be possible to use WiPal’s parser to build a syntax tree. To this
end, one just needs to implement the suitable callbacks.
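The callback mechanism can be illustrated with a small self-contained sketch. It is not WiPal's parser: default_hooks, dissect, and seq_recorder are hypothetical names, the hook signatures are simplified, and only the first bytes of the MAC header are handled (control frames are assumed to stop after their first address). It assumes the usual IEEE 802.11 layout, with the sequence control word at byte offset 22, stored least significant octet first, the fragment number in its 4 low bits and the sequence number in its 12 high bits.

```cpp
#include <cstddef>
#include <cstdint>

// Default hooks do nothing; the compiler can inline and remove the calls.
struct default_hooks {
    void addr_hook(unsigned /*index*/, const std::uint8_t* /*six_bytes*/) {}
    void seq_ctl_hook(unsigned /*fragno*/, unsigned /*seqno*/) {}
};

// Simplified dissector: the hooks type is a static (template) parameter.
// Truncated frames simply trigger fewer hooks; the caller never checks lengths.
template <class Hooks>
void dissect(const std::uint8_t* frame, std::size_t caplen, Hooks& hooks)
{
    if (caplen < 10) return;                    // frame control, duration, address 1
    hooks.addr_hook(1, frame + 4);
    const unsigned type = (frame[0] >> 2) & 0x3;
    if (type == 1) return;                      // control frame: no sequence control
    if (caplen >= 16) hooks.addr_hook(2, frame + 10);
    if (caplen >= 22) hooks.addr_hook(3, frame + 16);
    if (caplen < 24) return;                    // sequence control not captured
    const unsigned word = frame[22] | (frame[23] << 8);
    hooks.seq_ctl_hook(word & 0xf, word >> 4);  // fragment number, sequence number
}

// User hooks: override only what is needed, inherit empty defaults for the rest.
struct seq_recorder : default_hooks {
    unsigned frames_with_seq = 0;
    unsigned last_fragno = 0;
    unsigned last_seqno = 0;

    void seq_ctl_hook(unsigned fragno, unsigned seqno) {
        ++frames_with_seq;
        last_fragno = fragno;
        last_seqno = seqno;
    }
};
```

A syntax-tree builder would fit the same interface: its hooks would append one node per call instead of recording the last value seen.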
2.4 Filters
WiPal processes packet traces using a pipe and filter pattern [17] . Iterators provide pipe input
and output [32] . A filter is an object that takes iterators as input and makes an output iterator
available. This section presents the benefits of using this pattern for packet processing.
Trace files using the pcap file format provide the base iterators for filters. Therefore this
section also presents the abstractions that provide basic iterators from pcap files, although
they are considered lower modules of the library.
2.4.1 Filter sources: pcap abstractions
pcap is the de facto standard for handling packet traces [7] . The format is both simple to
read and simple to write, and may handle any type of packet traces, which explains its wide
acceptance. Although some other formats exist (e.g., the formats used by Cheng et al. [22] or
VeriWave [57] ), WiPal does not implement them for now as they are still barely used. But one
could easily implement them with only minor intrusions into WiPal’s code. It is important
to underline that tools exist to convert other formats to pcap.
WiPal provides several abstractions for reading and writing pcap files. The following
sections elaborate on three original features of WiPal’s pcap system: (i) random access to a
pcap file, (ii) ability to aggregate several files as one pcap stream, and (iii) ability to attach
meta-data to a pcap stream.
Random access to a pcap file
A basic usage for a pcap stream is to retrieve an iterator pointing to the stream’s first packet.
Incrementing this iterator then enables the user to traverse the packet stream. But WiPal
also features another access mode. One can retrieve iterators pointing to arbitrary packets
in constant time. Random access to a packet is useful to focus on a trace’s specific portion in
an efficient way. Here is an example. When opening a pcap trace, standard trace visualizers
start by loading the whole trace into memory. Browsing the list of packets just requires memory accesses. This works for traces of reasonable size, but traces used in network research
are frequently several gigabytes long [50] . Such traces cannot be loaded into memory. For
instance, Wireshark on a GNU/Linux machine with 2 GB of RAM is unable to load traces of
more than 500 MB. A solution would be to load into memory only the part of the trace the
user is displaying at a given time. But as the user moves inside a trace, the program must be
able to quickly load the correct part. From a programming point-of-view, it is not possible
to re-traverse the whole trace each time, for performance reasons. Hence the need to access,
in constant time, any specific packet inside the trace.
WiPal achieves constant access to random packets using file indexes. When opening a
pcap file, WiPal performs a single file traversal and records its position in the file every K
packets (K is a customizable parameter). When asked for an iterator to the pth packet, it seeks
to the ⌊p/K⌋th recorded position (that of packet K⌊p/K⌋) and then traverses p mod K packets. Since seek()
is constant time, and at most K read operations are required (K being fixed and independent
of the trace file), random access is O(K) = O(1). The smaller the K, the faster the operation,
but also the larger the index’s memory footprint. Also note that building the index requires
a single trace traversal. Therefore, this indexing mechanism is optional, so users can disable
it in case they do not need random access.
Figure 2.3: A screenshot of WScout [24] . WScout uses WiPal’s random access feature to open
packet traces that do not fit in memory.
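The indexing scheme can be sketched on a toy trace. This is an illustration, not WiPal's implementation: the class indexed_trace is hypothetical, records are stored in memory and prefixed by a one-byte length (a stand-in for pcap headers), and an array lookup plays the role of seek().

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy trace: records stored back to back, each prefixed by a one-byte length.
class indexed_trace {
public:
    // Single traversal: remember the position of packets 0, K, 2K, ...
    // Precondition: k > 0.
    indexed_trace(const std::vector<std::uint8_t>& bytes, std::size_t k)
        : bytes_(bytes), k_(k), count_(0)
    {
        std::size_t pos = 0;
        while (pos < bytes_.size()) {
            if (count_ % k_ == 0)
                index_.push_back(pos);
            pos += 1 + bytes_[pos];   // skip length byte and payload
            ++count_;
        }
    }

    std::size_t size() const { return count_; }

    // Byte offset of the p-th record (0-based, p < size()): jump to index
    // entry floor(p / K), then skip p mod K records, i.e. fewer than K reads.
    std::size_t offset_of(std::size_t p) const
    {
        std::size_t pos = index_[p / k_];
        for (std::size_t i = 0; i < p % k_; ++i)
            pos += 1 + bytes_[pos];
        return pos;
    }

private:
    std::vector<std::uint8_t> bytes_;
    std::size_t               k_;
    std::size_t               count_;
    std::vector<std::size_t>  index_;   // one entry every K packets
};
```

The index holds one entry every K packets, which makes the memory/speed trade-off explicit: halving K doubles the index size and halves the worst-case number of records skipped per access.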
As a proof of concept, we developed a trace visualizer using this feature: WScout [24] .
Figure 2.3 displays a screenshot. Thanks to WiPal’s random access feature, WScout is able
to display in a graphical interface packet traces too large to fit into memory. To the best
of our knowledge, WScout is the only visualizer with such a feature. It is available as free
software at http://wscout.lip6.fr/.
pcap file aggregation
A common practice when capturing packets is to split the resulting packet stream into multiple files. Some tools require it in order to generate long traces (e.g., more than 2 GB).
CRAWDAD’s uw/sigcomm2004 dataset includes such traces [50] . To later process these traces,
one must consider the concatenation of the trace files as one unique packet stream. Despite
looking like a minor issue, this is an annoying burden for developers – one would like to
focus on the processing logic rather than working around measurement quirks.
WiPal enables users to consider multiple pcap files as one single pcap stream. Adding
this feature to a program is as simple as replacing every occurrence of pcap::file<> with
pcap::list<> (e.g., in the preceding code snippets). One can then use a specific syntax
to aggregate files. For instance, opening "file1.pcap:file2.pcap" will generate a stream
that outputs packets from file1.pcap first, then file2.pcap. Note how this operation is
transparent to end-users. Other services are also available, for instance to check consistency
of the list (e.g., to check that every file in the list uses the same PHY format).
Packet stream meta-data
Trace files are often associated with some information they do not include directly. A common example is the IP or MAC address of the machine that generated the trace file (i.e.,
that performed the packet capture). Such information can be useful, for instance, if this machine injected packets during the capture, and one needs to filter these packets out when
processing the trace (e.g., because their timestamps are less accurate). A common practice
is to embed these pieces of information into the traces’ file names. Some tools require the
users to arrange trace files according to a specific filesystem tree.
In order to ease the programming of such mechanisms, every packet stream in WiPal
can embed meta-data. Streams’ meta-data in WiPal takes the form of a mapping from
a string to an object of any type. Users can therefore attach any needed piece of information to a packet stream. pcap lists use this mechanism. For instance, when opening
"file.pcap=192.168.1.1" or "foo.pcap:bar.pcap=10.0.0.1", WiPal associates the given
IP address to the corresponding stream, under the string addr. WiPal’s trace merging services use such information.
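As an illustration of the syntax above, here is a sketch of a parser for such stream specifications. The exact grammar WiPal accepts is not given here, so this sketch makes two assumptions: everything after the first '=' is the address, and the part before it is a ':'-separated list of file names. The names stream_spec and parse_spec are hypothetical, and meta-data values are simplified to strings although WiPal allows objects of any type.

```cpp
#include <map>
#include <string>
#include <vector>

// Result of parsing a specification such as "foo.pcap:bar.pcap=10.0.0.1".
struct stream_spec {
    std::vector<std::string>           files;  // files to read, in order
    std::map<std::string, std::string> meta;   // e.g. meta["addr"]
};

// Assumed grammar: file[:file...][=address]
stream_spec parse_spec(const std::string& spec)
{
    stream_spec out;
    std::string files = spec;

    // Optional "=address" suffix, stored under the key "addr".
    const std::string::size_type eq = spec.find('=');
    if (eq != std::string::npos) {
        files = spec.substr(0, eq);
        out.meta["addr"] = spec.substr(eq + 1);
    }

    // Split the remaining part on ':' to obtain the file list.
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type colon = files.find(':', start);
        if (colon == std::string::npos) {
            out.files.push_back(files.substr(start));
            break;
        }
        out.files.push_back(files.substr(start, colon - start));
        start = colon + 1;
    }
    return out;
}
```

Under these assumptions, parse_spec("foo.pcap:bar.pcap=10.0.0.1") yields two files and one meta-data entry mapping addr to 10.0.0.1.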
2.4.2 Processing filters
Filters are the core of WiPal’s advanced processing features. WiPal features a dozen filters
related to trace merging, synchronization, or anonymization. This section illustrates with a
simple example how they can improve code quality when dealing with packet traces. The
example is a program that anonymizes a packet trace and then prints statistics concerning
the resulting trace. This needs two filters and two “data sinks”. The filters are an anonymizer
and a timetracker, and the sinks are a pcap output stream (for the anonymized trace) and
a statistics extraction module. The anonymizer filter reads IEEE 802.11 frames and outputs
a copy of these frames truncated at the end of the MAC layer, and where MAC addresses
and ESSIDs (network identifiers3 ) have been replaced with random values. The timetracker
filter is in charge of extracting precise timestamps from PHY headers (it falls back to pcap
timestamps when there are no PHY headers). It also handles wraparounds (some timestamp
formats wrap roughly every hour and a half) so it produces monotonically increasing
timestamps. Having timestamps that are as precise as possible is necessary in order to compute
statistics about the packet stream. Figure 2.4 shows how to connect each element. The
input file links to the anonymizer, and the anonymizer to the timetracker. We then send the
timetracker output to both the output stream and statistics module.
Figure 2.4: A simple processing pipeline using two filters (represented as white boxes). Listing 2.5 displays the code implementing this pipeline.
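The wraparound handling mentioned above can be sketched as follows. This is not WiPal's timetracker: the class unwrapper is hypothetical, the counter period (modulus) is a parameter because it depends on the timestamp format, and the sketch assumes that raw timestamps never decrease except when the counter wraps, with at most one wrap between two consecutive packets.

```cpp
#include <cstdint>

// Turns a wrapping counter into monotonically increasing 64-bit timestamps.
class unwrapper {
public:
    // `modulus` is the counter period, e.g. 2^32 for a 32-bit counter.
    explicit unwrapper(std::uint64_t modulus)
        : modulus_(modulus), base_(0), last_(0) {}

    // Feed raw timestamps in capture order; returns the unwrapped value.
    std::uint64_t operator()(std::uint64_t raw)
    {
        if (raw < last_)        // the counter went backwards: it wrapped
            base_ += modulus_;
        last_ = raw;
        return base_ + raw;
    }

private:
    std::uint64_t modulus_;
    std::uint64_t base_;   // total time accumulated by previous wraps
    std::uint64_t last_;   // last raw value seen
};
```

With a period of 100, the raw sequence 10, 90, 5, 50, 1 is unwrapped to 10, 90, 105, 150, 201.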
Listing 2.5 implements this program. One can distinguish three parts: type definitions
(lines 10–12), object declarations (14–17), and processing (19–24). Inside WiPal, every filter
is a class, therefore type definitions set up two type aliases for the filter classes: anonymizer
and timetracker. One can notice that the C++ types for filters embed the type of their input
iterators. This is a drawback of using static C++: when using many filters, one starts with
a long list of typedef’s, and this requires the user to juggle with type names. It is however
important to note that these static mechanisms enable compilers to perform optimizations
and produce efficient code. This is the key to WiPal’s performance. Furthermore, type
checking ensures correctness and safety, as the compiler does not let users make mistakes in
this part of the code. Finally, C++0x, the next release of the C++ standard, to be published
soon [18] , will solve this problem by including features that make writing these type definitions
unnecessary (thanks to the auto keyword).
The next part of listing 2.5 (lines 14–17) declares the filter objects and end-modules. Connecting the filters is achieved by giving the proper iterators to the filters’ constructors (lines
14–15). Note that when the program runs, at this stage, no processing has started. Filters operate in a lazy fashion: the input file will not be read until we start reading the timetracker’s
output. Furthermore, filters only load into memory the data they need for producing their
next element, nothing more.
3 ESSID stands for Extended Service Set Identifier.
     6  template <class PHY>
     7  void
     8  process(pcap::file<>& f)
     9  {
    10    typedef filter::anonymizer<pcap::file<>::iterator, PHY>         anonymizer;
    11    typedef filter::timetracker<typename anonymizer::iterator, PHY> timetracker;
    12    typedef typename timetracker::iterator                          iterator;
    13
    14    anonymizer         a (f.begin(), f.end());
    15    timetracker        t (a.begin(), a.end());
    16    pcap::ostream      o ("output.pcap", f);
    17    wifi::stats::stats s;
    18
    19    for (iterator i = t.begin(); i != t.end(); ++i)
    20      {
    21        o << *i;
    22        s.account<PHY>(*i);
    23      }
    24    std::cout << s << std::endl;
    25  }
Listing 2.5: An example of advanced trace processing using filters. This program uses the
same main() function as listing 2.2.
Finally, the last part of listing 2.5 (lines 19–24) reads each output frame from the timetracker, and sends it to the end-modules. Sending packets to a pcap::ostream object using
the << operator transparently writes the corresponding pcap file. The following method call
to account() updates the statistics module. It is then possible to report these statistics on
the standard output using standard C++ streams and formatting operators. These statistics
include frame counts and traffic rates for each frame type/subtype, estimations of missed
frames, list of networks and cells, information about transmitters, and other various figures.
There are two important points about this program. First, it is very easy to alter its
behavior by just adding or removing the desired filters. For instance, one could add a filter
before the anonymizer to filter a certain type of packets out. Or the anonymizer could be
removed, thus making the program a simple statistics extractor. The second point is that
filters are an easy mechanism to parameterize a processing task. Some processing tasks have parts
that can be implemented with different algorithms (for instance, a merge process might
use several synchronization algorithms). In such cases, testing various algorithms is just as
simple as changing the corresponding filter, without altering the rest of the pipeline.
In other words, filters enable decomposing trace operations into several basic blocks, thus
making trace processing modular. As a consequence we can expect programs to be easier to
maintain and adapt, and code to be easier to re-use.
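The modularity argument can be illustrated with a self-contained sketch of lazy, iterator-based filters. This is not WiPal's filter API: packet, lazy_filter, truncate_op, rebase_op, and run_pipeline are hypothetical, and packets are reduced to a timestamp and a length. Each filter takes a pair of input iterators and a statically typed operation, and does its work only when its output iterator is dereferenced.

```cpp
#include <cstddef>
#include <vector>

// Minimal stand-in for a packet.
struct packet {
    unsigned long timestamp;
    std::size_t   length;
};

// Generic lazy filter: applies Op to each input element on dereference.
template <class InputIt, class Op>
class lazy_filter {
public:
    class iterator {
    public:
        iterator(InputIt pos, Op op) : pos_(pos), op_(op) {}
        packet operator*() const { return op_(*pos_); }   // work happens here
        iterator& operator++() { ++pos_; return *this; }
        bool operator!=(const iterator& rhs) const { return pos_ != rhs.pos_; }
    private:
        InputIt pos_;
        Op      op_;
    };

    lazy_filter(InputIt first, InputIt last, Op op)
        : first_(first), last_(last), op_(op) {}
    iterator begin() const { return iterator(first_, op_); }
    iterator end() const { return iterator(last_, op_); }

private:
    InputIt first_, last_;
    Op      op_;
};

// Caps the packet length (loosely mimics truncating frames).
struct truncate_op {
    std::size_t max_len;
    packet operator()(packet p) const {
        if (p.length > max_len) p.length = max_len;
        return p;
    }
};

// Shifts timestamps so that `origin` becomes time zero.
struct rebase_op {
    unsigned long origin;
    packet operator()(packet p) const {
        p.timestamp -= origin;
        return p;
    }
};

// Wiring: source -> truncate -> rebase -> sink. As in listing 2.5, each
// filter type embeds the iterator type of its input.
std::vector<packet> run_pipeline(const std::vector<packet>& trace,
                                 std::size_t max_len, unsigned long origin)
{
    typedef std::vector<packet>::const_iterator         source_iterator;
    typedef lazy_filter<source_iterator, truncate_op>   truncator;
    typedef lazy_filter<truncator::iterator, rebase_op> rebaser;

    truncator t(trace.begin(), trace.end(), truncate_op{max_len});
    rebaser   r(t.begin(), t.end(), rebase_op{origin});

    std::vector<packet> sink;
    for (rebaser::iterator i = r.begin(); i != r.end(); ++i)
        sink.push_back(*i);   // pulls one packet through both filters
    return sink;
}
```

Removing a stage or swapping its operation only changes the typedefs and constructor calls; the loop that consumes the final iterator stays the same.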
2.5 Performance evaluation
We evaluate WiPal’s efficiency using nine test programs involving WiPal and some well-known packet processing software. We are interested both in how WiPal performs with regard to other programs and in the overhead generated by WiPal’s original features (namely
trace aggregation, random packet access, IEEE 802.11 parser, and filters).
2.5.1 Methodology
We use nine test programs. Here is a short description for each of them.
libpcap. This is a simple program that uses libpcap [7] to perform a single pcap file traversal.
Packets are discarded immediately after being read from the file. We use this program
as a reference.
WiPal-file. This is the same program as above, using WiPal instead of libpcap. It uses
pcap::file objects and its code is very similar to listing 2.1. The goal is to compare
WiPal’s pcap reading mechanisms to libpcap’s.
WiPal-list. This program is the same as WiPal-file, using pcap::list objects instead of
pcap::file (see Section 2.4.1). We use this test to measure the overhead of WiPal’s
file aggregation feature.
WiPal-parser. This program performs a single file traversal, calling WiPal’s IEEE 802.11
parser on all frames composing the trace. We use the default behavior of WiPal’s
parser, which is to call empty callbacks. This allows us to measure the overhead of an
“empty” parser. In the ideal theoretical case, the C++ compiler would optimize the
code out and the program would exhibit performance similar to WiPal-list. We also
compare this program to Scapy (see below) that performs basically the same task.
WiPal-random. This program tests WiPal’s random access feature (see Section 2.4.1). It
starts by building an index of its input file, then performs successive accesses to random packets. The number of random accesses is twice the input trace’s packet count.
Therefore, it does the equivalent of three file traversals: one using standard iteration
mechanisms and two using random accesses. If one subtracts from its execution time
the time for building the index (estimating it with WiPal-file) and divides the result by
two, the result is the average time of a single random traversal. One can use this result
to compute the overhead of random access over conventional iteration. In this program we use K = 4. This value ensures fast random access while keeping a reasonable
memory footprint. Measurements show that WScout [24], using WiPal’s indexes with
K = 4, is able to load a 22 GB trace (including about 108,000,000 packets) using a total
of 560 MB of virtual memory.
WiPal-filters. This is the program of listing 2.5. The goal is to have an idea of how a moderately complex processing performs using WiPal’s filters (anonymization and statistics extraction running simultaneously). Of course, this is not directly comparable to
tshark or tcpdump (see below) because each program implements different features.
But we expect these programs to have execution times in the same order of magnitude.
Scapy. This program is very similar to WiPal-parser in its features. It uses Scapy’s sniff
function to read its input file. Scapy parses each packet it reads, but we set up the
function to immediately discard the packet without further processing afterwards. We
set up Scapy to parse only the MAC layer.
tshark. This is the plain tshark program, which is the console version of Wireshark. It reads
the input file, parses each packet, and displays a text summary on standard output.
tshark relies on libpcap for its I/O operations.
tcpdump. This is the plain tcpdump program, which basically offers the same features as
tshark. Contrary to Wireshark, it uses a custom parser dedicated to printing packet
summaries on a terminal, and we expect it to be faster than Wireshark. tcpdump also
relies on libpcap for its I/O operations.
In order to evaluate a program, we feed it with a 460 MB pcap trace (including about
2,100,000 packets). We run each program a hundred times, measuring its execution time
(accounting only for the user and system time as reported by the time UNIX command). We
then compute the mean execution time and 95% confidence intervals. We always use the
same trace file: (i) we do not expect another trace with a similar size to lead to significant
differences, and (ii) each of these programs is linear w.r.t. the trace size, so the average processing time per packet will not change with bigger or smaller traces. The trace comes from
a real-world measurement and may be considered average-sized for measurements in wireless environments. In order to avoid disk slowdowns, we store this file in a RAM disk and
we redirect all outputs to /dev/null. The machine executing these tests is a dual-core Intel®
Pentium® D CPU at 3 GHz, with 2 GB of RAM and a 2 MB cache.
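As an illustration of the statistics used in this methodology, the sketch below computes a mean and the half-width of a 95% confidence interval from a list of run times. The text does not say how the intervals were obtained; the normal approximation (z = 1.96) used here is an assumption that seems reasonable for a hundred runs, and `mean_with_ci95` is a hypothetical helper name.

```python
import math
import statistics

def mean_with_ci95(samples):
    """Return (mean, half_width) of a 95% confidence interval.

    Assumption: normal approximation with z = 1.96, which is only
    appropriate for reasonably large sample counts.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = statistics.fmean(samples)
    stderr = statistics.stdev(samples) / math.sqrt(n)  # sample std. deviation
    return mean, 1.96 * stderr

# Made-up timings in seconds (user + system), for illustration only.
timings = [1.0, 2.0, 3.0]
mean, half_width = mean_with_ci95(timings)
print(f"{mean:.3f} s +/- {half_width:.3f} s")
```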
2.5.2 Results
Figure 2.5 displays the results. It is important to keep in mind that many of these programs
do not have the exact same features as the others. Therefore, most of the time, one should
not expect precise comparisons from these results. They rather give an idea of the orders of
magnitude for a typical trace. We can nevertheless draw a number of interesting conclusions.
Comparison with libpcap. A first thing to notice is that WiPal’s packet reading features
perform almost as well as libpcap’s (WiPal-file is 120 ms slower, for a total execution time
of nearly a second). This extra delay is negligible: as shown by WiPal-filters or tcpdump,
in more elaborate processing, the time actually spent performing I/O operations is small
compared to the time spent performing other computations. The important point is that
using iterators does not noticeably impact the performance of WiPal’s C++ API. WiPal’s I/O
speed is comparable to libpcap’s.
[Figure 2.5: bar chart of the mean execution time of each test program. Bar labels: libpcap 830 ms, WiPal-file 950 ms, WiPal-list 950 ms, WiPal-parser 970 ms, WiPal-random 28 s, WiPal-filters 13 s, Scapy 70 min, tshark 70 s, tcpdump 12 s.]
Figure 2.5: Mean execution time for a hundred runs of the various test programs. Note that
most 95% confidence intervals are too small to be distinguished clearly.
Overhead of WiPal-list and WiPal-parser. It is interesting to note that WiPal-list leads to
the same execution time as WiPal-file and that WiPal-parser performs almost as well as the
previous two (a couple dozen milliseconds slower). One can draw two conclusions: (i) the
file aggregation feature has a negligible cost, and (ii) the generic parser implementation
using static callbacks is efficient. This means only user-provided callbacks might cause a
noticeable overhead.
Overhead of random access. The WiPal-random program runs in 28 seconds. Thus, we
can estimate that traversing the trace once using a random order takes less than 14 seconds
(remember the WiPal-random program performs one sequential and two random accesses per
packet). This makes random access to a packet roughly 14 times slower than sequential access in practice. In theory, however, with K = 4, this should only be twice as slow on average.
We believe the difference between theory and practice is due to the fact that a random file
access at the standard library level breaks the underlying buffering mechanisms (whereas
a sequential access does not). In conclusion, random access is significantly slower but is
still reasonable with regard to the feature offered. This extra delay is of the same order of
magnitude as that of other processing tasks such as WiPal-filters or tcpdump.
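The estimate above can be reproduced with a short calculation. It follows the method given in the description of WiPal-random (subtract the time of one sequential traversal, then divide by two) and takes 950 ms for WiPal-file, the value labelled in Figure 2.5; `random_traversal_estimate` is a hypothetical helper name.

```python
def random_traversal_estimate(total_s, sequential_s, random_passes=2):
    """Average time of one random traversal, given the total run time and
    the time of the sequential pass that builds the index."""
    return (total_s - sequential_s) / random_passes

wipal_random_s = 28.0   # reported WiPal-random execution time
wipal_file_s = 0.95     # WiPal-file execution time (one sequential traversal)

per_random_traversal = random_traversal_estimate(wipal_random_s, wipal_file_s)
slowdown = per_random_traversal / wipal_file_s

print(f"{per_random_traversal:.1f} s per random traversal, "
      f"about {slowdown:.0f} times slower than sequential access")
```

This gives a little over 13.5 s per random traversal, which matches the "less than 14 seconds" and "roughly 14 times slower" figures quoted in the text.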
Overhead of using filters for advanced trace processing. WiPal-filters runs in 13 seconds.
This is about the same execution time as tcpdump, while tshark and Scapy are at least an
order of magnitude slower. Therefore, WiPal’s design does not hinder its efficiency: by
using filters, WiPal achieves performance levels that are similar to specialized programs.
WiPal-filters uses two filter objects: an anonymizer and a timetracker. The anonymizer relies
in part on WiPal’s generic IEEE 802.11 parser and the timetracker uses PHY abstractions to
extract timestamps from PHY headers. On the one hand, this means that WiPal’s genericity
does not prevent it from competing with specialized code. On the other hand, the extra genericity
in tshark’s design (compared to tcpdump) comes at the cost of reduced performance (tshark is
about seven times slower than tcpdump or WiPal-filters). Scapy is even slower, requiring
more than one hour to process the trace. In this case, the first cause is its implementation
language (Python). Of course, scripting languages are known to be slower than compiled
ones, but they are also dynamic. Therefore, they lack several optimization opportunities.
For instance, Scapy cannot optimize the parser out, even though each packet is discarded,
while this is possible with WiPal-parser, which provides the same features as Scapy.
To conclude this section, WiPal does not sacrifice performance for reusability. It
may even outperform existing state-of-the-art programs. Its I/O operations are almost as
fast as libpcap’s despite the extra features. It also has a generic design that can compete
with specialized code. This is a strong argument for adopting WiPal instead of writing
specific code when designing packet trace manipulation software.
2.6 Conclusion
This chapter presented WiPal, a packet manipulation framework, and reported on our experience designing it. To the best of our knowledge, WiPal is the only framework that
focuses both on performance and genericity. This makes it a valuable tool for researchers
who need to develop packet trace processing software. Though WiPal addresses mostly the
IEEE 802.11 protocol, it also provides several protocol-agnostic features (e.g., pcap I/O operations). Furthermore, WiPal uses patterns that could be useful to handle other types of
packet traces and protocols.
WiPal introduces a number of original features and a novel design. It features trace
anonymization, statistics extraction, synchronization, merging, as well as other miscellaneous operations. Instead of relying on syntax trees, its IEEE 802.11 frame parser uses a
static callback mechanism. By applying modern compiler optimizations, we obtain generic
and fast operations. WiPal features the ability to index pcap files, thus allowing random
access to packets. These accesses are constant-time and only imply limited overhead. It is
also possible to aggregate trace files and to consider the concatenation of several files as one
unique packet stream. WiPal also includes a mechanism to attach meta-data to trace files, as
they are often associated with data they do not include directly. Finally, WiPal’s whole design
is based on a pipe and filter pattern. This pattern enables decomposing trace operations
into several basic blocks, thus making trace processing modular. The consequence is that
programs become easier to maintain and to adapt, and code easier to re-use. Measurements
show that WiPal competes with state-of-the-art packet processing software. Its I/O operations are almost as fast as libpcap’s, and its generic design is as fast as specialized code.
Chapter 3
WiPal: IEEE 802.11 trace merging
The most innovative part of WiPal is probably the one dedicated to merging IEEE 802.11
packet traces. This merger includes original algorithms and focuses on performance,
ease-of-use, and flexibility. We achieve performance using a proper design and careful programming. Ease-of-use and flexibility, on the other hand, are the consequences of a number
of characteristics that distinguish WiPal’s trace merger from other software:
Offline operation. Because it is designed to run offline, WiPal is independent of the monitors. This means that one may use any software to acquire data. Most trace mergers
expect monitors to embed specific software [22;28].
Independence of infrastructure. WiPal’s internal algorithms do not expect features from
traces that would require monitors to access a network infrastructure (e.g., “loose”
sniffer synchronization using NTP, the network time protocol). Monitors just need to
record data in a compatible input format.
Compliance with multiple formats. WiPal supports most of the existing input formats. On
the other hand, other trace mergers require a specific format. Some tools even require
a custom dedicated format [22] .
Hands-on design. WiPal is usable in a straightforward fashion by just calling the adequate
programs on trace files. Other mergers require more complex setups (e.g., a database
server [43] or a network setup involving multiple servers [22] ).
This chapter explains the design and internals of WiPal’s merger. We also intend to complement existing papers in the literature and give additional insights about the complex process
of trace merging. Section 3.1 first gives an overview of existing trace merging techniques.
Like every other tool, WiPal uses these techniques. Then Section 3.2 explains the basics of WiPal’s
merging algorithms. Section 3.3 goes into more detail and presents each distinct part
individually. Finally, Section 3.4 provides an evaluation of WiPal’s efficiency regarding
trace merging.
A. The traces are not synchronized and miss some frames.
B. One identifies some reference frames common to both traces. This
information enables trace synchronization.
C. One adjusts the frames’ timestamps and synchronizes T1 and T2.
D. One can merge the traces. Duplicate frames are counted only
once.
Figure 3.1: Merging two traces T1 and T2 .
3.1 Trace merging: state of the art
Wireless sniffing requires the use of multiple monitors for coverage and redundancy reasons.
Coverage is an issue when the distance between the monitor and at least one of the transmitters to be sniffed is too large to ensure a minimum reception threshold. Redundancy
is needed because of the unreliability of the wireless medium: even in good radio conditions, monitors may miss successfully transmitted frames. After the collection phase, traces
must be combined into one. A merged trace holds all the frames recorded by the different
monitors and gives a global view of the network traffic.
The traditional approach to merge traces involves a synchronization step, which aligns
frames according to their timestamps. This step includes identifying the frames that are
identical in all traces so that they appear once and only once in the output trace (Cheng
et al. [22] refer to this operation as unification). Figure 3.1 illustrates this process (more details
are given in Section 3.3).
Synchronization is difficult to obtain because, in order to be useful, it must be very precise. Imprecise frame timestamps may result in duplicate frames and incorrect ordering in
the output trace. An invalid synchronization may also lead to distinct frames being counted as
the same frame in the output trace. In order to avoid such undesirable effects, one needs
a precision of less than 106 µs [59]. To the best of our knowledge, only the VeriWave WaveTest
appliances [57] are able to synchronize network cards’ clocks with such a precision (note that
we are interested in frame arrival times in the card, not in the operating system). But this
requires specific wiring between the sniffers, and this hardware is expensive.
Therefore, all merging tools post-process traces to resynchronize them with the help of
reference frames, which are frames that appear in multiple traces. One may readjust the traces’
timing information using the timestamps of the reference frames (see Figure 3.1). Finding
reference frames is however a hard task, since we must be sure a given reference frame is
an occurrence of the same frame in every trace. In particular, some frames that occur frequently
(e.g., MAC acknowledgements) cannot be used as reference frames because their content
does not vary enough. Therefore, only a subset of frames is used as reference frames, as
explained later in this chapter (cf. Section 3.3).
A few trace merging tools exist in the literature, but they do not focus on the same set
of features as WiPal. For instance, Jigsaw is able to merge traces from hundreds of monitors, but requires monitors to access a network infrastructure [22]. This work, in contrast, considers smaller-scale systems (dozens of monitors) where no monitor can access a network infrastructure. WisMon is an online tool that has requirements similar to Jigsaw [28].
CrunchXML [51] is a tool that uses the same merging algorithm as WisMon, but that can
work either online or offline. However, due to this algorithm, its operation needs all sniffers
to hear a common access point. In order to work in all kinds of environments, WiPal cannot make such an assumption (sometimes access points are not shared among all traces, or
there are no access points). The system that is the closest to ours is Wit [43;44]. Although Wit
provides valuable insights on how to develop a merging tool, it is difficult to use, modify,
and extend in practice (cf. its authors’ note in CRAWDAD [44]).1 This explains in part our
motivation to propose a new trace merger.
3.2 WiPal’s basics
WiPal has been designed according to the following constraints:
No wired connectivity. The sniffers must be able to work in environments where no wired
connectivity is provided. The idea is to be able to perform measurements when it
is difficult to have all sniffers access a shared network infrastructure (e.g., in some
conference venues, or when studying interferences between two wireless networks
belonging to distinct entities).
1 Note that we refer to Wit’s merging process, and not to the other features available (e.g., a module to infer
missing packets).
Simplicity to the end-user. We believe simplicity is the key to re-usability. Users are not
expected to install and set up complex systems (e.g., a database backend) in order to
use WiPal.
Clean design. WiPal exhibits a modular design. Developers can easily adapt parts of the
trace merger or integrate them into other systems (e.g., the reference frame identification
process, synchronization, or the merging algorithm).
These constraints require an offline trace merger that does not require traces to be synchronized a priori. In practical terms, this means that sniffers only have to record their measurements on a local storage device, using the widely used pcap file format [7]. With regard
to this format, WiPal supports all mainstream PHY headers: raw IEEE 802.11 frames, AVS,
Prism, and Radiotap headers.2 Some wireless packet traces use another link type though: IP
packets encapsulated into pseudo-Ethernet frames. It is important to note that such traces
are not MAC traces (only IP packets are available) and thus do not contain enough information for accurate synchronization and merging. WiPal merges these traces when requested,
but this is an experimental feature that has not been extensively tested. As seen in the previous chapter, adding new link types is straightforward: WiPal’s design only requires
implementing the right abstractions and modifying a couple of lines in the existing
codebase.
One can access WiPal’s merging services through its software library or using a set of
binaries to manipulate wireless traces. All tools work directly on pcap files both as input
and output. wipal-merge is the main command to merge an arbitrary number of traces:
$ wipal-merge t1.pcap t2.pcap [t3.pcap...]
It is worth mentioning that intermediate steps of the merge procedure can be performed
separately, such as:
$ wipal-intersect-unique-frames t1.pcap t2.pcap
$ wipal-synchronize t1.pcap t2.pcap sync_t1.out.pcap
3.3 Detailed operation of WiPal’s trace merging
Figure 3.2 depicts WiPal’s structure. Each box represents a distinct module and arrows show
WiPal’s data flow. WiPal takes two wireless traces as input and produces a single merged
trace.3 In the following, we explain in detail the functioning of each one of the modules.
2 See Chapter 2, Section 2.3.1 for an explanation about PHY headers.
3 In order to merge more than two traces, it suffices to execute the merging tool as many times as required
(two by two). The wipal-merge command does this automatically.
Figure 3.2: The structure of a merge process in WiPal.
3.3.1 Identifying reference frames
This section explains the process of extracting reference frames. This operation involves two
steps: extraction of unique frames and intersection of unique frames (see Figure 3.2).
Let us first define what a unique frame is. A frame is said to be unique when it appears
“in the air” once and only once for the whole duration of the measurement. A frame that is
unique within each trace but that actually appeared twice on the wireless medium should
not be considered as unique.
The process of extracting unique frames finds candidates to become reference frames.
The process of intersecting unique frames then identifies identical unique frames from both
traces to become reference frames.
3.3.2 Extraction of unique frames
WiPal considers every beacon frame and non-retransmitted probe response as unique frames.
These are management frames that access points send on a regular basis (e.g., every 100 ms
for beacon frames). The uniqueness of these frames is due to the 64-bit timestamps they
embed (these timestamps are not related to the actual timestamps used for synchronization,
as we will see later).
In practice, the extraction process does not load full frames into memory. It uses 16-byte
hashes instead, which are stored in memory and used for comparisons. Limiting the size
of stored information is an important aspect since, as we will see later, WiPal’s intersection
process performs a lot of comparisons and needs to store many unique frames in memory.
Tests with CRAWDAD’s uw/sigcomm2004 dataset [50] have shown that this technique is
practical. For instance, WiPal needs less than 600 MB to load 7,700,000 unique frames.
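The extraction step described above can be sketched as follows. This is an illustration, not WiPal's code: it assumes raw IEEE 802.11 MAC frames (any PHY header such as Radiotap already stripped), it hashes the whole frame, and it uses BLAKE2b truncated to 16 bytes, since the text does not name the hash function WiPal uses. The function names are hypothetical.

```python
import hashlib

MGMT_TYPE = 0               # IEEE 802.11 management frames
SUBTYPE_PROBE_RESPONSE = 5
SUBTYPE_BEACON = 8
RETRY_FLAG = 0x08           # retry bit in the second frame-control byte

def is_unique_candidate(frame: bytes) -> bool:
    """True for beacons and for probe responses that are not retransmissions."""
    if len(frame) < 2:
        return False
    frame_type = (frame[0] >> 2) & 0x3
    subtype = (frame[0] >> 4) & 0xF
    if frame_type != MGMT_TYPE:
        return False
    if subtype == SUBTYPE_BEACON:
        return True
    return subtype == SUBTYPE_PROBE_RESPONSE and not (frame[1] & RETRY_FLAG)

def frame_id(frame: bytes) -> bytes:
    """16-byte identifier kept in memory instead of the full frame.
    Assumption: BLAKE2b with a 16-byte digest stands in for WiPal's hash."""
    return hashlib.blake2b(frame, digest_size=16).digest()

def unique_frame_ids(frames):
    """Identifiers of all unique-frame candidates, in trace order."""
    return [frame_id(f) for f in frames if is_unique_candidate(f)]

# Synthetic frames: 2 frame-control bytes followed by padding.
beacon = bytes([0x80, 0x00]) + bytes(30)
probe_response = bytes([0x50, 0x00]) + bytes(30)
probe_retry = bytes([0x50, 0x08]) + bytes(30)
ack = bytes([0xD4, 0x00]) + bytes(8)
ids = unique_frame_ids([beacon, probe_response, probe_retry, ack])
```

As a rough cross-check of the memory figure quoted in the text, 600 MB for 7,700,000 unique frames corresponds to less than 80 bytes per frame, which leaves room for the 16-byte hash plus bookkeeping such as timestamps and container overhead.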
There are some rare cases where the assumption that beacons and probe responses are
unique does not hold. The uw/sigcomm2004 dataset has a total number of 50,375,921
unique frames (about 14% of the total 364,081,644 frames). Among those frames, we detected 5 collisions (distinct unique frames sharing identical hashes). WiPal’s intersection
process includes a filtering mechanism to detect and filter such collisions out.
3.3.3 Intersection
The intersection process intersects the sets of unique frames from both input traces. There
are multiple algorithms to perform such a task. Based on Cheng et al. [22] , a solution is to
“bootstrap” the system by finding the first unique frame common to both traces and then use
this reference frame as a basis for the synchronization mechanism, as shown in Algorithm 1
(we call this algorithm streaming intersection). One may also use subsequent reference frames
to update synchronization. This algorithm is practical because the inner loop only searches
a very limited subset of I2 . It has several drawbacks though: (i) the performance of the algorithm strongly depends on the precision of the synchronization process; (ii) finding the first
reference frame is an issue when no other synchronization mechanisms are available; (iii)
this algorithm couples intersection with synchronization, which is undesirable with respect
to modularity; and (iv) there is a possibility that some frames are read multiple times from
I2 . More specifically, access to I2 is not sequential.
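For illustration, the streaming intersection can be sketched as below. The sketch assumes that both lists hold `(arrival_time, frame_id)` pairs sorted by arrival time and that the two clocks already agree to within `delta`; the bootstrapping and the synchronization updates mentioned in the text are left out. Using a binary search to locate the time window is a choice made here, not something the text prescribes.

```python
from bisect import bisect_left, bisect_right

def streaming_intersection(i1, i2, delta):
    """Pair each unique frame of i1 with its occurrences in i2 that arrive
    within +/- delta of it. Both inputs: sorted lists of (time, frame_id)."""
    times2 = [t for t, _ in i2]
    references = []
    for t1, id1 in i1:
        # Only a narrow slice of i2 is searched for each frame of i1.
        lo = bisect_left(times2, t1 - delta)
        hi = bisect_right(times2, t1 + delta)
        for t2, id2 in i2[lo:hi]:
            if id2 == id1:
                references.append(((t1, id1), (t2, id2)))
    return references

# Made-up unique frames, for illustration only.
i1 = [(10, "a"), (20, "b"), (30, "c")]
i2 = [(11, "a"), (19, "x"), (29, "c"), (90, "b")]
refs = streaming_intersection(i1, i2, delta=2)
```

Frame "b" is not matched here because its two occurrences are 70 time units apart, far outside the window. This shows drawback (i): the result depends directly on how precise the synchronization is.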
We propose the retained intersection algorithm that is much simpler to implement and
that avoids the drawbacks of the abovementioned solution (see Algorithm 2). Its main characteristics are: (i) it does not require a bootstrapping phase; (ii) it does not depend on any
kind of synchronization; and (iii) it sequentially reads each frame only once from I1 and I2 .
Algorithm 1 Streaming intersection (uses synchronization).
Input: two lists of unique frames I1 and I2.
Output: a list of reference frames.
    δ ← synchronization precision
    for all u1 ∈ I1 do
        tu1 ← u1’s time of arrival
        for all u2 ∈ I2 between tu1 − δ and tu1 + δ do
            if u2 is an occurrence of u1 then
                Append (u1, u2) to output.
            end if
        end for
    end for

Algorithm 2 WiPal’s retained intersection.
Input: two lists of unique frames I1 and I2.
Output: a list of reference frames.
    h ← ∅    ▷ Implement h with a hash table.
    for all u1 ∈ I1 do
        Insert u1 into h.
    end for
    for all u2 ∈ I2 do
        if h contains an element u1 equal to u2 then
            Append (u1, u2) to output.
        end if
    end for

Our algorithm starts by loading all unique frames of the first trace into memory. This
precludes using it as an online tool. Note that loading all unique frames from a trace into
memory may also hog resources; this justifies the importance of having small identifiers for
the unique frames. These constraints are however irrelevant in practice. To support our
argument, let us show an example using the uw/sigcomm2004 dataset. The biggest traces
are those from sniffers mojave and sonoran on channel 11 (roughly 19 GB each). Extracting
these traces’ unique frames and intersecting them using WiPal needs 575 MB of memory.
Therefore, memory consumption is not a concern in our algorithm.
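A minimal sketch of the retained intersection follows, assuming unique frames are represented as `(arrival_time, frame_id)` pairs. It also drops identifiers that occur more than once in the first list, which is one way of handling collisions within that trace; the function name and the data layout are hypothetical.

```python
def retained_intersection(i1, i2):
    """Intersect two sequences of (arrival_time, frame_id) unique frames.

    Each input is read once, sequentially, and no synchronization is needed.
    """
    table = {}          # frame_id -> unique frame from the first trace
    collisions = set()  # identifiers seen more than once in the first trace
    for u1 in i1:
        key = u1[1]
        if key in table or key in collisions:
            # Colliding identifiers cannot serve as references: remove them.
            collisions.add(key)
            table.pop(key, None)
        else:
            table[key] = u1
    references = []
    for u2 in i2:
        u1 = table.get(u2[1])
        if u1 is not None:
            references.append((u1, u2))
    return references

# Made-up unique frames; clocks differ by about 100 time units.
i1 = [(1, "a"), (2, "b"), (3, "b"), (4, "c")]
i2 = [(101, "c"), (102, "b"), (103, "a"), (104, "z")]
refs = retained_intersection(i1, i2)
```

The hash table makes each lookup constant-time on average, so the cost is linear in the number of unique frames, at the price of holding all identifiers of the first trace in memory.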
Another advantage of the proposed algorithm is its ability to detect collisions of unique
frames within the first trace. As indicated in Algorithm 2, this algorithm uses a set h (in
practice, implemented using a hash table) that contains unique frames from the first trace.
One detects collisions when trying to insert into h an element that is already part of it. When
WiPal encounters such cases, it memorizes collisions, and filters them out of the hash table
before starting the algorithm’s second loop. Of course, collisions in the second trace remain
undetected. Even if WiPal detected them, there would still be the possibility that a collision
spans across both traces (i.e., each trace contains one occurrence of a colliding unique frame).
Such cases lead to producing invalid reference frames. To detect invalid reference frames,
WiPal looks at possible anomalies w.r.t. the interarrival times between unique frames. In
practice, invalid references are rare: only three occurrences when merging uw/sigcomm2004’s channel 11 (a 73 GB input which produces a 22 GB output).

Dataset          Environment        Hardware
uw/sigcomm2004   Conference         Laptop
Private 1        Office building    Soekris
Private 2        Uptown apartment   Netbook
Private 3        Office building    Netbook

Id.     1    2     3    4    5    6    7    8
Chan.   11   6/8   6    6    11   6    6    1

Table 3.1: Characteristics of the traces used for testing merge operations. Id. relates to the
identification number of the merge operations.
3.3.4 Synchronization
Synchronizing two traces means mapping trace one’s timestamps to values compatible with
trace two’s. WiPal computes this mapping with an affine function $t_2 = a\,t_1 + b$. It estimates $a$
and $b$ with the help of reference frames as the process runs. Several techniques exist to perform these estimations: linear interpolations [44], linear regressions [31;59], or solving linear
problems [52]. To combine generality with speed, WiPal uses a simple generalization of the techniques from Mahajan et al. [44] and Yeo et al. [59]. Note that other techniques
could also be implemented without requiring modifications to WiPal’s other components.
WiPal’s synchronization process operates on windows of $w + 1$ reference frames (finding
an optimal value of $w$ is discussed below). For each reference frame $R_i$, the process performs
a linear regression using reference frames $R_{i-\lfloor w/2 \rfloor}, \ldots, R_{i+\lceil w/2 \rceil}$. At the beginning and at the
end of the trace, we use $R_1, \ldots, R_w$ and $R_{N-w}, \ldots, R_N$ ($N$ is the number of reference frames).
The result gives $a$ and $b$ for all frames between $R_i$ and $R_{i+1}$.
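The windowed regression can be sketched as follows, assuming reference frames are given as `(t1, t2)` timestamp pairs sorted by time. The ordinary least-squares fit is standard; the way the window is clamped at both ends of the trace (always w + 1 frames) is an assumption of this sketch, and the function names are hypothetical.

```python
def fit_line(points):
    """Ordinary least-squares fit of t2 = a * t1 + b over (t1, t2) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx  # assumes the t1 values in the window are not all equal
    return a, mean_y - a * mean_x

def windowed_fits(refs, w=3):
    """One (a, b) pair per reference frame R_i, fitted on the window
    R_{i - floor(w/2)} .. R_{i + ceil(w/2)}, clamped at the trace ends."""
    n = len(refs)
    if n < w + 1:
        raise ValueError("need at least w + 1 reference frames")
    fits = []
    for i in range(n):
        start = max(0, min(i - w // 2, n - (w + 1)))
        fits.append(fit_line(refs[start:start + w + 1]))
    return fits

def synchronize(t1, fit):
    """Map a timestamp of trace one onto trace two's clock."""
    a, b = fit
    return a * t1 + b

# Synthetic reference frames following t2 = 2 * t1 + 5 exactly.
refs = [(float(t), 2.0 * t + 5.0) for t in range(10)]
fits = windowed_fits(refs, w=3)
```

A frame located between `refs[i]` and `refs[i + 1]` would then be mapped with `synchronize(t, fits[i])`.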
Experiments led us to choose 3 as the optimal value for w (i.e., WiPal performs linear
regressions on windows of 4 reference frames). Figure 3.3 shows the results of performing
eight merge operations (on sixteen traces from four distinct datasets) with varying window
sizes. The merges concern 12 hour-long excerpts of various traces. One of the four datasets is
uw/sigcomm2004 while the three others are private datasets we collected. Table 3.1 presents
some characteristics of the traces we used for each merge operation. It is important to note
that these sixteen traces were collected with various hardware in several environments, on
different channels. We define the synchronization difference between two traces as follows.
First, consider only the subset $S$ of frames that are shared by both $T_1$ and $T_2$. For a given
frame $f$, let $t_{f,1}$ be the arrival time of $f$ inside $T_1$ (after clock synchronization) and $t_{f,2}$ be the
arrival time of $f$ inside $T_2$. The synchronization difference is given by
\[ \frac{1}{|S|} \sum_{f \in S} \left| t_{f,2} - t_{f,1} \right|. \]
One can summarize the synchronization difference as the average difference of synchronization between frames that are identified as shared among input traces.
[Figure 3.3: two plots of the synchronization difference (µs) against the window size (w + 1), from 2 to 128. Upper plot: merges 1 and 3 to 8. Lower plot: merge 2.]
Figure 3.3: Synchronization difference w.r.t. linear regression window size. The upper curve
represents average, minimum, and maximum values for seven of the eight merges. The
lower curve represents the result for the other one, and is plotted separately because it has a
singular shape. We think this is related to the timestamping accuracy of the input traces for
this merge.
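The definition above translates directly into code. The sketch below assumes each trace is summarized as a dictionary mapping a frame identifier to its arrival time in microseconds, with trace one's times already mapped onto trace two's clock; this representation and the function name are hypothetical.

```python
def synchronization_difference(t1_times, t2_times):
    """Mean absolute difference between the arrival times of shared frames.

    t1_times, t2_times: dicts mapping frame id -> arrival time (µs),
    trace one's times being taken after clock synchronization.
    """
    shared = t1_times.keys() & t2_times.keys()   # the subset S
    if not shared:
        raise ValueError("the traces share no frame")
    return sum(abs(t2_times[f] - t1_times[f]) for f in shared) / len(shared)

# Made-up arrival times, for illustration only.
t1 = {"a": 100.0, "b": 200.0, "c": 300.0}
t2 = {"a": 101.0, "b": 198.0, "d": 5.0}
diff = synchronization_difference(t1, t2)   # frames "a" and "b" are shared
```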
With the exception of merge 2, which exhibits a very singular behavior, Figure 3.3 shows
that w = 2 leads to the minimum synchronization difference. Note that techniques that use
w = 1 (i.e., that perform linear interpolations on pairs of reference frames) lead to the
worst synchronization difference on average.
However, choosing a w that is too low or too high might lead to missing some shared
frames. Figure 3.4 shows the number of frames that are identified as duplicates in the input
traces with respect to window size. Whereas using 3 ≤ w ≤ 7 allows detecting the maximal number of shared frames, using other values leads to some missed duplicates. Note
that w = 1 gives the worst results. This indicates that synchronizing traces using linear interpolation (as Wit [44] does) may lead to incorrect results. Therefore WiPal uses w = 3: among the
values that detect the maximum number of shared frames, this is the one that leads to the minimum
synchronization difference.
[Figure 3.4: plot of the normalized number of shared frames against the window size (w + 1), from 2 to 128, with an inset detailing window sizes 3 to 32.]
Figure 3.4: Number of frames detected as shared by both input traces w.r.t. linear regression
window size. The curve represents the average, minimum and maximum values for eight
merge operations. For each merge operation, this number is normalized using 1 as the
number of frames from the window size that gives the highest value.
3.3.5 Merging
We now present how WiPal performs the final step, namely the merging process itself. Its
role is to copy frames from synchronized traces to the output trace. Of course, it must
organize its output correctly while avoiding duplicate frames.
Algorithm 3 details WiPal’s merging algorithm. For the sake of illustration, we present
here a simplified version that assumes that only one frame is emitted at a given time inside
the monitoring area. It simultaneously iterates on both inputs, where each iteration adds
the earliest input frame to the output (lines 15–16). Duplicate frames are the ones that have
identical contents and that are spaced less than 106 µs apart (line 11). The rationale for this value is
that 106 µs is half the minimum gap between two valid frames [59] . Therefore, the appearance
of identical frames during such an interval is in fact a unique occurrence of the same frame.
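The simplified merging procedure described above can be sketched as follows. The sketch makes the same simplifying assumption (only one frame is emitted at a given time) and represents each trace as a time-sorted list of `(arrival_time_us, frame_bytes)` pairs, which is a representation chosen for this illustration.

```python
DUPLICATE_WINDOW_US = 106  # half the minimum gap between two valid frames

def merge_traces(t1, t2, window_us=DUPLICATE_WINDOW_US):
    """Merge two synchronized traces, emitting duplicate frames only once.

    t1, t2: lists of (arrival_time_us, frame_bytes), sorted by arrival time.
    """
    out = []
    i = j = 0
    while i < len(t1) or j < len(t2):
        if i == len(t1):                 # first trace exhausted
            out.append(t2[j]); j += 1
        elif j == len(t2):               # second trace exhausted
            out.append(t1[i]); i += 1
        else:
            (ta, fa), (tb, fb) = t1[i], t2[j]
            if fa == fb and abs(ta - tb) < window_us:
                # Same frame seen by both monitors: keep one copy.
                out.append(t1[i]); i += 1; j += 1
            elif ta < tb:
                out.append(t1[i]); i += 1
            else:
                out.append(t2[j]); j += 1
    return out

# Made-up traces, for illustration only.
t1 = [(0, b"beacon"), (500, b"data1"), (900, b"ack")]
t2 = [(40, b"beacon"), (700, b"data2"), (1200, b"ack")]
merged = merge_traces(t1, t2)
```

In this example the two beacons are unified because they are 40 µs apart, whereas the two identical acknowledgements, 300 µs apart, are kept as distinct frames.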
3.4 Evaluation
This section provides an evaluation of WiPal’s merging algorithms using the datasets previously described. We investigate both the correctness and the efficiency of WiPal. We run the
merges and then use some heuristics to evaluate the quality of the result. We also analyze
WiPal’s execution speed.
3.4.1 Correctness
Checking the correctness of merge outputs is difficult. Being able to test whether traces
are correctly merged or not would be equivalent to knowing exactly in advance what the
merge should look like. Unfortunately, there is no reference output against which we could
compare. Thus, we propose several heuristics to check if WiPal introduces inconsistencies in
its outputs. We also check WiPal’s correctness with a test-suite of synthetic traces for which
we know exactly what to expect as output.

Algorithm 3 WiPal’s merging algorithm.
Input: two synchronized traces T1 and T2.
Output: the merge of T1 and T2.
 1: procedure ADVANCE(f: frame, T: trace)
 2:     Append f to output; f ← T’s next frame (or nil)
 3: end procedure
 4: f1 ← T1’s first frame; f2 ← T2’s first frame
 5: while f1 ≠ nil or f2 ≠ nil do
 6:     if f1 = nil then ADVANCE(f2, T2)
 7:     else if f2 = nil then ADVANCE(f1, T1)
 8:     else
 9:         t_f1 ← f1’s time of arrival
10:         t_f2 ← f2’s time of arrival
11:         if f1 = f2 and |t_f1 − t_f2| < 106 µs then
12:             Append either f1 or f2 to output.
13:             f1 ← T1’s next frame (or nil)
14:             f2 ← T2’s next frame (or nil)
15:         else if t_f1 < t_f2 then ADVANCE(f1, T1)
16:         else ADVANCE(f2, T2)
17:         end if
18:     end if
19: end while
A broken merging process could lead to several inconsistencies in the output traces.
Regarding our datasets, we investigate in particular two of those inconsistencies: duplicate
unique frames and duplicate data frames.
Duplicate unique frames. As seen previously, every unique frame should occur only once
in the traces (including merged traces). Yet, it is difficult to avoid collisions in practice (see
Section 3.3.2). Thus one should not consider all collisions as inconsistencies. After merging,
our traces have 6 collisions. A manual check shows that five of them are not inconsistencies introduced by WiPal’s merging process. The last one is due to a synchronization error of 1.5
milliseconds. A closer look at the output trace shows that this error spans 4.7 milliseconds and duplicates at most 4 frames (a beacon frame and three identical retransmitted data
frames). We believe this is an excellent score, considering our inputs have 79,340,347 frames
with various timestamping accuracies.
Duplicate data frames. We search traces on a per-sender basis for successive duplicate
data frames (only considering non-retransmitted frames). Such cases should not occur in
theory: without retransmissions, sequence numbers should at least vary. Surprisingly,
some input traces contain such anomalies. We have no explanation for why some datasets
exhibit this phenomenon. We checked however that the merged trace does not have more
duplicates than the original traces (inputs have 1,689 duplicates while the output only has
1,149).
3.4.2 Efficiency
Trace merging is a run-once operation and WiPal is an offline process. Yet speed is an important metric to consider:
• It is always desirable to make a program run faster, as long as it does not answer
instantaneously. In particular, as shown later in this section, WiPal is able to perform
in minutes what takes hours with other merging software.
• Less time spent merging means more time is available for other, more important processing (e.g., analyzing the dataset, which might also be a heavy operation). As another example, the merge operation might run on a multi-user system, with other
users having some time constraints.
• Shorter delays between trace collection and trace analysis mean more interactivity
and gains in productivity (e.g., if the collected traces have issues, it is desirable
to detect them quickly in order to fix the problem, possibly by collecting other traces).
Merging all the traces (17.5 GB) takes 35 minutes (real time as reported by the UNIX time
command) on a 3 GHz processor with 2 GB of RAM. The average CPU usage is 93%. User time,
which does not account for system delays and thus disk slowdowns, is 31 minutes and 32 seconds.
Comparing WiPal with online trace mergers does not make much sense: their mode of
operation is different, and they also have different requirements (e.g., wired connectivity
and loose synchronization). The comparison would be unfair. We can however compare
WiPal with Wit [44], another offline merger. Wit works on top of a database backend, which
means that trace files need to be imported into a database before any further operation can
begin (e.g., merging or inferring missing packets). Using the same machine as before, importing all input traces into Wit’s database takes 8 hours and 20 minutes (user time). This
means that, before Wit begins its merge operations, WiPal can perform at least 14 runs of a
full merge with the same data (500 minutes of import against 35 minutes per merge). WiPal thus allows tremendous speed improvements. One of
the reasons for such a difference is that WiPal uses high performance C++ code while Wit is
a set of Perl scripts using the SQL language to interact with a database.
3.5 Conclusion
This chapter introduces WiPal’s trace merger. As an offline merger, it does not require sniffers
to be synchronized nor to have access to a wired infrastructure. WiPal provides several
improvements over existing equivalent software: (i) it comes as a simple program able to
manipulate trace files directly, instead of requiring a more complex software setup; (ii) its
synchronization algorithm offers better precision than the existing algorithms; and (iii) it has
a clean modular design. Furthermore, we also showed that WiPal is an order of magnitude faster
than Wit [44], the other available offline merger.
We have several plans for the future of WiPal’s merging procedure. First, we are currently extending it to include new features. For instance, we are working with other contributors in order to merge other types of packet traces using WiPal’s algorithms. We are
also working with researchers from the University of California, Los Angeles on new synchronization algorithms. We would also like to make better use of WiPal’s modularity and
test other algorithms for the various stages of the merging operation.
Part II
Applying WiPal: empirical analyses
Chapter 4
Accuracy of wireless packet sniffing
Once one has tools for sniffing and merging, the question of trace completeness arises.
With Wi-Fi sniffing each sniffer trace is incomplete (i.e., it lacks some frames). Therefore,
it is possible that the merged traces are incomplete as well. This chapter focuses on two
aspects of trace completeness in IEEE 802.11 networks. First, we observe that existing techniques to evaluate trace completeness are inaccurate (see Section 4.3). Among other issues, a
single buggy device may be enough to distort the whole estimation. Second, we study
how the number of sniffers impacts trace completeness (see Section 4.4). Using up to eight
sniffers sharing (approximately) the same location, we show that even though individual
sniffers may provide good accuracy, sometimes using eight sniffers is still not enough to
capture all frames. Furthermore, the sniffing process exhibits a high level of randomness
with variable accuracy.
To obtain these results we conduct two similar controlled experiments. In each experiment we record a spot’s Wi-Fi activity for a given duration using multiple sniffers. All
sniffers have (approximately) the same location. It is then possible to analyze each sniffer
trace, compare the traces with one another, and compute merge operations with a varying number of
traces. Finally, studying each merge operation with respect to the number of traces that
compose it provides comparative information.
This chapter is structured as follows. Section 4.1 presents the existing techniques to
estimate trace completeness. Section 4.2 introduces our datasets and provides a preliminary
analysis. Then Section 4.3 evaluates our datasets’ completeness and draws conclusions
about existing evaluation techniques. Finally, Section 4.4 studies the impact of using
multiple monitors on completeness.
4.1 Completeness evaluation: state of the art
When collecting IEEE 802.11 data using wireless sniffing, trace completeness is a key issue. Even under good radio conditions, sniffers may miss a successful transmission. Since
missed frames are unrecorded, it is impossible to know exactly how complete a trace is.
Several methods exist however to estimate the efficiency of wireless sniffing as a technique.
Other methods exist to estimate the completeness of single traces. We review previous related work below.
Yeo et al. [59] use active indoor measurements (in a single university building). They
estimate that sniffer traces contain at least 73% of all of their experiment’s frames. When merging traces from three monitors, they obtain a completeness of at least 99%. Using similar
experiments in the same kind of environment, Cheng et al. [22] report a completeness
of 95%.
Serrano et al. [54] also perform active measurements using an anechoic chamber. Their
results show that single sniffer accuracy varies significantly across sniffers, and that performance may also depend on the nature of the experiment under study and on slight changes
of the sniffer position. With this best-case scenario using an anechoic chamber, they obtain
a completeness of about 96% for single sniffers, on average.
Based on message sequences allowed by the IEEE 802.11 standard one can infer some
missing frames. For instance, since an acknowledgment frame must follow a successful
data frame transmission, a trace containing an acknowledgment with no preceding data frame lacks
a frame. Of course, other rules exist for other frame types. Using this technique on real
traces from an IETF meeting, Jardosh et al. [38] estimate a completeness of at least 80% for
individual sniffers (due to the dataset, no merging was possible, and therefore no data is
available concerning the accuracy of merging).
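The acknowledgment rule above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the method of any cited work: the Frame type and its fields are hypothetical, only the rule "an acknowledgment follows a data or management frame sent by the station being acknowledged" is checked, and every unmatched acknowledgment is counted as exactly one missing frame. The transmitter of an acknowledgment is left empty because IEEE 802.11 acknowledgment frames carry only a receiver address.

```python
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    kind: str                   # "data", "mgmt", or "ack" (simplified frame types)
    transmitter: Optional[str]  # None for acknowledgments (no transmitter address)
    receiver: str


def count_orphan_acks(trace: List[Frame]) -> int:
    """Count acknowledgments that do not follow the frame they acknowledge."""
    missing = 0
    previous: Optional[Frame] = None
    for frame in trace:
        if frame.kind == "ack":
            acknowledged = (
                previous is not None
                and previous.kind in ("data", "mgmt")
                and previous.transmitter == frame.receiver
            )
            if not acknowledged:
                missing += 1  # the acknowledged frame is absent from the trace
        previous = frame
    return missing
```

For example, a trace made of a data frame from A, an acknowledgment to A, and then an acknowledgment to C yields one inferred missing frame (the frame C sent).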
Rodrig et al. [49] use another technique based on frame sequence numbers to estimate
the completeness of their traces. This technique is simple: since most IEEE 802.11 frames
contain a sequence number, they look at sequence gaps to estimate missing frames. Using
traces they record at the 2004 SIGCOMM conference, they evaluate an overall completeness
of “roughly 90%”. Curiously, after merging the same dataset, Mahajan et al. [43] estimate an
equal completeness of 90% for channel 1, but also a lower completeness of 79% for channel 11.
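A sequence-number estimator of this kind can be sketched as follows. This is a minimal sketch, not the implementation used by Rodrig et al. or Mahajan et al.: the function name and the (sender, sequence number) input format are assumptions, a repeated sequence number is treated as a retransmission, and counters are assumed to wrap at 4,096 as the 12-bit sequence number field implies.

```python
from typing import Dict, Iterable, Tuple

SEQ_MODULO = 4096  # IEEE 802.11 sequence numbers are 12 bits wide


def estimate_completeness(frames: Iterable[Tuple[str, int]]) -> Tuple[int, int, float]:
    """Return (captured, estimated missing, completeness) for (sender, seq) pairs."""
    last_seq: Dict[str, int] = {}
    captured = 0
    missing = 0
    for sender, seq in frames:
        captured += 1
        if sender in last_seq:
            delta = (seq - last_seq[sender]) % SEQ_MODULO
            if delta > 1:            # delta == 0 is read as a retransmission
                missing += delta - 1
        last_seq[sender] = seq
    total = captured + missing
    completeness = captured / total if total else 0.0
    return captured, missing, completeness
```

With a sender whose frame 42 directly follows frame 39, the sketch reports two missing frames (40 and 41); a transition from 4095 to 0 is not counted as a gap.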
Schulman et al. [53] also raise an interesting point: since the parameters that impact trace
completeness may vary during measurements one should not use it as an accurate indicator of trace quality. For instance, a sniffer might provide a very accurate recording during
“silent” periods where only a few access points send beacons, but perform very badly when
the network load grows. To solve this issue they propose using dedicated “T-Fi” plots [53].
While we agree with them, we believe however that studying trace completeness is still interesting in some cases. It provides quick insight and is easy to understand. For instance, a
trace with a low completeness raises issues, whatever the network load through time.
As a summary, existing techniques rely on the fact that network protocols define “valid
frame sequences”. When a trace contains an invalid (incomplete) frame sequence, one finds
the number of frames to insert so that the sequence becomes valid. This number counts as the number
of missing frames. Regarding IEEE 802.11, two categories exist: (i) message-type techniques
that rely on frame types (e.g., a management or data frame must precede an acknowledgement) [22;38;43] and (ii) seqnum-based techniques that rely on sequence numbers (e.g., if frame
42 occurs right after frame 39, then frames 40 and 41 are missing) [49;53].

Figure 4.1: ASUS EeePC 700 with three Netgear WG111v3, as used for trace collection.
Applications of these techniques show attractive results [22;38;43;49;53;59]. In “academic”
environments (laboratories, campuses, conference venues), the literature shows that individual sniffers exhibit completeness values between 70% and 80%. By merging traces, it
is possible to reach values above 90%. However, as the rest of this chapter shows, we could never
achieve such values in our experiments.
4.2 Datasets
We study trace completeness using two datasets. They feature traces from multiple sniffers,
each one equipped with three IEEE 802.11 radio interfaces (ASUS EeePC 700 and Netgear
WG111v3, see Figure 4.1). Interfaces listen on channels 1, 6, and 11. Each radio is set up in
monitor mode and records every frame it hears regardless of the network the frame comes
from. We then merge each sniffer’s traces (on a per channel basis) using the WiPal software
suite [25;26].
4.2.1 Overview
We measure both datasets in the same environment but at different times. They record
wireless activity in the computer science laboratory building of Université Pierre et Marie
Curie. It spans four floors of a twelve-floor building mostly occupied by private companies.
It is located in Paris, well outside the university campus. We refer to the datasets as follows:
2008-12-01. Eight sniffers. Traces last roughly one hour and were recorded on December
1st 2008, starting around 3 p.m. All sniffers were located indoors on the same desktop.
2008-12-19. Six sniffers. Traces last roughly two hours and were recorded on December
19th 2008, starting around 11 a.m. In this dataset, due to other constraints, we split
sniffers into three groups of two. All groups are located indoors in the same room, but
each group is at a different spot in the room.

                  2008-12-01         2008-12-19
Duration          1h13               2h10
Size (1)          3 GB / 578 MB      2.8 GB / 833 MB
Data size (2)     1 GB / 203 MB      210 MB / 82 MB
Devices (3)       190                341
ESSIDs            13                 24
Access points     66                 122
Ad Hoc cells      3                  3

(1) Sizes before/after one merges the dataset. Only includes IEEE 802.11 frames and their payloads.
(2) Data sizes before/after one merges the dataset. Only includes IEEE 802.11 data frames and their payloads.
(3) Each distinct MAC address in a frame’s sender field accounts for a device.
Table 4.1: Quantitative characteristics of the 2008-12-01 and 2008-12-19 datasets.
4.2.2 Preliminary analysis
Table 4.1 presents some quantitative characteristics of the datasets. Despite not being very
different in nature, the traces display some unexpected differences.
2008-12-19 lasts twice as long as 2008-12-01, but its merged dataset is only one and
a half times bigger. This difference in activity is probably due to the fact that more people are active during the afternoon than around lunch time. Also, 2008-12-19 is close to
Christmas, thus we can expect some regular users to be on vacation at this time.
2008-12-19 has far less data traffic than 2008-12-01. This confirms the previous point.
Average management traffic rates are of the same order of magnitude in both datasets,1 but
2008-12-01 has a higher average data traffic rate (46 kB/s vs. 11 kB/s, all channels cumulated). This is why 2008-12-19 is not twice as big as 2008-12-01. Again, this confirms that
2008-12-01 displays more user activity.
1 When cumulating traffic from all channels on merged datasets, 2008-12-01 has an average rate of 83 kB/s
for management frames, while 2008-12-19 has an average rate of 96 kB/s.
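The average data traffic rates quoted above can be cross-checked from the merged data sizes and durations listed in Table 4.1. The short computation below is only a sanity check under an assumption the text does not state, namely decimal units (1 MB = 1,000 kB); the function name is hypothetical.

```python
def average_rate_kb_per_s(size_mb: float, hours: int, minutes: int) -> float:
    """Average rate in kB/s, assuming decimal units (1 MB = 1,000 kB)."""
    seconds = hours * 3600 + minutes * 60
    return size_mb * 1000 / seconds


# Merged data sizes and durations as listed in Table 4.1.
rate_1201 = average_rate_kb_per_s(203, 1, 13)  # 2008-12-01: 203 MB in 1h13
rate_1219 = average_rate_kb_per_s(82, 2, 10)   # 2008-12-19: 82 MB in 2h10
print(f"{rate_1201:.1f} kB/s vs. {rate_1219:.1f} kB/s")
```

Under this assumption the two values round to 46 kB/s and 11 kB/s, which matches the figures given in the text.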
[Figure 4.2 panels: (a) 2008-12-01 and (b) 2008-12-19. Each panel plots the number of MAC addresses against time of day, with one curve each for channel 1, channel 6, channel 11, and the channel average.]
Figure 4.2: Number of MAC addresses each merged trace contains from its beginning to a
given time. Contrary to Table 4.1, which only counts MAC addresses from frames’ sender
fields, all fields containing valid MAC addresses are used.
Also note that non-data traffic (management and control traffic) is unexpectedly high.
Control traffic is negligible (less than 2% of all traffic); this overhead is therefore mostly
management traffic. This is a sign that many networks share the medium, each network
having its own management traffic.
2008-12-19, which lasts twice as long as 2008-12-01, also has twice as many distinct devices.
This also holds for ESSIDs and access points. This is surprising because one would
expect to discover most of the devices at the beginning of traces and then to have a curve
that increases slowly (especially for ESSIDs or access points). Figure 4.2 presents growth
curves. They do appear to be non-linear, but sniffers discover a majority of devices
long after the first few minutes and the growth curves are not that flat. The datasets probably do
not last long enough to draw more conclusions about that. However, the fact that
one is able to discover new networks after more than one hour is another sign that many
distinct networks share the radio medium.
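A growth curve of this kind can be computed with a single pass over a trace. The sketch below is illustrative only: the function name and the input format (a time-ordered sequence of timestamps, each paired with the MAC addresses found in the corresponding frame) are assumptions, not part of WiPal.

```python
from typing import Iterable, List, Set, Tuple


def growth_curve(
    frames: Iterable[Tuple[float, Iterable[str]]]
) -> List[Tuple[float, int]]:
    """Return (timestamp, distinct addresses seen so far) at each new discovery."""
    seen: Set[str] = set()
    curve: List[Tuple[float, int]] = []
    for timestamp, addresses in frames:
        before = len(seen)
        seen.update(addresses)
        if len(seen) > before:  # record a point only when the count grows
            curve.append((timestamp, len(seen)))
    return curve
```

A curve that keeps rising late in the trace, as observed here, indicates that new devices are still being discovered well after the start of the capture.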
Although both datasets exhibit the same small number of ad hoc cells, cell IDs are different in each dataset.
Two cells from 2008-12-19 however share a prefix with a cell from
2008-12-01. We believe these cells relate to temporary or test networks (e.g., mesh test 1,
mesh test 2, or meshtest).
As a summary, both datasets reflect the same environment under different usage conditions. The environment features a high number of networks, almost all of them being
infrastructure networks. Despite a crowded medium, 2008-12-19 displays noticeably less
user activity than 2008-12-01.
4.3 Completeness evaluation: shortcomings
Several issues render completeness evaluation techniques inaccurate, partly because of their
strategies and partly because of some anomalies that occur in traces. In fact, existing estimation techniques assume strict conformance to the IEEE 802.11 standard for all devices; this is
often not the case, as we will see later in this section.
Analyses of our datasets reveal multiple shortcomings. 2008-12-01’s and 2008-12-19’s
individual traces exhibit unexpected completeness values between 10% and 15% (using a
seqnum-based technique). Merging traces only raises these values by barely more than 1%.
This is far from the expected 90%! Starting with this result, we made several observations.
Estimation techniques assume the network is not congested. In a congested environment
many frames fail to access the medium. This means that counting gaps in sequence numbers
reveals transmission failures rather than sniffer losses. Note that the large number of access
points in our traces supports the congestion hypothesis. It also suggests that the hidden
terminal problem is likely to occur on a large scale.
Seqnum-based techniques assume IEEE 802.11 implementations generate correct sequence
numbers. This is wrong in practice because:
1. Some access points wrap their counters at 2,048 instead of 4,096 [53]. How this affects
estimation techniques is implementation-dependent. Possible effects include ignoring
some relevant gaps or detecting invalid gaps with large values.
2. Some access points set their sequence numbers to zero for all frames (we observed this
behavior during other minor experiments).
3. Some access points manage multiple “virtual” access points simultaneously. 2008-12-01
and 2008-12-19 contain several such devices. In the ideal case, each virtual access
point should maintain its own sequence counter (IEEE 802.11 [37, p. 66, 7.1.3.4.1]). But
in practice this does not hold, which introduces invalid gaps (and thus leads to underestimating completeness).
At the time of this writing, no automatic technique exists to detect such anomalies. In
particular, we see no straightforward solution for the third anomaly (single counter for multiple virtual access points). Nevertheless, once one detects the faulty stations, it is in theory
possible to work around these anomalies. As an example, we detected that a single device
in 2008-12-01 was responsible for a 5% underestimation of completeness.
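The effect of the three anomalies on a sequence-number estimator can be illustrated with a small computation. The sketch below assumes a naive estimator that counts gaps modulo 4,096 and treats a repeated sequence number as a retransmission; this estimator and all names are hypothetical, and, as noted above, the actual effect depends on the implementation.

```python
SEQ_MODULO = 4096  # wrap value a conforming device is assumed to use


def apparent_missing(previous_seq: int, seq: int) -> int:
    """Frames a naive modulo-4096 estimator reports missing between two frames."""
    delta = (seq - previous_seq) % SEQ_MODULO
    return max(delta - 1, 0)


# Anomaly 1: the counter wraps at 2,048, so frame 2047 is followed by frame 0.
wrap_at_2048 = apparent_missing(2047, 0)

# Anomaly 2: the sequence number is always zero, so no gap is ever visible.
stuck_at_zero = sum(apparent_missing(0, 0) for _ in range(100))

# Anomaly 3: one counter is shared by two virtual access points that alternate.
shared_counter = list(range(10, 16))   # frames 10..15, none of them lost
virtual_ap_x = shared_counter[0::2]    # 10, 12, 14
virtual_ap_y = shared_counter[1::2]    # 11, 13, 15
spurious = sum(
    apparent_missing(a, b)
    for seqs in (virtual_ap_x, virtual_ap_y)
    for a, b in zip(seqs, seqs[1:])
)
```

In this toy setting the first anomaly produces a single spurious gap of 2,048 frames, the second hides every loss, and the third reports four missing frames although none was lost.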
$$\mathrm{score}(m_k) = \frac{\min(o_1, o_2)}{|m_N|}$$
Figure 4.3: “Score” of a single merge operation. $m_N$ is the last merge, i.e., the one that
includes N sniffers. Note that when $k > 2$, $m_{k-1}$ features frames from at least two distinct
sniffer traces and thus it is expected that $o_2 > o_1$. Therefore in most cases $\mathrm{score}(m_k) = o_1 / |m_N|$.

Message-type techniques fail to detect series of missing frames. For instance, message-type
techniques cannot detect a missing data frame if its corresponding acknowledgement
is also missing. We call a clear gap two consecutive frames from the same station that exhibit
a gap in the sequence number and that are not interleaved with any frame that mentions this
station (either as transmitter or receiver). Clear gaps are symptoms of missing frames that
message-type techniques would not detect. In 2008-12-01 and 2008-12-19, 81% and 89%
of the estimated missed frames are due to clear gaps. In the famous sigcomm2004 traces [50] ,
clear gaps represent 59% of the estimated missed frames. This means that message-type techniques fail to detect most of the missed frames.
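The definition of a clear gap translates into a single pass over a trace. The following sketch is one possible reading of that definition, not the tool used to obtain the figures above: the Frame type and its fields are hypothetical, and frames without a sequence number (such as acknowledgments) only serve to mark the stations they mention.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

SEQ_MODULO = 4096


class Frame(NamedTuple):
    transmitter: Optional[str]  # None when the frame carries no transmitter address
    receiver: Optional[str]
    seq: Optional[int]          # None for frames without a sequence number


def count_gaps(trace: List[Frame]) -> Tuple[int, int]:
    """Return (frames missing in all gaps, frames missing in clear gaps)."""
    last_seq: Dict[str, int] = {}
    interleaved: Dict[str, bool] = {}  # station mentioned since its last numbered frame?
    total = clear = 0
    for frame in trace:
        sender = frame.transmitter if frame.seq is not None else None
        if sender is not None:
            if sender in last_seq:
                delta = (frame.seq - last_seq[sender]) % SEQ_MODULO
                if delta > 1:
                    total += delta - 1
                    if not interleaved[sender]:
                        clear += delta - 1  # nothing mentioned the station in between
            last_seq[sender] = frame.seq
            interleaved[sender] = False
        # Every other station this frame mentions now has an interleaved frame.
        for station in (frame.transmitter, frame.receiver):
            if station is not None and station != sender and station in interleaved:
                interleaved[station] = True
    return total, clear
```

The ratio between the two returned counts gives the share of estimated missed frames that message-type techniques could not detect, under the assumptions of this sketch.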
As a conclusion, one should use completeness estimation techniques with care. Message-type techniques are likely inaccurate. Seqnum-based techniques might lead to good results,
provided there is no congestion and all participating devices strictly conform to IEEE 802.11. In
any case, uncertainty exists regarding the accuracy of these techniques.
4.4 Completeness and number of sniffers
We now take a step further and investigate the impact of the number of sniffers on the
completeness of the dataset. To this end, we analyze subsets of our datasets with a varying
number of monitors.
4.4.1 Methodology
The goal is to evaluate the “quality” of a merged dataset with respect to the number of
sniffers that compose it. We combine individual traces in groups of increasing size k, where
k ∈ {2, 3, ..., N} is the number of traces inside the group. Recall that N = 8 for 2008-12-01
and N = 6 for 2008-12-19.
Figure 4.4: Successive computations of Mk for N = 4. An arrow from x to y symbolizes the
x ⋆ y merge operation.
Let Mk be the set of groups of size k (i.e., merged datasets including traces from k sniffers). To compute Mk, we proceed recursively from Mk−1. For the sake of simplicity, we
define the binary merge operation a ⋆ b as the result of merging datasets a and b. This
operation is theoretically symmetric and associative, and we assume that our trace merging algorithms satisfy these properties. Let us show an example with N = 4 (see Figure 4.4).
The original traces are {a, b, c, d}. We first compute M2 by merging each trace with every
other one (due to symmetry, we skip some operations, e.g., b ⋆ a because we compute a ⋆ b instead). We compute M3 by merging each element of M2 with the remaining traces. Again,
we can skip some operations due to symmetry and associativity (e.g., we skip b ⋆ c ⋆ a because we compute c ⋆ a ⋆ b instead). We keep on performing this procedure until k = N.
Note that computing Mk involves $\binom{N}{k}$ merge operations. Also note that each merge operation produces a new merged dataset. Therefore, in the following we identify each merge
operation with its output.
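The recursive construction of the groups can be sketched as follows. This is an illustration under simplifying assumptions rather than the scripts actually used: a trace is reduced to a set of frame identifiers, the merge operation is modeled by set union (which is symmetric and associative, as assumed above), and the names compute_groups and merge are hypothetical. The individual traces are stored as groups of size 1 to seed the recursion.

```python
from itertools import combinations
from math import comb
from typing import Dict, FrozenSet, Tuple

Trace = FrozenSet[int]  # a trace reduced to a set of frame identifiers


def merge(a: Trace, b: Trace) -> Trace:
    """Stand-in for the merge operation: set union."""
    return a | b


def compute_groups(traces: Dict[str, Trace]) -> Dict[int, Dict[Tuple[str, ...], Trace]]:
    """Build every group of size k from a group of size k - 1 plus one trace."""
    names = sorted(traces)
    groups = {1: {(name,): traces[name] for name in names}}
    for k in range(2, len(names) + 1):
        groups[k] = {
            # Reuse the merge of the first k - 1 members from the previous round.
            members: merge(groups[k - 1][members[:-1]], traces[members[-1]])
            for members in combinations(names, k)
        }
    return groups


traces = {
    "a": frozenset({1, 2}),
    "b": frozenset({2, 3}),
    "c": frozenset({3, 4}),
    "d": frozenset({5}),
}
groups = compute_groups(traces)
```

Enumerating sorted combinations is what skips the redundant orderings: each group of size k is produced by exactly one merge, so round k performs comb(N, k) merge operations.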
[Figure 4.5 plots: two panels, 2008-12-01 and 2008-12-19, each showing Score (%) against the number of monitors, with one curve each for channel 1, channel 6, and channel 11.]
Figure 4.5: Evolution of scores w.r.t. the number of monitors.

In order to evaluate the quality of each Mk, we attribute a score to each element of Mk.
We then compute Mk’s average score. Let mk be the merged dataset one wants to score,
mk−1 the previously merged dataset, and t the new individual trace we want to add to mk−1.
We have mk = t ⋆ mk−1. Figure 4.3 depicts how we compute score(mk). Basically, score(mk)
represents how many frames mk contains that would not have been taken into account if
only mk−1 or t were considered. For better readability, we normalize this quantity with the
frame count of our largest merged dataset (so a score is a ratio between 0 and 0.5). The larger
the score, the more useful the merge.
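The score can be sketched on sets of frame identifiers. The reading of o1 and o2 used below is an assumption drawn from the surrounding text and from Figure 4.3: o1 is taken to be the number of frames found only in the new trace t, and o2 the number of frames found only in the previous merge. The function name and the set-based representation are hypothetical.

```python
from typing import FrozenSet


def score(t: FrozenSet[int], m_prev: FrozenSet[int], last_merge_size: int) -> float:
    """Score of merging trace t into m_prev, normalized by the largest merge."""
    o1 = len(t - m_prev)  # frames only the new trace contributes (assumed meaning of o1)
    o2 = len(m_prev - t)  # frames only the previous merge contributes (assumed meaning of o2)
    return min(o1, o2) / last_merge_size
```

For instance, adding a trace that brings one new frame to a previous merge holding three frames of its own, with a largest merge of ten frames, gives a score of 0.1. Since min(o1, o2) cannot exceed half of the merged frames, the score stays between 0 and 0.5, as stated above.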
4.4.2 Results
Figure 4.5 and Figure 4.6 present the results. For both datasets, we merge each channel
individually. Each cell presents average values for a given set of individual datasets. We
draw the following conclusions.
Scores decrease with size. This is expected: the bigger the dataset, the less interesting it is
to add new sniffers.
Scores never reach zero. This is however unexpected: even with eight sniffers, each trace
contains a small percentage of frames that do not exist in the seven others.
Small merges are not that bad. Merges of size 2 are able to provide a significant portion
of the datasets’ total number of frames (78% and 73% on average for the two datasets). This
indicates that a large part of individual traces’ frames are shared among sniffers. This is also
visible when looking at the average proportion of shared frames inside M2. One needs
many sniffers however to obtain a near-complete trace: at least 5 sniffers for sizes above
90%.
Figure 4.6: Scores w.r.t. number of monitors and dataset. Each column represents a given channel of a specific dataset. Each row Mk represents the set of sub-datasets of size k. Each
cell contains a box whose size is proportional to the average number of packets inside the
corresponding sub-datasets. Red (dark) parts of boxes represent average values of o1 (see
Figure 4.3). Pink parts (medium grey) represent average values of o2 . Numbers below boxes
are average scores (in percent) with 95% confidence intervals.
Individual sniffers display high variability. This translates into wide confidence intervals
on the first row of Figure 4.6. For instance, a sniffer of 2008-12-01 accounts for 53% of all
the dataset’s frames while another accounts for up to 87% (results vary between 45% and
80% for 2008-12-19). Since some of these variations occur with sniffers next to each other,
we conclude that sniffing processes exhibit high randomness.
As a summary, although most frames are heard by multiple sniffers, a few of them are
difficult to receive. This means that each sniffer’s traces contain most of the dataset’s frames
but also some frames that no other sniffer recorded. Therefore, researchers should use techniques that are robust
to frame losses, as these are unavoidable no matter the number of sniffers.
4.5 Conclusion
Our analyses reveal that traditional completeness estimation techniques have several shortcomings, making them unreliable. Even when using eight sniffers on the same desktop,
there exist frames only recorded by one sniffer. This suggests some other frames were left
unrecorded even though we use more sniffers than typical settings do.
Several extensions are possible to this work. We plan to analyze underloaded environments with more monitors. One could also focus only on networks with good reception.
Finally, it could be interesting to look for other completeness estimation techniques that differentiate between transmission failures and frame losses.
Chapter 5
Empirical analysis of Wi-Fi activity in
three urban scenarios
Ability to study arbitrary environments is one of the motivations that led to developing
WiPal. More specifically, we are interested in environments where no network
traces are publicly available. This is why, in this chapter, we record and analyze traces from
three environments with different sociological means: an office, a dense uptown residential
area, and a sparse suburban residential area. Contrary to existing studies, we do not focus
on a single network, but on the overall network activity. We study the behavior of devices
rather than traffic characteristics. We are interested in observations like the total duration
a device is active, the frequency of appearance of new devices, and activity that can be extracted from traces. A sniffer usually faces radio range limitations and high packet
losses; nevertheless, analyzing traces provides important insights into the activity of a given
wireless environment as perceived by the wireless adapters. This is joint work with
Mathias Boc [15]. We carried it out during our respective PhD theses.
Many papers actually use wireless sniffing as a monitoring technique. For instance,
Cheng et al. propose Jigsaw [22], a large scale monitoring system based on sniffing. However, despite being powerful and scalable, Jigsaw imposes some constraints on monitors
that make it impractical in a number of environments. Researchers often use sniffing as a
means to diagnose network problems [23], enhance security [12], or analyze communication
protocols [43;59]. Nevertheless, as far as we know, authors using sniffing do not study user
behaviors. In fact, some papers analyze the behavior of users with other techniques, but
most of them focus on specific environments. They typically rely on traces collected from a
given network’s logs [13;35;42;56]. Their methods are thus not applicable when several
independent networks cover the target area, or when it is infeasible to access the network
infrastructure. It is interesting to note however that some of these papers study large-scale networks. In particular, Afanasyev et al. [9] use such a technique on a city-wide network
with several types of users (broadband access, 3G cellular, and commercial). Some papers
rely on giving volunteers dedicated devices that measure contacts with other devices [36].
Typically, experiments concern a few dozen participants for a few days, which is the main
limitation of this technique. To the best of our knowledge, only González et al. [33] study
human mobility in a large environment, but they focus on real mobility rather than user
behavior as seen from IEEE 802.11 networks.
5.1 Setup
We perform our analyses on traces collected in three different environments. We obtain each
trace using a sniffer (laptop) equipped with three IEEE 802.11 radio interfaces (ASUS EeePC
700 and Netgear WG111v3, see Figure 4.1 in the previous chapter). The interfaces listen on
channels 1, 6, and 11. Each radio is set up in monitor mode and records every frame it hears
regardless of the network.1 We refer to the three traces as follows:
Office. This is a three-day-long trace recorded in the computer science laboratory of Université Pierre et Marie Curie – Paris 6. The laboratory spans three floors of a twelve-floor
building that is also occupied by some private companies.
Residential, sparse. This is a three-day-long trace recorded in a suburban residential area.
The area is crowded only with small habitation buildings and houses.
Residential, dense. This is a ten-day-long trace recorded uptown. The area is mostly residential but includes shops and schools. Tall towers compose habitation buildings.
There is a high car and pedestrian traffic.
Table 5.1 presents quantitative characteristics of these traces. As expected, the office
trace has the greatest number of devices, ESSIDs,2 and access points (AP). It has more access points than ESSIDs, which means that some wireless networks span multiple access
points. The office trace also contains beacons from a relatively high number of ad hoc networks. This comes mostly from unconfigured devices (e.g., printers and “Free Public WiFi”)
and devices that create a network they expected to find (e.g., “AT&T Wireless”). The same
reasons make the dense residential trace include information on ad hoc networks.
The sparse residential trace, as expected, has the smallest number of devices. It has more
access points than ESSIDs; this is due in part to hidden ESSIDs (5 APs hide their ESSID,
and we expect them not to belong to the same network) and to Internet boxes that advertise
shared ESSIDs belonging to network operators (e.g., for Wi-Fi phone service). This trace
includes, however, two surprising features:
1 Although they are not available yet, we plan to make these traces public as soon as possible.
2 ESSIDs are strings used as network identifiers. A single network might include multiple access points, but
has only one ESSID.
Chapter 5. Empirical analysis of Wi-Fi activity in three urban scenarios
                   Office        Residential, sparse   Residential, dense
Duration           3 days 10h    3 days 12h            10 days 15h
Size (1)           11.92 GB      3.67 GB               1.61 GB
Data size (2)      3.43 GB       4.75 MB               290.82 MB
Devices (3)        856           49                    294
ESSIDs             44            9                     7
Access points      52            14                    4
Ad hoc cells       82            1                     10

1 Sizes only include IEEE 802.11 frames and their payloads.
2 Data sizes only include IEEE 802.11 data frames and their payloads.
3 Each distinct MAC address in a frame’s sender field accounts for a device.

Table 5.1: Quantitative characteristics of the Office, Residential sparse, and Residential dense
traces.
1. Out of the 3.67 GB that compose the sparse residential trace, only 4.75 MB are data frames!
98.7% of frames in the sparse trace are access point beacons, which suggests that these networks exist but are unused in practice. We believe that they are provided by default
with network operator boxes, but that most people access the Internet using wired
links to their boxes.
2. The sparse residential trace is bigger and has more access points and ESSIDs than the dense residential trace. In fact, the sparse residential trace is bigger because its sniffer has more
networks in its vicinity. This means that access points’ frames account for most of a
trace’s size. It is however surprising that the sparse trace has more networks than
the dense one. This might be due to differences in Wi-Fi signal propagation in each
area (making it easier to hear distant networks in a sparse environment) or to social differences between the populations of the two neighborhoods.
5.2 Device diversity
This section investigates two sources of device diversity: cumulated activity durations and
growth of the number of devices. The term device refers to any IEEE 802.11 station. This
typically concerns human-operated computers, but also access points and Wi-Fi printers.
The reason we study these two characteristics is twofold. First, we want to investigate who
exactly uses the wireless medium at given periods and locations. Second, device diversity
is relatively easy to compute even in the presence of huge frame losses. This is important
because we record each trace using wireless sniffing in areas that are unfriendly to this technique (due to interference and the presence of multiple walls). In this regard, the sparse
residential trace is the worst: by looking at frame sequence numbers, we observe that the
trace lacks 85% of the frames. We estimate that the office trace has a missing frame ratio of
70%, and the dense residential area trace a missing frame ratio of only 4%. This small value
is due to the fact that the trace features a very small number of active networks, which means that
the sniffer ensured very good reception for the predominant network. We first analyze the
distribution of cumulated activity durations and then the growth of the number of devices.
[Figure 5.1: Distributions of cumulated activity durations. Panels: (a) Office, (b) Residential, sparse, (c) Residential, dense, each with one plot per channel (1, 6, and 11). Horizontal axis: devices (sorted by total activity duration). Vertical axis: total activity duration.]
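The missing-frame ratios quoted in this section are derived from frame sequence numbers. The following Python sketch illustrates one way such an estimate could be computed; it is not the exact procedure used for these traces. It assumes that frames are available as (sender, sequence number) pairs in capture order, and the function name `missing_frame_ratio` is hypothetical. IEEE 802.11 sequence numbers are 12-bit counters, so gaps are computed modulo 4096. Control frames carry no sequence number and would have to be left out of the input.

```python
SEQ_MODULO = 4096  # IEEE 802.11 sequence numbers are 12-bit counters


def missing_frame_ratio(frames):
    """Estimate the fraction of frames a sniffer missed.

    `frames` is an iterable of (sender, sequence_number) pairs in capture
    order (an assumed input format). A gap of g between two consecutive
    sequence numbers from the same sender counts as g - 1 missed frames.
    """
    last_seq = {}
    captured = 0
    missed = 0
    for sender, seq in frames:
        captured += 1
        if sender in last_seq:
            gap = (seq - last_seq[sender]) % SEQ_MODULO
            if gap > 0:  # gap == 0 is a retransmission: nothing was missed
                missed += gap - 1
        last_seq[sender] = seq
    total = captured + missed
    return missed / total if total else 0.0
```

Such an estimate is only a lower bound on losses when a sender disappears for more than 4096 frames, and it is skewed by devices that do not increment their sequence numbers as the standard requires.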
5.2.1 Cumulated activity durations
Figure 5.1 plots the distribution of cumulated activity durations among all traces and channels. Each impulse maps a single device to the total duration of its activity inside the trace.
We consider that a device is active when it emits a frame within a window of three minutes
(any type of frame: management, data, or control). We use the three-minute threshold because access point drivers use activity timers with similar values (e.g., MadWifi drivers use
timers varying from 30 s to 5 min). Requiring one frame within a window of a few minutes
makes the technique resilient to frame losses.
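The activity criterion above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used to produce Figure 5.1: it assumes that time is cut into consecutive three-minute windows and that a device is active during a window if it emitted at least one frame in it. The input format and the name `cumulated_activity` are hypothetical.

```python
WINDOW = 180  # seconds; the three-minute threshold from the text


def cumulated_activity(frames, window=WINDOW):
    """Map each device to its total activity duration in seconds.

    `frames` is an iterable of (timestamp_in_seconds, sender) pairs
    (an assumed input format). A device is active during a window if it
    emitted at least one frame in it, whatever the frame type.
    """
    active_windows = {}
    for timestamp, sender in frames:
        active_windows.setdefault(sender, set()).add(int(timestamp // window))
    return {dev: len(slots) * window for dev, slots in active_windows.items()}
```

Because a single received frame is enough to mark a whole window as active, losing most of the frames of a window does not change the result.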
A few features are common to all traces:
Devices are unevenly distributed among channels. In all traces, more devices appear on
channel 11 than on channel 6, and more devices appear on channel 6 than on channel 1. This
is a direct consequence of networks being unevenly distributed among channels (both in ad
hoc and infrastructure modes).
Device activity has a highly uneven distribution for a given trace and channel (note the
logarithmic scale on Figure 5.1). We can classify devices in three groups: (1) devices that are
(almost) always active, (2) devices that appear only once, and (3) other devices. Among all
traces, a total of 31 devices (out of 2,395) belong to class (1).3 Of these, 27 seem to be
access points. Two of the four remaining devices appear in the office trace, and the other two
in the dense residential trace. These four devices emit no beacons, so they are not in
ad hoc mode. It is interesting to note that a handful of users always leave their devices on.
A significant portion of devices belong to class (2) (20% in the office and dense residential
traces, 9% in the sparse residential trace). This means that many users are not regular and
just pass by. Class (3) is diverse and includes the whole range of possible duration values.
However, the smaller the duration, the higher the probability.
Most devices are nearly inactive. Depending on the trace and channel, 48% to 96% (76%
on average) of the devices are active for less than one hour over the whole duration of
measurements. Therefore, a majority of devices are inactive most of the time.
There are points, however, where traces differ and show specific features. The
office and sparse residential traces have similar shapes, but devices in the latter tend to
accumulate longer activity durations. The dense residential trace has a shape that includes a
visible cut between very active and nearly inactive devices. These variations are noticeable
in the average durations: 2h36min for the office trace, 11h48min for the sparse residential trace, and 2h21min for the dense residential trace (keep in mind that this trace is three
times as long as the other two). Therefore, in some environments, devices tend on average to be
active longer.
5.2.2 Growth of the number of devices
Figure 5.2 plots the growth of the number of devices. Each curve corresponds to a given
trace and channel (plus a curve for each trace that represents the average of the three channels). Each point shows the number of devices a given [trace, channel] pair features from its
beginning up to the corresponding timestamp. We consider that each MAC address represents a device, and look for MAC addresses in all address fields of the frames. Some devices
are mentioned as destinations but never as transmitters. That explains why we discover
more devices than indicated in Table 5.1 and Figure 5.1. Furthermore, due to a subtlety in
the IEEE 802.11 protocol, some address fields of the frames may contain values that are actually not real MAC addresses (e.g., independent BSSIDs). We ignore these values. We can
derive a number of interesting observations from Figure 5.2:
The distribution of devices among channels is uneven. Furthermore, it does not always
correlate with the distribution of sending devices among channels. In all traces, fewer devices
appear on channel 1 than on any other channel. This is perfectly consistent with previous
results (cf. Section 5.2.1). Nevertheless, channel 6 attracts more users than channel 11 in two
of the three traces. This contradicts the channel distribution of Figure 5.1. The difference is
that Figure 5.1 only considers devices that emit frames while Figure 5.2 considers all types
of devices. This indicates that it is difficult to evaluate the distribution of users among
channels.
3 A device that appears on multiple channels is counted multiple times.
[Figure 5.2: Number of distinct MAC addresses each trace contains from its beginning to a given time. Panels: (a) Office, (b) Residential, sparse, (c) Residential, dense, each with curves for channel 1, channel 6, channel 11, and the channel average. Horizontal axis: time (days).]
The discovery rate follows a day-night pattern. Curves periodically alternate between
flat and growing periods. Depending on the trace, this effect has varying amplitudes and
periods, but is visible in all traces. Flat periods occur during nights, usually starting around
midnight and stopping a few hours before noon. This shows that, as expected, devices’
activity correlates with human activity.
In the office and dense residential areas, the discovery rate is constant during long periods.
Furthermore, in the dense residential trace, this still holds after a week of measurement. On
the other hand, the sparse residential trace flattens drastically after two days. We believe
that this is a consequence of the type of environment: high mobility is expected in uptown
streets and offices, as well as a high turnover of people. We can expect that many new users
will not come back before the measurement ends. This also explains why the
average activity duration per user is higher in the sparse residential trace (see the end of
Section 5.2.1). Note, however, that even when the discovery rate falls after two days, it is still
possible to discover new users near the end of the trace.
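The device growth curves discussed in this section count, at each point in time, the distinct MAC addresses seen since the beginning of a trace. A minimal Python sketch of this computation follows. It assumes that each frame has already been reduced to a timestamp and the list of values found in its address fields; the name `device_growth` and the `ignored` parameter (values that are not real MAC addresses) are hypothetical.

```python
def device_growth(frames, ignored=frozenset()):
    """Return [(timestamp, distinct_addresses_seen_so_far), ...].

    `frames` is an iterable of (timestamp, addresses) pairs in
    chronological order, where `addresses` holds every value found in the
    frame's address fields (an assumed input format). Values listed in
    `ignored` are skipped. A point is emitted each time the count grows.
    """
    seen = set()
    curve = []
    for timestamp, addresses in frames:
        before = len(seen)
        seen.update(a for a in addresses if a not in ignored)
        if len(seen) > before:
            curve.append((timestamp, len(seen)))
    return curve
```

A flat stretch in the resulting curve corresponds to a period during which every address heard was already known.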
Among the different observations derived in this section, we believe that two are
of particular importance. First, as shown by the study of activity durations, users are mobile
or do not generally keep their Wi-Fi equipment switched on. This translates into packet
traces where most devices are inactive most of the time. Second, different environments
have different impacts on mobility. This translates either into appearances of new users being
evenly spread inside traces or, on the contrary, grouped at the beginning of traces.
[Figure 5.3: CCDFs of aggregated inter-activity times of all devices for the three traces. The distributions are well fitted by truncated power laws with exponential decays. The parameters of the distributions are presented in the text. Panels: (a) Office, (b) Residential, sparse, (c) Residential, dense. Horizontal axis: time t (seconds). Vertical axis: P[Inter-activity > t]. Each panel shows the measured distribution and the fit (t + t0)^(-α) e^(-βt/k).]
5.3 Activity/Mobility Behaviors
This section analyzes the type of relationship devices develop with their environments. Behind the notion of relationship, we are interested in understanding how device activity
evolves. We highlight predominant patterns when they exist with the objective of characterizing the importance of locations on the behaviors of the devices.
Because of our centered vision of space (we only use one sniffer at a time), it is difficult
to extract physical mobility behaviors from traces. In some situations, however, temporal activity patterns give insight into devices’ mobility: either a device is no longer in the considered
space or it is back and active. Statistical tools exist to extract information on mobility patterns.
We take advantage of them and rely on the available activity information.
5.3.1 Inter-activity patterns
In this first part, we analyze the devices’ rhythm of activity. For this purpose, we plot the aggregated complementary cumulative distribution function (CCDF) of the inter-activity times (see Figure 5.3). The inter-activity time is the time gap between the beginnings
of two consecutive periods of activity. Therefore, the duration of activity is included in the
inter-activity time. We start by presenting the distribution parameters and, for each trace,
we investigate the meaning of variations when they exist. Note that only devices that are
active at least twice are represented here.
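The definition of inter-activity times given above can be sketched in Python as follows. This is an illustration under assumptions, not the code behind Figure 5.3: it assumes that a device's activity is available as a set of three-minute window indices and that a period of activity is a maximal run of consecutive active windows. The names `inter_activity_times` and `ccdf` are hypothetical.

```python
def inter_activity_times(active_slots, window=180):
    """Gaps (seconds) between the starts of consecutive activity periods.

    `active_slots` is the set of window indices in which one device was
    active. A period is assumed to be a maximal run of consecutive
    active windows; a device active only once yields an empty list.
    """
    starts = [s for s in sorted(active_slots) if s - 1 not in active_slots]
    return [(b - a) * window for a, b in zip(starts, starts[1:])]


def ccdf(samples):
    """Empirical CCDF as a sorted list of (t, P[X > t]) points."""
    ordered = sorted(samples)
    n = len(ordered)
    points = []
    for i, t in enumerate(ordered):
        if i + 1 < n and ordered[i + 1] == t:
            continue  # keep only the last occurrence of each value
        points.append((t, (n - i - 1) / n))
    return points
```

Aggregating the lists returned by `inter_activity_times` over all devices of a trace and passing the result to `ccdf` gives one curve per trace.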
We can approximate the CCDFs of the three traces by truncated power laws with exponential decays:
    P(t) = (t + t_0)^{-\alpha} \exp(-\beta t / k).    (5.1)
For the office trace, t0 = 1 minute, α = 0.40, β = 1.2, and k = 24 hours (Figure 5.3(a)). The
parameters for the residential sparse trace are t0 = 1 minute, α = 0.45, β = 1.40, and k = 24
hours (Figure 5.3(b)). For the residential dense trace, t0 = 15 minutes, α = 0.40, β = 0.8, and
k = 24 hours (Figure 5.3(c)). The power law part of the distributions shows a slope that is
very similar to recent experimental results found in the literature [19;41]. It accounts for a large
proportion of the inter-activity times: ≈ 98.3% for the office trace, ≈ 99.2% for the residential
sparse trace, and ≈ 92% for the residential dense trace. For the three distributions, k is almost the same,
which suggests a possible cycle or period of one day (the characteristic time in [41]).
The value β is around 1.3 for the first two traces, which indicates a strong contraction of the
probabilities of activity after 24 hours. For the residential dense trace, this value is lower
(0.8), which here indicates a greater disparity of the probabilities. Partly due to the longer
duration of the trace, it is important to note that there are no strong variations and thus no
coordinated behaviors among the devices. Finally, we can note that the parameter values
are very similar for the office and residential sparse distributions, which indicates that these
locations might have the same level of influence on device behaviors (constraint, necessity,
social habits).
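To make the role of β and k more concrete, the fitted model can be evaluated numerically. The Python sketch below assumes that all durations are expressed in seconds. The text gives no normalization constant, so the values returned by `truncated_power_law` are only meaningful up to a constant factor. The names `PARAMS`, `truncated_power_law`, and `exponential_cutoff` are hypothetical.

```python
import math

DAY = 24 * 3600  # characteristic time k, in seconds

# (t0, alpha, beta, k) as given in the text, converted to seconds.
PARAMS = {
    "office": (60, 0.40, 1.2, DAY),
    "residential_sparse": (60, 0.45, 1.40, DAY),
    "residential_dense": (15 * 60, 0.40, 0.8, DAY),
}


def truncated_power_law(t, t0, alpha, beta, k):
    """Evaluate (t + t0)^(-alpha) * exp(-beta * t / k), with t in seconds."""
    return (t + t0) ** (-alpha) * math.exp(-beta * t / k)


def exponential_cutoff(t, beta, k):
    """Exponential factor alone: how much the power law is damped at t."""
    return math.exp(-beta * t / k)
```

At t = k, the exponential factor equals e^(-β): about 0.30 for the office trace, 0.25 for the sparse residential trace, and 0.45 for the dense residential trace. A larger β therefore means a sharper drop of the distribution around one day.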
Concerning the variations in the distributions, we observe three main steps in the distribution of the office trace: the first around 1 hour, the second around 24 hours, and the third
around 48 hours. Given the characteristic time k of 24 hours, we can suppose a
periodicity of one day for a large part of the devices and one of 48 hours for a smaller part.
The first variation around 1 hour is difficult to interpret because of its small length, but may
be linked to pauses in device activity during their presence in the environment.
The residential sparse distribution presents four main steps: the first around 12 minutes,
the second around 14 hours, the third around 24 hours, and the fourth around 34 hours. The
first variation around 12 minutes is particularly interesting. After verification in the trace,
this duration corresponds to a handheld mobile device that is programmed to check mail
every 15 minutes (the observed 12 minutes plus the 3 minutes of activity duration granularity). Of the three other variations, the one around 24 hours concentrates a greater proportion
of probability. With a characteristic time of 24 hours, it points to a daily periodicity. Following the same logic, the two other variations also point to a strong periodicity of 24 hours
but time-shifted by 14 hours, and to a periodicity of 34 hours that involves fewer devices than the
24-hour period.
Compared to the other traces, the residential dense distribution does not present clear
wide variations. Although the characteristic time is also around 24 hours, there is no variation around this value. In this situation, it is difficult to judge whether there really are no coordinated behaviors among devices.
To confirm the observations of periodicity, we now analyze device activity behaviors from a different
point of view.
[Figure 5.4: Proportion of users that are active in each time interval, relative to the first time interval in which they appeared, for the three traces. In these traces, we observe a clear periodicity of 24 hours with some variations that are characteristic of the social meaning of each environment. Panels: (a) Office, (b) Residential, sparse, (c) Residential, dense. Horizontal axis: time t (hours). Vertical axis: proportion of active users.]
5.3.2 Predominant activity pattern
With a long-term scope, we now investigate and extract, if it exists, the predominant pattern
that defines each context, along with the properties that characterize it. There are different means
to address this issue. Our approach is simply to consider the activity of each device by
slicing the observation period into time intervals of equal duration and by aggregating the
activity patterns in each of these time intervals. More specifically, for each device we mark
the set of time intervals where it has been active relative to the first time interval when
it appeared in the environment (which is set to 0). For each time interval, we compute the
number of devices that were active to obtain the different proportions. With this method, the
proportion of active devices at time interval x indicates how many devices
have been active x time interval(s) after they were seen for the first time.
Therefore, a peak at each kx (with k > 0) could point to a possible coordinated activity
and periodicity of behaviors. Here, we set time intervals of 1 hour and plot the results in
Figure 5.4.
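The aggregation described above can be sketched in Python as follows. This is a minimal illustration, not the code behind Figure 5.4. It assumes that each device's activity is given as a set of interval indices (here, hours since the beginning of the trace) and that proportions are taken over all devices that were active at least once; the text does not state the denominator, so this choice is an assumption. The name `relative_activity` is hypothetical.

```python
def relative_activity(active_slots_by_device):
    """Proportion of devices active x intervals after their first one.

    `active_slots_by_device` maps a device to the set of interval indices
    in which it was active (an assumed input format). Returns
    {x: proportion}, where x = 0 is each device's first active interval.
    """
    counts = {}
    total = 0
    for slots in active_slots_by_device.values():
        if not slots:
            continue
        total += 1
        first = min(slots)
        for s in slots:
            counts[s - first] = counts.get(s - first, 0) + 1
    return {x: c / total for x, c in sorted(counts.items())}
```

With one-hour intervals, devices that come back every day at the same time all contribute to x = 24, 48, and so on, which is what produces the periodic peaks.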
We start by analyzing the results obtained for the office trace (Figure 5.4(a)). As we can
observe, the figure presents clear peaks every 24 hours, which indicates daily periodicity in
the behaviors. The decrease in the proportions is due to the new devices that have not been
active during the whole period of activity. Therefore, their activities are mostly visible in
the first part of the figure. The second observation is that around the peaks, the proportions
remain high during a period of about 8 hours and then decrease abruptly. Hence, there is a real
coordinated movement of a large proportion of the observed population in this context. The
office constraints and schedules can explain this phenomenon, and this predominant
behavior can be judged as representative of this type of environment as a large part of the
population are workers.
Compared to the office trace, the residential sparse one presents a different pattern with
interesting properties (Figure 5.4(b)). If we start with a periodicity of 24 hours, the pattern presents peaks every 24 hours, which confirms this (expected) periodicity. However,
we also observe another period of 24 hours, time-shifted by 10 hours from the start.
To summarize, globally, devices are active 10 hours after the first time of activity and 14
hours after that, with a periodicity of 24 hours. This phenomenon might have different explanations.
The one most related to the residential environment is diurnal activity from which an office-like
pattern is subtracted. Devices are active in the early morning and in the early night. In this situation,
the gap of 10 hours corresponds to night periods when devices are not active and the gap
of 14 hours to periods when devices are away from home. Therefore, there is a real complementary
link between the two environments. Compared with what we know from the networking
literature, where most mobility/activity behaviors come from university campuses, the
activity pattern we observe here is clearly new and different. However, while we are able to extract a predominant pattern from this residential (sparse) environment, we obtain a different
pattern for the residential (dense) context.
As mentioned in Section 5.1, the residential dense trace was obtained uptown while
the residential sparse one is suburban. In a suburban residential environment, the proportion of observed devices that may have a relationship with the environment is large
because of the high proportion of homes. Uptown, the presence of shops, schools, and
other concentration points may introduce a proportion of devices that do not have any relationship with the considered location. With these elements in mind, we observe that it
is difficult to extract a predominant pattern from the results of the residential dense trace
(Figure 5.4(c)). If, as for the residential sparse trace, we consider, a priori, a periodicity of 24
hours, there are peaks that confirm this periodicity, but with the same proportion as other peaks
that occur irregularly. In this situation, a classification of the devices could be interesting
to better understand the different relationships that exist in this environment. Although we
leave this study for future work, we should be able to detect and analyze the as yet unseen population category of householders, as well as more traditional ones such as workers, commuters, and
visitors.
5.4 Conclusion
This chapter analyzes behaviors of Wi-Fi users in three different locations that have distinct
social meanings. With our sniffing technique, we are able to provide a more complete view
of the population moving in a given location and highlight important aspects of what can
be found in real situations. In particular, we notice that: (i) in popular places, the number of
discovered users can increase almost linearly within the window of observation, (ii) regular
users account for a very small portion of the total population, (iii) user activity varies highly
from scenario to scenario, and (iv) the location plays a role in the presence duration.
Related to these aspects, our study also raises open issues such as how to distinguish the
population for which the considered location has a social meaning and how a device can
understand what kind of environment it is currently in.
In order to extend this study, we are currently analyzing traces from multiple monitors
that we collected in a Parisian park, the Parc Monceau. This park’s Wi-Fi activity interests us
because it includes several access points spread over various locations. We used ten monitors
and measured an area about half the park wide for one hour (see Figure 5.5). Our
analyses are in progress, so we only have a few results for the moment. Traces include
138 emitting devices, 71 of which are Apple devices. We believe these are mostly mobile
devices (iPhone or iPod touch). With such a number of mobile devices, it is possible that
traces reveal unseen usage patterns.
Figure 5.5: Sniffer locations for the collection of traces inside the Parc Monceau. The
subsequent trace analysis is currently in progress. (Background from Google Maps.)
Chapter 6
Conclusion and future work
Wireless sniffing is a powerful technique to measure activity in Wi-Fi networks but
suffers from a number of issues. These are both pragmatic and theoretical. First,
existing software to handle IEEE 802.11 packet traces is not satisfactory. In general, available
software has not been designed for reusability. Thus, developing new tools requires starting
from scratch. There is also a lack of efficient and flexible merging tools. Second, several
issues exist regarding the relevance of wireless packet traces. Wi-Fi sniffers inherently miss
some frames and therefore it is essential to evaluate the number of missed frames (i.e., the
completeness of traces). Most studies involving wireless sniffing do not focus on Wi-Fi usage
patterns. Other studies use it only in specific environments such as laboratories, campuses,
or conferences.
This thesis addressed the aforementioned issues. We first developed WiPal, a framework
to help process IEEE 802.11 packet traces. WiPal includes a flexible trace merger. Through
the analysis of two short-lived traces, we studied the accuracy of completeness evaluation
techniques, and the impact of adding new sniffers on trace completeness. A final study
collected and exploited three long-lived datasets in different environments to study Wi-Fi
usage patterns.
6.1 WiPal
WiPal’s design includes several software patterns that are relevant to packet trace processing. Since packet traces are basically streams of packets, using a pipe and filter pattern
enables users to have a modular approach to trace processing. This allows for easy
parametrization and maintenance of existing algorithms. Many algorithms also need to access specific fields of IEEE 802.11 frames and thus need to embed a parser. WiPal provides
a solution that uses static callback functions to combine performance and reusability. WiPal
also includes original features that cannot be found in other tools. Among them are random
access to a packet trace and trace aggregation. Evaluation shows that most of WiPal’s features have marginal costs on its performance, and thus WiPal does not trade performance for
reusability. Some of WiPal’s utilities run faster than other state-of-the-art programs. WiPal
features a library and tools to carry out various miscellaneous operations (such as comparison,
concatenation, or hexadecimal dumping), statistics extraction, or anonymization.
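As an illustration of the pipe and filter pattern mentioned above, the following Python sketch chains three small stages over a stream of frames. It is a generic example of the pattern and does not reproduce WiPal's C++ interfaces. Frames are represented as plain dictionaries, and the stage names `read_frames`, `only_type`, and `senders` are hypothetical.

```python
def read_frames(records):
    """Source stage: yield frames one by one (here, plain dictionaries)."""
    yield from records


def only_type(frames, frame_type):
    """Filter stage: keep frames of a given type."""
    return (f for f in frames if f["type"] == frame_type)


def senders(frames):
    """Transform stage: project each frame onto its sender address."""
    return (f["sender"] for f in frames)


# Stages compose like a pipeline and process one frame at a time.
trace = [
    {"type": "beacon", "sender": "ap1"},
    {"type": "data", "sender": "sta1"},
    {"type": "data", "sender": "sta2"},
    {"type": "data", "sender": "sta1"},
]
data_senders = set(senders(only_type(read_frames(trace), "data")))
```

Each stage only depends on the stream it receives, so stages can be reordered, replaced, or parametrized independently of one another.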
WiPal also includes an innovative offline trace merger. This merger includes original
algorithms with regard to reference frame extraction and trace synchronization. A study
shows its synchronization algorithm offers better performance and better accuracy than previous algorithms. WiPal’s merger supports more input formats than any other Wi-Fi packet
trace merger. Contrary to other tools, using it is straightforward and does not require setting
up database backends or time servers. A performance evaluation also shows it is an order
of magnitude faster than Wit, the other offline trace merger.
6.2 Wi-Fi sniffing accuracy
In order to gain further insight into the completeness one can expect from Wi-Fi sniffing, we
collected two short-lived datasets involving six and eight sniffers. Possibly due to congestion, these datasets exhibit a lower completeness than expected. A careful analysis reveals,
however, that the existing evaluation techniques suffer from a number of issues. First, techniques based on analyzing message types are not accurate. Second, some Wi-Fi devices do
not conform to the IEEE 802.11 standard and might skew the results of techniques based
on analyzing sequence numbers. Finally, none of the existing techniques is accurate when
the network is congested.
We then go further into our analyses and study the impact of the number of sniffers on
trace accuracy. To this end, we vary the number of sniffers we use from a given dataset
(starting with a single trace and then adding traces one after another until we use all the
traces from the dataset). We find that, although most frames are heard by multiple sniffers, a
few of them are difficult to receive. In other words, each sniffer’s traces contain most of the
dataset’s frames (between 45% and 87% in our traces) but also some original frames. This
is true even when using eight sniffers sharing the same location. We argue that researchers
should use analysis techniques that are robust to frame losses.
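The incremental procedure described above (starting with a single trace and adding traces one after another) can be sketched in Python as follows. The sketch assumes that each captured frame can be reduced to a unique identifier shared across sniffers, which is what a trace merger has to establish in practice. Coverage is measured against the union of all traces, not against the frames actually transmitted. The name `coverage_growth` is hypothetical.

```python
def coverage_growth(traces):
    """Fraction of the merged dataset covered as traces are added.

    `traces` is a list of sets of frame identifiers, one set per sniffer
    (an assumed input format). Returns one fraction per step: after one
    trace, after two traces, and so on, in the order given.
    """
    everything = set().union(*traces)
    if not everything:
        return []
    merged = set()
    fractions = []
    for trace in traces:
        merged |= trace
        fractions.append(len(merged) / len(everything))
    return fractions
```

If the last fractions keep increasing, even the last sniffers added still contribute frames that no other sniffer received.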
6.3 Wi-Fi activity
In a last study, we deploy Wi-Fi sniffers in three distinct environments with different sociological meanings: an office space, a sparse suburban residential area, and a dense uptown
residential area. Inside these environments, we focus on Wi-Fi usage patterns and on
the whole traffic rather than a single network. The traces we collect last three days, except
for the dense residential trace, which lasts ten days.
The three traces exhibit a number of differences. Among the residential traces, the
biggest one carries almost no data traffic but includes more than ten distinct access points.
The other residential trace, on the other hand, only mentions four access points but has a significant part of its traffic dedicated to data. This reveals that access point frames (and most
notably management frames) account for most of a trace’s size. Also, some environments
include networks that are configured but not used. Another interesting feature is device discovery. While the office and the dense residential traces display an ever-increasing number
of discovered devices, the sparse residential trace flattens after two days. This reveals that
some environments display users with higher mobility and a high turnover, but this behavior is not universal. Environments also complement each other. While
the office space has one peak of activity per day, the residential environments display two
peaks spaced ten hours apart. This reflects how people use their Internet connection before
and after going to work.
Finally, the traces share a number of features. In all environments, devices are unevenly
distributed among channels. Channel 11 always includes more devices than channel 6, and
channel 6 always includes more devices than channel 1. Datasets also feature day-night
patterns (but this is rather expected, as device activity reflects human activity). Also, inside
every trace, only a very small portion of devices appear regularly. Finally, all traces include
ad hoc cells.
6.4 Perspectives
As WiPal is a framework to help develop new tools, several perspectives exist regarding
its extension. A first natural step would be to implement new protocols and filters in order
to obtain a tool with a more general purpose than IEEE 802.11 traces. WiPal already includes
a few features regarding Ethernet, IPv4, and IPv6, but these are not at the same level as its
handling of IEEE 802.11. Merging is also available as an experimental feature for some IP
packet traces. Such generalizations are good evidence that WiPal’s design is not specific to
Wi-Fi and suits any protocol. It would also be of interest to exploit the modular nature of
WiPal and develop multiple implementations of some features. For instance, WiPal could
include multiple synchronization algorithms or multiple anonymization functions. Because
WiPal uses static C++ techniques, several of its features require writing cumbersome code.
One should also carry out research in this regard to make these features nicer to use. Another issue with WiPal is that its large code base makes it difficult to check for correctness
of operation. Although most algorithms WiPal implements are simple, the C++ techniques it
uses and the number of interactions involved make the code difficult to check. As a consequence, many test cases were developed to this end. It would be interesting, however, to
study whether WiPal could be formally proved. In the end, WiPal could become a generic framework for handling packet traces, including algorithms at every level: from packet traces
input/output to trace algorithms, and including parsers for a number of protocols.
On the accuracy of wireless sniffing, our analysis raises a number of questions. First, we
are not sure congestion is the source of the poor completeness in our traces. Furthermore, it
would be unexpected that the CSMA mechanism of IEEE 802.11 generates such significant
losses.¹ Further controlled experiments in this regard are desirable. Maybe these losses are
due to our setting (sniffers close to each other, with less-than-average capability, or even the
specific traffic characteristics of the environment). It would be interesting to develop more
experiments with different settings (e.g., different network adapters, focusing on a single
channel, or reducing interference). This could also give more insight into how the number
of sniffers impacts accuracy. In this regard, it would be worth investigating why sniffers exhibit such variable accuracy. We also show that the existing completeness evaluation techniques have some weaknesses. Although some of them probably
cannot be worked around (it might be impossible to distinguish between sniffer losses and
transmission failures), it would be interesting to develop techniques that fix the others (e.g.,
automatically detecting some non-conforming behaviors regarding sequence numbers). Active
experiments could also be of interest to evaluate how inaccurate each evaluation technique
is.
Our analysis of Wi-Fi activity in urban environments also raises questions. First, it is
possible that wireless sniffing introduces a bias in our results. For instance, devices located
far from sniffers are likely to be seen less often than nearby devices, and it is unclear how
this impacts their calculated active duration (even though we tried to mitigate this phenomenon by using an activity period of three minutes). At this stage, it is also unclear how the
information we discovered could be of use to others (e.g., researchers, application designers, software engineers, or hardware vendors). The uneven distribution of devices among
channels probably argues for commodity access points to include algorithms that dynamically switch among channels. The periodicity in device behaviors might be of some
use to people designing opportunistic networking schemes. Another concern is the generality of our results. Since the tools now exist to perform wireless sniffing in any environment,
it would be of interest to perform more experiments, both in similar urban environments
and in others. To this end, we collected some traces in a Wi-Fi enabled park.
¹ CSMA (Carrier Sense Multiple Access) is a type of MAC scheme used by IEEE 802.11.
Appendices
Appendix A
Summary of the thesis (originally in French)
The IEEE 802.11 standard [37] defines basic layers for wireless communications. It appeared about ten years ago, under the Wi-Fi brand, and it is widely used today. Personal computers that perform Internet communications over radio links use this protocol almost exclusively. Wi-Fi also plays a major role in many mobile devices: it can be found in PDAs, phones, portable media players, and even some cameras.
As a consequence, Wi-Fi is part of the ubiquitous computing landscape [58]. Together with other protocols such as Bluetooth or GSM, it is used to create a seamless digital environment that is integrated into our daily lives. For instance, Wi-Fi access points (hotspots) equip homes, hotels, conference rooms, and many other places. This is why it is essential to understand how implementations of the IEEE 802.11 standard behave “in the field”. This knowledge is necessary to develop new applications and new protocols, or to improve existing ones.
A.1 Context
IEEE 802.11 specifies a physical layer (PHY) and medium access rules (MAC¹) for a wireless network. The PHY is in charge of encoding and decoding digital information (sequences of bits) to and from a radio signal. The MAC, on the other hand, coordinates transmissions so that stations can share the medium without interfering with one another.
Although it is mainly an industry-driven standard, researchers have produced a large body of work on IEEE 802.11. This includes very specialized topics, focusing for instance on the PHY [30;45], the MAC [46], or other features such as security [12;14]. But more general research topics also involve this protocol: ad hoc and mesh networks [10;27], sensor networks [60], and ubiquitous computing [58]. A good understanding of Wi-Fi therefore benefits all these fields. Reaching this understanding requires theoretical analyses as well as experimental studies. This thesis focuses on the experimental side, and in particular on field measurements of wireless networks.
¹ MAC stands for Media Access Control.
Figure A.1 – Wireless sniffing: passive monitors listen to radio activity within the measurement area.
A.1.1 Passive Wi-Fi measurements and sniffing
Every network measurement technique is either active or passive. Active measurements modify network traffic in order to evaluate some parameters. Classical active techniques include, for instance, saturating a link to evaluate its capacity, or sending probes to evaluate round-trip delays. In contrast, passive measurements do not interfere with network traffic. This is the case, for example, when one listens on a link to analyze its traffic. Passive techniques may nonetheless interfere with the infrastructure: they may require users to install specific software, or administrators to plug in dedicated monitoring equipment.
A classical passive technique for measuring wireless networks is sniffing. It consists of spreading monitors over the measurement area so that they capture all the traffic they can hear (see Figure A.1). Monitors produce traces, which are sequences of MAC packets (frames). Sniffing is a fundamental step in a number of network operations, such as diagnosis [23;34], security studies [12;48], and the analysis of protocol behaviors [22;39;43;59]. Although this is not mandatory, it can also support localization systems [20;21;61]. Many different sniffing setups exist: there can be one or several monitors, these can be made of commodity or specialized hardware, and they can operate in isolation or connected to a wired infrastructure (among other parameters). In all cases, however, the measurement operation is passive, non-intrusive, and does not interfere with the normal operation of the network.
Wireless sniffing often uses a centralized procedure that merges traces [22;43;59]. The primary goal is to obtain a global view of radio activity from several local measurements. By using monitors with overlapping coverage areas, it is also possible to compensate for the losses of some monitors with data from other monitors. But merging is a difficult task: it requires very precise synchronization of the traces (within a few microseconds) and must account for the unreliable nature of the radio channel (frame losses are unavoidable).
A.1.2 Open questions
Sniffing nonetheless raises a number of open questions. In this thesis, we focus on the computer science aspects². We classify them into two categories: questions about the technique itself, and questions about the tools. This thesis addresses both, in an effort to collect new datasets and produce original analyses.
Questions about the technique relate to the relevance of the traces produced. One example is the accuracy of monitors. Even in good radio conditions, monitors may miss frames that were nonetheless transmitted successfully. In this context, a natural question arises: since the traces of each monitor are incomplete (that is, some frames have been lost), the merge of these traces is likely to be incomplete as well. What accuracy can one expect from one monitor? From several monitors? What results can be drawn from incomplete traces?
Another question concerns the relevance of the available datasets. Although Wi-Fi is nearly ubiquitous, most of the datasets researchers have made public concern university campuses, laboratories, and conference venues [2]. This is partly because common practice is to focus on environments that are easy for researchers to access, but also because existing measurement techniques only work in some scenarios. Most of these techniques focus on a single network, or require setting up a complete infrastructure, or are intrusive with respect to network equipment. In the street or in a private home, these techniques are therefore difficult to put into practice. Yet wireless sniffing has strong potential for measuring any type of environment: it is passive, it does not interfere with the infrastructure, and in some cases it does not require setting up an infrastructure. But this potential has remained untapped so far. As a consequence, researchers focus on the study of anomalies and of some protocol specificities [22;39;43]. We believe, on the contrary, that it is more interesting to use sniffing as a technique to study network usage in hard-to-access environments (for instance homes, streets, or parks).
² Some aspects are not directly related to computer science. For example, sniffing raises legal and ethical questions.
Questions about the tools relate to the manipulation of packet traces. In networking, many operations involve this kind of trace: administrators use them for monitoring and debugging, researchers for measurements, simulation, or validation. Wireless monitors produce packet traces, which are in fact lists of MAC frames. Many tools exist to create and manipulate these traces, but most of them are very specific and use code that is difficult to generalize. For instance, tcpdump [8] can decode a great many distinct protocols, but its processing code can only be used to display packets in a terminal. Wireshark [6] is more modular, but remains visualization-oriented overall, and thus suffers from similar problems. Most programs that process network packets are well designed and appear efficient with respect to their goals. But whenever new packet-trace processing software has to be created, relying on existing code is impractical. Moreover, some tools suffer from performance problems (for instance, Scapy [5] is a very powerful tool for trace analysis, but it is not usable on large traces of 1 GB or more). All of this makes producing custom analyses of packet traces tedious. It generally requires programming new tools from scratch.
For all these reasons, merging IEEE 802.11 traces is also a problem. A few tools for this purpose can be found in the literature, but most rely on the existence of a wired infrastructure [22;28]. The others are too specific to the experiment they were designed for [43;44]. To generalize Wi-Fi sniffing to any environment, tools are needed that are both generic and independent of any wired infrastructure.
A.2 Contributions of this thesis
The contributions of this thesis are twofold. On the one hand, we develop a software toolbox, named WiPal, to help manipulate IEEE 802.11 packet traces. This toolbox includes a generic library for developing new tools, and several ready-to-use utilities that perform predefined operations on traces. In particular, WiPal features an innovative trace merging tool. On the other hand, we use these tools to produce two analyses. These rely on several datasets that we collected in different environments, including multi-day traces in suburban residential areas and in a city center. The first analysis studies the accuracy of Wi-Fi sniffing. The second focuses on Wi-Fi usage in these different environments.
A.2.1 WiPal: manipulating IEEE 802.11 traces
WiPal is our software suite for manipulating packet traces. It can be freely downloaded from http://wipal.lip6.fr/. It is designed for performance and in a generic way, in the hope that others will use it to develop new software, rather than to support one specific program. Although it focuses on the IEEE 802.11 protocol, it provides several protocol-independent features. What makes WiPal interesting is its original design and the novelty of some of its features. In this thesis:
• We present generic design patterns for handling several types of packet traces. Examples include the use of a pipe-and-filters mechanism for trace processing, or the use of static callbacks to generate parsers that are both generic and efficient.
• We present how some new features can benefit packet-trace processing programs, and how to implement them. Examples include random access to a packet trace, or the transparent aggregation of several files into a single packet stream.
• We raise a number of issues that a designer may encounter when writing packet processing software. We present the existing techniques to address them, and we explain which techniques we chose for WiPal, and why.
• We evaluate WiPal’s performance and compare it with other packet-trace processing programs. Results show that WiPal’s generic design has no noticeable effect on its performance (in terms of execution speed). WiPal’s speed is comparable to that of specialized code. Also, some of the new features have no impact on performance, while others, which are optional, incur a limited slowdown.
General overview of WiPal
WiPal consists of a library and a set of binaries (programs). The binaries provide a simple and quick interface to the high-level features, but these features are also available through the library. For instance, to merge several traces, the following command suffices:
$ wipal-merge t1.pcap t2.pcap [t3.pcap...]
#include <iostream>
#include <wipal/pcap/stream.hh>
#include <wipal/wifi/frame.hh>

using namespace wpl;

int main()
{
  // Open the trace; pcap::file<> exposes its frames through iterators.
  pcap::file<> f ("file.pcap");

  // Print the name of each frame's IEEE 802.11 type, one per line.
  for (pcap::file<>::iterator i = f.begin(); i != f.end(); ++i)
    std::cout << wifi::type::names[wifi::type_of(i->bytes())] << std::endl;
}
Listing A.1 – An example program that uses the WiPal library. This program prints the type of each IEEE 802.11 frame in file.pcap.
High-level features include trace synchronization (using the wipal-synchronize program), merging (with wipal-merge), statistics computation (wipal-stats), anonymization (wipal-anonymize), and a few minor operations such as comparison, concatenation, or hexadecimal display (wipal-cmp or wipal-cat, for example). The most important low-level features are pcap input/output, IEEE 802.11 frame decoding, and support for various related protocols. It is important to note that the source code of wipal-merge is only a shell around the library’s features. Currently, the source files of the binaries average 122 lines of C++ (WiPal as a whole, including the library, is about 20,000 lines of code). The smallest binary requires 44 lines of code, and the largest 267. This code is mostly the “glue” required by the generic programming techniques WiPal uses.
On the other hand, performing specific tasks with WiPal’s frame decoder, or combining several processing steps into a single executable, requires users to write their own programs using the WiPal library. Listing A.1 shows a very simple example of a program that uses this library.
WiPal’s architecture
Figure A.2 presents a simplified diagram of WiPal’s architecture. The binaries (top) rely on the library, which itself uses other external libraries. The library is composed of several modules. We classify these modules into three categories: the core, protocols and formats, and filters.
Core. These modules provide simple, common features that do not really depend on WiPal’s application domain. Examples include exceptions for error handling, generic abstract classes, and helpers for static programming. Thanks to external libraries (such as Boost [1] or GNU MP [3]), we try to keep this layer as thin as possible.
Figure A.2 – WiPal’s architecture and modules.
Protocols and formats. These modules are specific to WiPal’s application domain and provide the foundations for high-level processing. The abstractions provided include IEEE 802 addresses, pcap traces, and various protocol headers, including IEEE 802.11.
Filters. Fundamentally, a packet trace is just a stream of network packets. Most algorithms need nothing more than to read this stream linearly, one packet after another, from beginning to end. For such a mode of operation, a pipe-and-filters architecture [17] is entirely appropriate. This is therefore what WiPal uses. The pipes are implemented with iterators [32]. For instance, an anonymization filter takes an iterator as input and provides an iterator as output. Sometimes, a processing step needs to be adapted to fit such an architecture. This is the case of IEEE 802.11 trace merging. It must then be decomposed into several elementary operations (one filter per operation), and these operations must be connected in a precise way. Figure A.4 shows how WiPal decomposes the merging operation. All operations that access a trace non-linearly need such an adaptation.
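To make the iterator-based pipes more concrete, here is a minimal C++ sketch. It is not WiPal's actual API: the names Packet, FilterIterator, anonymize, and run_pipeline are hypothetical, and the "anonymization" merely zeroes a source address. It only illustrates how a filter can consume one iterator and expose another, so that packets flow through one at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical packet type (not WiPal's): a timestamp and a source address.
struct Packet {
    std::uint64_t timestamp_us;
    std::uint32_t source;
};

// A filter is an input iterator wrapping another iterator: dereferencing it
// reads one packet from upstream and applies a function to it.
template <typename Iter, typename Fn>
class FilterIterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = Packet;
    using difference_type = std::ptrdiff_t;
    using pointer = const Packet*;
    using reference = Packet;

    FilterIterator(Iter upstream, Fn fn) : upstream_(upstream), fn_(fn) {}

    Packet operator*() const { return fn_(*upstream_); }
    FilterIterator& operator++() { ++upstream_; return *this; }
    bool operator==(const FilterIterator& o) const { return upstream_ == o.upstream_; }
    bool operator!=(const FilterIterator& o) const { return upstream_ != o.upstream_; }

private:
    Iter upstream_;
    Fn fn_;
};

// Toy anonymization (an assumption for illustration): hide the source address.
inline Packet anonymize(const Packet& p) { return Packet{p.timestamp_us, 0}; }

// Runs a trace through the anonymization filter and collects the output.
inline std::vector<Packet> run_pipeline(const std::vector<Packet>& trace) {
    using In = std::vector<Packet>::const_iterator;
    using Filter = FilterIterator<In, Packet (*)(const Packet&)>;
    Filter first(trace.begin(), &anonymize);
    Filter last(trace.end(), &anonymize);
    std::vector<Packet> out;
    for (Filter it = first; it != last; ++it)
        out.push_back(*it);
    assert(out.size() == trace.size());  // this filter never drops packets
    return out;
}
```

Because a FilterIterator is itself an iterator, a second filter could wrap it in the same way, which is the chaining property the pipe-and-filters design relies on.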
Merging IEEE 802.11 packet traces
Figure A.3 – Merging two traces T1 and T2.
A. The traces are not synchronized and do not contain all frames.
B. Reference frames common to both traces are identified. This information makes it possible to synchronize the traces.
C. The timestamps of each frame are adjusted to synchronize T1 and T2.
D. The traces can be merged by comparing timestamps. Frames that appear twice (once in each trace) are only counted once.
One of WiPal’s distinctive components is its merging tool. This tool works offline and merges IEEE 802.11 packet traces. Its main characteristics are performance, ease of use, and flexibility. Accordingly, its design makes no assumption on the traces that would require monitors to be connected to a wired infrastructure (for instance, some tools require network synchronization [22]). The tool is also compatible with all common formats (raw IEEE 802.11, and Prism, Radiotap, and AVS headers). Finally, it can be used simply by invoking it directly on the traces (whereas other tools require more complicated architectures, which generally involve several servers [22;28;43]). This thesis motivates and describes the design choices of WiPal’s merging tool:
• It proposes new algorithms for several steps of the merging process. In particular, the synchronization algorithm generalizes the existing algorithms from the literature.
• It provides an analysis of the synchronization algorithm; we show that it is more precise than previous algorithms.
• It provides a performance study showing that WiPal’s merging tool is an order of magnitude faster than Wit, the only other publicly available offline merging tool.
Our analyses rely on sixteen real traces from four datasets (uw/sigcomm2004 [50] from CRAWDAD, recorded during the SIGCOMM 2004 conference, and three private datasets recorded under different conditions). They allow us to calibrate various parameters, to validate the operation of the merging tool, and to show its efficiency.
How trace merging works. To merge Wi-Fi traces, it is generally necessary to synchronize them first. This step corrects the timestamps of each frame so that every trace uses the same time reference. It is then possible to identify the frames that are identical across traces so that they appear only once in the output (Cheng et al. [22] call this step unification).
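The final merging step can be sketched as follows, assuming two traces that are already synchronized and sorted by timestamp. The Frame type, the merge_traces function, and the duplicate criterion (equal content and timestamps within a tolerance) are assumptions made for this illustration; they are not WiPal's actual unification rules.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical frame record used only for this sketch.
struct Frame {
    std::int64_t timestamp_us;  // already expressed in the common time reference
    std::string content;        // stand-in for the raw frame bytes
};

// Merges two synchronized, timestamp-sorted traces. Two frames at the head of
// each trace are treated as the same transmission when their contents match
// and their timestamps differ by at most tolerance_us.
inline std::vector<Frame> merge_traces(const std::vector<Frame>& t1,
                                       const std::vector<Frame>& t2,
                                       std::int64_t tolerance_us) {
    assert(tolerance_us >= 0);
    std::vector<Frame> out;
    std::size_t i = 0, j = 0;
    while (i < t1.size() && j < t2.size()) {
        const Frame& a = t1[i];
        const Frame& b = t2[j];
        std::int64_t delta = a.timestamp_us - b.timestamp_us;
        if (a.content == b.content && delta <= tolerance_us && -delta <= tolerance_us) {
            out.push_back(a);  // heard by both monitors: keep a single copy
            ++i;
            ++j;
        } else if (a.timestamp_us <= b.timestamp_us) {
            out.push_back(a);
            ++i;
        } else {
            out.push_back(b);
            ++j;
        }
    }
    for (; i < t1.size(); ++i) out.push_back(t1[i]);
    for (; j < t2.size(); ++j) out.push_back(t2[j]);
    return out;
}
```

This sketch only compares the current head of each trace, so it can miss duplicates when residual synchronization error reorders nearby frames; a more careful implementation would search a small time window.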
To obtain a precise synchronization (a precision of 106 µs or better is required), reference frames must be extracted. These are frames that have been identified automatically, without relying on any synchronization, as being present in all input traces. By analyzing the timestamps of the reference frames, it is possible to compute a clock model for each trace, which then enables synchronization. Figure A.3 illustrates this process.
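As an illustration of what a clock model can look like, the sketch below fits a linear model (a drift factor and an offset) by least squares over the timestamps of the reference frames. The linear form, the ClockModel type, and the fit_clock function are assumptions for this example; this summary does not specify the model WiPal actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Assumed linear clock model: reference_time = slope * local_time + offset.
struct ClockModel {
    double slope;
    double offset;
    double to_reference(double local_time) const { return slope * local_time + offset; }
};

// Least-squares fit over reference frames. Each pair holds the timestamp of
// one reference frame in the local trace (first) and in the trace chosen as
// the common time reference (second).
inline ClockModel fit_clock(const std::vector<std::pair<double, double>>& refs) {
    assert(refs.size() >= 2);
    const double n = static_cast<double>(refs.size());
    double mean_x = 0.0, mean_y = 0.0;
    for (const auto& r : refs) {
        mean_x += r.first;
        mean_y += r.second;
    }
    mean_x /= n;
    mean_y /= n;
    // Centering the data keeps the sums small even for large timestamps.
    double sxx = 0.0, sxy = 0.0;
    for (const auto& r : refs) {
        const double dx = r.first - mean_x;
        sxx += dx * dx;
        sxy += dx * (r.second - mean_y);
    }
    assert(sxx > 0.0);  // reference frames must not all share one timestamp
    const double slope = sxy / sxx;
    return ClockModel{slope, mean_y - slope * mean_x};
}
```

Once fitted, to_reference can be applied to every frame timestamp of the local trace, which corresponds to step C of Figure A.3.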
To identify reference frames, WiPal first isolates unique frames. A frame is unique when it appears on the radio channel only once during the whole measurement. A frame that appears only once in a trace but actually appeared twice during the measurement must not be considered unique. Unique frames are candidates for becoming reference frames. In practice, reference frames are the unique frames that are shared by every trace. The step that computes references from unique frames is the intersection. Figure A.4 shows a diagram of the whole merging operation as performed by WiPal.
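The unique-frame extraction and the intersection step can be sketched as below. This is a simplified illustration under stated assumptions: frames are compared by their raw content (represented here as strings), and a frame is a candidate when it occurs exactly once within a trace. The names Trace, unique_frames, and reference_frames are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Each string stands for the bytes of one captured frame.
using Trace = std::vector<std::string>;

// Frame contents that occur exactly once in the given trace.
inline std::set<std::string> unique_frames(const Trace& trace) {
    std::map<std::string, int> count;
    for (const std::string& f : trace) ++count[f];
    std::set<std::string> unique;
    for (const auto& kv : count)
        if (kv.second == 1) unique.insert(kv.first);
    return unique;
}

// Intersection step: keep the frames that are unique in every input trace.
inline std::set<std::string> reference_frames(const std::vector<Trace>& traces) {
    assert(!traces.empty());
    std::set<std::string> refs = unique_frames(traces[0]);
    for (std::size_t i = 1; i < traces.size(); ++i) {
        const std::set<std::string> u = unique_frames(traces[i]);
        std::set<std::string> kept;
        for (const std::string& f : refs)
            if (u.count(f)) kept.insert(f);
        refs.swap(kept);
    }
    return refs;
}
```

Requiring uniqueness in every trace discards a frame as soon as one monitor saw it twice. It cannot, however, detect a frame that was sent twice when each monitor captured only one copy, which is why uniqueness has to be defined on the channel rather than per trace.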
A.2.2 Applications of WiPal: empirical analyses
Using WiPal’s tools, we can then conduct analyses on datasets that we collected through sniffing. This thesis presents two such analyses. The first focuses on the accuracy of Wi-Fi sniffing. The second studies Wi-Fi usage in sociologically different environments.
We obtain all our traces using monitors (netbooks) equipped with three radio interfaces (ASUS EeePC 700 with Netgear WG111v3 USB Wi-Fi adapters, see Figure A.5). The radios listen to channels 1, 6, and 11. Each radio is configured in monitor mode and records every frame it hears, regardless of the network.
Figure A.4 – The architecture of WiPal’s merging process.
Sniffing accuracy
First, we collect short datasets (one to two hours long) using up to eight monitors located at the same place. The analysis of these traces first reveals several flaws in the existing techniques for evaluating the completeness of a Wi-Fi packet trace. We then analyze how the completeness of a dataset varies with the number of monitors that contribute to its traces.
Figure A.5 – An ASUS EeePC 700 with three Netgear WG111v3 USB Wi-Fi adapters, as used to collect our traces.
Flaws in completeness evaluation techniques. All existing techniques for evaluating the completeness of a trace rely on the fact that a protocol, by nature, defines which frame sequences are valid. When a trace contains an invalid sequence, the sequence is most likely incomplete. The task is then to find a minimal number of frames to insert so that the sequence becomes valid. This number is then assumed to be exactly the number of frames the monitor lost. For IEEE 802.11, there are two categories of techniques: (i) message-oriented techniques, which rely on frame types (for instance, a management frame or a data frame necessarily precedes an acknowledgment) [22;38;43], and (ii) sequence-number-oriented (seqnum) techniques, which rely on sequence numbers (for instance, if frame 42 follows frame 39, then frames 40 and 41 have been lost).
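The seqnum idea can be sketched in a few lines of C++. The function name missing_frames is hypothetical, and the sketch assumes an ideal transmitter whose 12-bit sequence counter increases by one per frame and wraps at 4096; repeated sequence numbers are treated as retransmissions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// IEEE 802.11 sequence numbers are 12 bits wide, hence modulo 4096.
const int kSeqModulo = 4096;

// Estimates how many frames are missing, given the sequence numbers observed
// in capture order for a single transmitter.
inline int missing_frames(const std::vector<int>& seqnums) {
    int missing = 0;
    for (std::size_t i = 1; i < seqnums.size(); ++i) {
        assert(seqnums[i - 1] >= 0 && seqnums[i - 1] < kSeqModulo);
        assert(seqnums[i] >= 0 && seqnums[i] < kSeqModulo);
        // Distance from the previous number, taking wrap-around into account.
        const int step = (seqnums[i] - seqnums[i - 1] + kSeqModulo) % kSeqModulo;
        if (step > 0) missing += step - 1;  // step == 0: counted as a retransmission
    }
    return missing;
}
```

With the example from the text, the observed sequence 39, 42 yields two missing frames. A device that deviates from the assumed counter behavior makes this estimate unreliable.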
Yet several flaws make these techniques inaccurate, partly because of the way they operate and partly because anomalies exist in the traces. Indeed, these techniques assume that every Wi-Fi device conforms exactly to the IEEE 802.11 standard, which is unfortunately not always the case. Here is a list of the flaws we identified.
• Existing techniques assume that the network is not congested. In a congested environment, many frames fail their medium access procedures. Gaps in sequence numbers then reveal transmission failures rather than monitor losses.
• Seqnum techniques assume that devices generate correct sequence numbers. This is false in practice:
1. Some access points reset their sequence number counters at 2048 instead of 4096 [53]. Depending on the analysis technique, this can lead to overestimating or underestimating the number of missing frames.
2. Some access points use zero for all their sequence numbers (we observed this in some of our experiments).
3. Some access points actually manage several “virtual” access points. In theory, each virtual access point should maintain its own sequence number counter. In practice this is not always the case, which leads to overestimating the number of missing frames.
• Message techniques fail to detect some burst losses. For instance, message techniques cannot detect the loss of a data frame if the corresponding acknowledgment was lost as well. Our studies show that burst losses make up a significant proportion of the losses in each trace.
Impact of the number of monitors on completeness. After studying completeness estimation techniques, we analyze how the completeness of a dataset varies with the number of monitors that contribute to its traces. We identify several interesting results.
• As one might expect, the more monitors there are, the less benefit there is in adding a new one.
• However, even with eight monitors at the same place, each monitor captures a small proportion of frames that no other monitor heard.
• With only two monitors, one obtains on average between 70% and 80% of the frames that eight monitors would have captured. That is, most frames are shared among monitors. That said, at least five monitors are needed to exceed 90%.
• Individually, the accuracy of monitors is highly variable. A single monitor may capture between 45% and 90% of what eight monitors would have captured.
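The percentages above compare what a subset of monitors captured with what all monitors captured together. A minimal sketch of that ratio follows, assuming each monitor's trace has been reduced to a set of frame identifiers after merging; FrameSet and relative_completeness are hypothetical names, not part of WiPal.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical frame identifiers assigned once traces have been merged.
using FrameSet = std::set<int>;

// Fraction of the frames heard by all monitors together that is also heard
// by the monitors whose indices are listed in `chosen`.
inline double relative_completeness(const std::vector<FrameSet>& monitors,
                                    const std::vector<std::size_t>& chosen) {
    FrameSet all, subset;
    for (const FrameSet& m : monitors) all.insert(m.begin(), m.end());
    for (std::size_t index : chosen) {
        assert(index < monitors.size());
        subset.insert(monitors[index].begin(), monitors[index].end());
    }
    assert(!all.empty());
    return static_cast<double>(subset.size()) / static_cast<double>(all.size());
}
```

Averaging this ratio over all subsets of a given size gives a curve of completeness against the number of monitors. Note that the denominator is the union of the captured traces, not the unknown set of frames actually transmitted.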
To summarize, most frames are received by several monitors, but a few are very hard to hear. That is, out of six or eight monitors, each monitor captures a small proportion of frames that no other monitor has. In conclusion, losses appear unavoidable, and it is therefore important that researchers use analysis techniques that remain reliable even when frames are missing.
Wi-Fi usage in urban environments
In a second analysis, we collect and analyze long traces (three and ten days long) obtained in three environments: an office, a dense urban residential area, and a low-density suburban residential area. We study the behavior of each device rather than traffic characteristics.
Figure A.6 – Distribution of cumulated activity durations, for each trace and for each station. Panels: (a) Office; (b) Residential, suburb; (c) Residential, downtown. Each panel covers channels 1, 6, and 11 and plots devices (ordered by activity duration) against total activity duration, with axis marks ranging from 3 min up to 1 week.
We are interested in observations such as the total activity duration of a device, the rate at which new devices appear, and the activity that we can extract from the traces. In this summary, we present two examples of results obtained by analyzing these datasets.
Cumulated activity durations. Figure A.6 presents the distribution of cumulated activity durations in all traces and on all channels. Each impulse represents the total activity duration of one device in a given trace. We consider a device active when it has sent a frame within the last three minutes (any type of frame: management, data, or control). We use this three-minute window because access point drivers use timers with similar durations (for instance, MadWifi timers range from 30 seconds to 5 minutes). Moreover, requiring only one frame every three minutes makes the technique robust to frame losses. A number of characteristics are common to all traces.
• Devices are not uniformly distributed across channels. In all traces, devices appear more
often on channel 11 than on channel 6, and more often on channel 6 than on channel 1.
This is a direct consequence of networks not being evenly distributed across channels.
• The distribution of activity durations is not uniform for a given trace and channel
(note that Figure A.6 uses a logarithmic scale). There are three classes of devices:
(1) those that are (almost) always active, (2) those that appear only once, and (3) the
others. Across all traces, 31 devices (out of 2,395) belong to class (1) (see footnote 3).
27 of these devices appear to be access points. Of the four remaining devices, two are in
the office trace and two in the downtown trace. Since they do not transmit beacons, they
are not devices in ad hoc mode. It is therefore interesting to note that a handful of users
leave their devices switched on permanently. A significant share of devices belongs to
class (2) (20% in the office and downtown, 9% in the suburb). This means that many users
are not regulars and are only passing through. Class (3) is varied and spans the whole
range of possible values. Nevertheless, the shorter the duration, the higher the probability.
• Most devices are almost inactive. Between 48% and 96% of devices, depending on the trace
and channel (76% on average), are active for less than one hour over the whole duration
of the trace. A majority of devices is therefore inactive most of the time.
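The activity definition used for Figure A.6 (a device is active if it transmitted any frame within the last three minutes) can be sketched as follows. This is only an illustrative sketch: the function name, the `(timestamp, transmitter)` input format, and the decision not to truncate the last window at the end of the trace are assumptions, not details taken from the thesis.

```python
from collections import defaultdict

ACTIVITY_WINDOW_S = 180.0  # three-minute window, as stated in the text


def cumulative_activity(frames, window=ACTIVITY_WINDOW_S):
    """Return {transmitter: total active seconds}.

    `frames` is an iterable of (timestamp_in_seconds, transmitter_mac)
    pairs (hypothetical input format). A device is considered active
    from each frame it transmits until `window` seconds later, and
    overlapping windows are merged before being summed.
    """
    per_device = defaultdict(list)
    for timestamp, mac in frames:
        per_device[mac].append(timestamp)

    totals = {}
    for mac, stamps in per_device.items():
        stamps.sort()
        total = 0.0
        start = end = None
        for ts in stamps:
            if end is None or ts > end:
                # The previous activity period is over: account for it.
                if end is not None:
                    total += end - start
                start = ts
            end = ts + window  # extend (or open) the current period
        total += end - start   # close the last period
        totals[mac] = total
    return totals
```

Because a single frame every three minutes is enough to keep a device in the active state, occasional frame losses at the monitor leave the result unchanged, which is the robustness property mentioned above.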
On some points, however, the traces differ. The profiles of the office and suburb traces
are similar, but in the latter devices tend to accumulate longer activity durations. The
profile of the downtown trace shows a very sharp break between active devices and those
that are almost inactive. These variations are clear from the average activity durations:
2 h 36 min for the office trace, 11 h 48 min for the suburb trace, and 2 h 21 min for the
downtown trace (even though that trace is three times longer than the other two). In
conclusion, in some environments devices are, on average, active more often.
Growth of the number of devices Figure A.7 presents the growth of the number of
devices. Each curve is associated with a given channel and trace (with one additional
curve per trace, which represents the average over channels). Each point shows how many
distinct devices a (trace, channel) combination contains between the start of the
measurement and the time given on the x-axis. We consider that each MAC address
represents one device, and we look for MAC addresses in all fields of the frame. That is,
some devices are mentioned as a receiver
Footnote 3: A device that appears on several channels is counted several times.
[Figure A.7: three panels, (a) Office, (b) Residential, suburb, and (c) Residential, downtown,
each with curves for channel 1, channel 6, channel 11, and their average, plotted against
time in days.]
Figure A.7 – Number of distinct MAC addresses that each trace contains between the start of
the measurement and a given time.
but not as a transmitter. This is why we discover more devices than in Figure A.6.
Moreover, because of a detail of IEEE 802.11, some address fields hold values that do not
actually correspond to real MAC addresses (but to independent BSSIDs). We ignore these
fields. Several observations can be drawn from Figure A.7.
• Devices are not uniformly distributed across channels. Curiously, the distribution is not
the same as that of the devices that transmit frames. In all traces, channel 1 is the one
with the fewest devices, which is consistent with the previous results (see above).
However, channel 6 attracts more users than channel 11 in two of the three traces. This
directly contradicts the distribution in Figure A.6. One difference is that Figure A.6
only takes transmitters into account, whereas Figure A.7 considers all devices. In any
case, this means that determining how users are distributed across channels is not
straightforward.
• The discovery rate reveals a day/night pattern. The curves alternate periodically between
flat periods and periods of growth. The amplitude and period of this pattern vary from
trace to trace, but it is visible in all of them. Flat periods occur at night, generally
start around midnight, and end a few hours before noon. This shows, as one would expect,
that Wi-Fi activity is correlated with human activity.
• In the office and downtown traces, the discovery rate is constant over a long period.
Moreover, in the downtown trace, this still holds after one week of measurement. In
contrast, the suburb trace flattens after two days. We believe this is a direct
consequence of the environment: downtown and in offices, mobility is higher and the
turnover of individuals is greater. One can therefore expect many new users to appear and
not come back before the end of the measurement. This also explains why the average
activity time per user is higher in the suburb trace (see above). Note, however, that even
when the discovery rate drops after two days, it is still possible to discover new users
toward the end of the trace.
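The discovery curves of Figure A.7 count the distinct MAC addresses seen since the start of a measurement. A minimal sketch of that computation is given below; the function name and the input format (one list of addresses per frame, with invalid address fields already filtered out) are assumptions made for illustration.

```python
def discovery_curve(frames):
    """Return [(timestamp, distinct_addresses_so_far), ...].

    `frames` is an iterable of (timestamp, addresses) pairs sorted by
    time, where `addresses` lists every MAC address found in the frame
    (hypothetical input format). One point is emitted each time a frame
    reveals at least one address that was never seen before.
    """
    seen = set()
    curve = []
    for timestamp, addresses in frames:
        before = len(seen)
        seen.update(addresses)
        if len(seen) > before:
            curve.append((timestamp, len(seen)))
    return curve
```

A flat stretch of the curve, such as the night-time plateaus described above, corresponds to a period during which no frame mentions a previously unseen address.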
Among all these observations, we believe two are of particular importance. First, as the
activity durations show, users are mobile, or else they usually switch off their Wi-Fi
equipment. This yields packet traces in which most devices are off most of the time.
Second, environments affect mobility differently. As a result, new users appear either
spread evenly over the trace or clustered at its beginning. Among the other results not
discussed in this summary, we also noted that the intensity of Wi-Fi activity alternates
between residential areas and offices. This is because every environment is part of users'
lives, but at a specific time of day.
A.3 Conclusion
Wireless sniffing is a powerful technique for measuring the activity of Wi-Fi networks,
although it raises a number of questions. These questions are both practical and
theoretical. On the one hand, the software available for handling IEEE 802.11 traces is
often unsatisfactory. On the other hand, the relevance of IEEE 802.11 packet traces is open
to question. In this thesis, we address these questions and provide a number of answers.
First, we develop WiPal, a software toolbox that eases the processing of IEEE 802.11 packet
traces. WiPal includes a flexible trace merging tool. Next, by analyzing two short traces,
we study the accuracy that Wi-Fi monitors provide. A final study collects and exploits
three long traces from different environments, which lets us study how users use Wi-Fi.
To extend these analyses, we are currently analyzing traces obtained with several monitors
spread across Parc Monceau, in Paris. Wi-Fi activity in this park interests us because the
park has several access points located in different places. With ten monitors, we covered
an area equivalent to about half of the park (cf. Figure A.8). Our analyses are ongoing and
we have few results so far. The traces include 138 transmitters, 71 of which are
Apple-branded. We believe these are mainly mobile devices (iPhone or iPod touch). With so
many mobile devices, these traces may reveal new usage patterns.
Figure A.8 – Position of the monitors used to collect traces in Parc Monceau. Analysis of
the traces is ongoing. (Background: Google Maps.)
Finally, we envision several lines of work to extend WiPal and to better understand the
phenomena observed above. Support for new protocols and new algorithms could be added to
WiPal in order to demonstrate its genericity and make it a “universal” tool. We would also
like to make it even simpler to use and to improve its testing procedures (and why not
prove it formally?).
Regarding our measurements of monitor accuracy, we would like to run controlled
experiments to measure the actual impact of congestion on monitors. We should also
investigate why the sniffing process shows so much variability. To this end, it would be
interesting to use different types of hardware and to vary the parameters of the
experiments. Regarding the measurement of different environments, we have two main
questions. First, we would like to see to what extent our analysis method introduces
measurement bias. Second, we would like to test more environments and try to identify
categories of environments with similar properties.
Appendix B
WiPal manual
WiPal is a piece of software dedicated to manipulating IEEE 802.11 traces. It comes as a set
of programs and a C++ library. A distinctive feature of WiPal is its merging tool, which
merges multiple wireless traces into a single global trace. WiPal’s key features are
flexibility, ease of use, and efficiency.
B.1 The programs
This part documents the programs WiPal features.
B.1.1 Invocation
WiPal’s programs all use the same invocation scheme:
wipal-<command> [options] [inputs] [outputs]
The command line may include no options and, depending on the program, there may
be no inputs or no outputs. Most programs expect at least one input however. See the
specific documentation for each program in order to know how many inputs and outputs
each program expects.
Inputs, outputs, and options may be mixed on the command line, e.g.,
wipal-simple-merge -n -P input1.pcap input2.pcap output.pcap
wipal-simple-merge input1.pcap input2.pcap output.pcap -P -n
wipal-simple-merge input1.pcap -n input2.pcap -P output.pcap
...
are all equivalent.
WiPal’s programs use getopt(3) to parse options, so they only have short options (no
long options) composed of a dash followed by a letter (e.g., -a, -t, etc.). Option letters always
have the same meaning whatever the program. Not all options are available for all programs
though (some options do not make sense with some programs). For instance, -P always
means the invoked program should consider frames with non-zero Prism noise fields as invalid.
In order to know which options a program accepts, use the -h option.
Finally, some options expect an extra argument right after they are provided:
wipal-test-uniqueness -a hsh_80211 input.pcap
^^^^^^^^^
This is not an input
Available options
-8 When comparing two packets, only compare IEEE 802.11 frames. Do not compare Prism
or pcap headers. This option is incompatible with traces of pcap link type EN10MB.
-a See Section B.1.7. Specify which attributes the program must use to identify unique
frames. An attribute specifier must follow this option on the command line. To see
a list of valid attribute specifiers, use the -h option.
-b When comparing two packets, only compare packet bytes. Do not compare pcap headers.
-c Do not print column headers. This is the default when standard output is not a TTY.
-C Do print column headers. This is the default when standard output is a TTY.
-d When comparing two packets, compare everything: pcap headers and packet bytes. This
is the default.
-e In table outputs, do not use a column to report error values. This is the default.
-E In table outputs, do use a column to report error values.
-g Enable debugging output. As of now this only makes WiPal programs display their
options right after they parse the command line.
-h Help. Print a short summary describing how one should invoke the program, which
options it accepts, and possibly which attribute specifiers are accepted for option -a.
-i In table outputs, do not print frame indices.
-I In table outputs, do print frame indices. This is the default.
-m Specify a MAC address mapping file.
Some WiPal programs need to map MAC addresses to other identifiers. For instance,
wipal-extract-unique-frames with the seq bss tmp attributes maps MAC addresses
to 32-bit integers for performance reasons. wipal-anonymize maps real addresses
to anonymous ones. Each program stores these mappings into a file so they can be
reloaded and reused later. This option allows users to control the name of this file.
When not specified, the “MAC.map” filename is used.
The file is just a plaintext file where each line contains a value and the corresponding
MAC identifier.
A filename should follow this option. The file need not exist (in which case it will be
created). If it exists, it might be extended, but will not be truncated.
-n Treat Prism headers as little endian. This is the default when the corresponding
pcap file is little endian. Note that some broken traces are big endian yet have little
endian Prism headers, hence this option.
-N Treat Prism headers as big endian. This is the default when the corresponding pcap
file is big endian.
-o When comparing two traces with wipal-cmp, compare everything (pcap headers and
packet bytes, as with option -d) and count how many bytes differ. The count is printed
on standard output. The exit status remains unchanged.
-p In Prism headers, do not give noise fields any special meaning. This is the default.
-P In Prism headers, treat non-null noise fields as indicating a PHY error, and thus an invalid
frame. Such frames will be ignored, e.g., with wipal-cat they will not appear in the
output.
This option implies that the input trace uses Prism headers (as pcap link type).
-q Quiet. Produce minimal output.
-r Blacklist a given reference frame. The reference frame will then be ignored and will not
be used during synchronization. See Section B.1.6.
A reference frame identifier must follow this option, e.g., 42-51 (indicating the reference frame composed of the unique frames 42 and 51).
You may use this option multiple times, e.g.,
wipal-simple-merge -r 42-51 -r 666-505 \
input1.pcap input2.pcap output.pcap
will blacklist both references 42-51 and 666-505.
-s Specify an ESSID mapping file.
This option works as -m but for files that map ESSIDs to other values. For instance,
wipal-anonymize maps valid ESSIDs to anonymous ESSIDs.
See -m for details. The default mapping file is “ESSID.map”.
-t When comparing two packets, only compare IEEE 802.11 frames, along with some timestamps (e.g., pcap time, Prism MAC time, etc). Which timestamps are used depends on
the traces’ link types (and whether options -y or -Y are provided as well). Compare
time values with a precision of 106 microseconds by default (that is, assume two values are equal when they are spaced by less than 107 microseconds). You can change
the precision using option -x.
-u In table outputs, do not print microsecond timestamps. This is the default.
-U In table outputs, do print microsecond timestamps.
-v Display the program’s version (actually the version of the WiPal package the program
comes from).
-x Change precision for timestamps comparisons (e.g., when using wipal-cmp with -t or
when merging or synchronizing traces).
By default, when the duration between two timestamps is 106 microseconds or less,
WiPal programs consider the timestamps equal. The rationale for this behavior is that
106 microseconds is half the shortest frame interarrival time between two IEEE 802.11
frames (in infrastructure mode). Thus, this is the largest value one can afford when
synchronizing IEEE 802.11 traces.
Use -x to change this value. The new precision must immediately follow -x on
the command line.
-y Force the use of pcap timestamps.
Some traces contain multiple timestamps for each frame. For instance, pcap traces
with link type PRISM_HEADER have the standard pcap timestamps plus extra PHY-level
timestamps provided by the network adapter. WiPal programs’ policy is to use the
most precise timestamps (that is, to ignore pcap timestamps when something else is
available). This option alters this behavior and forces programs to use pcap timestamps.
-Y Force the use of PHY-level timestamps when available. This is the default. See option -y
for a more detailed explanation.
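The comparison rule described under options -t and -x (two timestamps are treated as equal when they are at most 106 microseconds apart by default) can be sketched as follows. The function and parameter names are hypothetical and are not part of WiPal's actual C++ API.

```python
DEFAULT_PRECISION_US = 106  # default precision stated in the manual


def timestamps_equal(a_us, b_us, precision_us=DEFAULT_PRECISION_US):
    """Return True when two microsecond timestamps are considered equal.

    `precision_us` plays the role of the value given to option -x.
    """
    return abs(a_us - b_us) <= precision_us
```

With the default value, timestamps 1000 and 1106 compare equal while 1000 and 1107 do not, which matches the "106 microseconds or less" wording of option -x.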
Input syntax
Basic usage You may provide the name of a pcap trace file as input.
wipal-cat input.pcap output.pcap
Input concatenation You may provide the names of several pcap traces separated with colons
(without any spaces). This tells the program to consider the concatenation of these
traces as a single input.
wipal-cat input1.pcap:input2.pcap:input3.pcap output.pcap
will put into “output.pcap” the content of “input1.pcap”, followed by the content of
“input2.pcap” and then “input3.pcap”.
Every program understands this syntax. Note that specifying multiple traces with
colons makes no sense for outputs:
wipal-cat input1.pcap:input2.pcap output1.pcap:output2.pcap
will concatenate “input1.pcap” and “input2.pcap” into a single file named
“output1.pcap:output2.pcap”!
Address specification Some programs (e.g., wipal-merge with attributes hsh_en2) might
need the IPv4 address of the machine that generated a trace to work properly. Attach
such an address to a trace as follows:
wipal-merge -a hsh_en2 foo.pcap=192.168.1.1 \
bar1.pcap:bar2.pcap=192.168.1.2
The rationale is that, in some cases, timestamps of emitted frames are not as precise
as timestamps of received frames, and thus emitted frames should be ignored during
synchronization.
Special characters When your traces’ filenames contain the special characters ‘:’ or ‘=’
they need to be escaped with a backslash (‘\’):
wipal-cat weird\=file\:name.pcap out.pcap
wipal-merge -a hsh_en2 weird\=1:weird\=1:2=192.168.1.1 \
foo.pcap=192.168.1.2
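The input syntax described in this section (colon-separated file names, an optional `=address` suffix, and backslash escapes) can be summarized by the small parser below. It is only a sketch of the rules as the manual states them: the function name and return format are hypothetical, and WiPal's real parser may differ in its handling of corner cases.

```python
def parse_input(spec):
    """Split an input specification into (paths, address).

    `paths` is the list of trace files to concatenate and `address` is
    the attached IPv4 address as a string, or None when there is none.
    A backslash makes the following character literal, so escaped ':'
    and '=' characters stay part of a file name.
    """
    paths, current = [], []
    in_address = False
    i = 0
    while i < len(spec):
        ch = spec[i]
        if ch == "\\" and i + 1 < len(spec):
            current.append(spec[i + 1])  # escaped character, kept as-is
            i += 2
            continue
        if ch == ":" and not in_address:
            paths.append("".join(current))  # end of one file name
            current = []
        elif ch == "=" and not in_address:
            paths.append("".join(current))  # last file name, address follows
            current = []
            in_address = True
        else:
            current.append(ch)
        i += 1
    if in_address:
        return paths, "".join(current)
    paths.append("".join(current))
    return paths, None
```

For example, `parse_input("bar1.pcap:bar2.pcap=192.168.1.2")` yields the two file names and the address `192.168.1.2`. Note that a POSIX shell consumes an unquoted backslash itself, so on such shells the argument likely needs quoting for the escape to reach the program.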
B.1.2 Concatenation (and Prism noise filtering)
One may concatenate traces using the wipal-cat command. It takes exactly one input and
one output. It may be useful to recombine a trace that was split, or filter out frames with
Prism noise (using the -P option).
wipal-cat in.pcap out.pcap
wipal-cat foo.pcap.0:foo.pcap.1 foo.pcap
wipal-cat -P in.pcap out.pcap
wipal-cat -P bar.pcap.0:bar.pcap.1:bar.pcap.2 bar.pcap
The first example just copies “in.pcap” into “out.pcap”. Note that the two files might
be different at the byte level, e.g., if “in.pcap” is big endian and the program is run on a
little endian machine.
The second example concatenates “foo.pcap.0” and “foo.pcap.1” and puts the result
into “foo.pcap”.
The third example copies “in.pcap” into “out.pcap” but removes frames that have a
non-zero noise field in their Prism headers.
The fourth example concatenates traces and filters noisy frames out at the same time.
B.1.3 Comparisons
One may test two pcap traces for equivalence using the wipal-cmp command. The default
is to compare every bit of information (pcap headers plus packet bytes) but you may change
this behavior using the -8, -b, -o, or -t options. Note, however, that this differs from using
diff or cmp, since traces with different endianness may contain the same packets.
By default wipal-cmp produces a report on the standard output indicating either that the
traces are equal or which packet is the first to mismatch. Use -q if you are only interested
in the program’s exit status and do not want to produce any output. Use -o if you are
interested in counting the number of bytes that differ between two traces.
e.g.,
wipal-cmp foo.pcap bar.pcap
wipal-cmp -q foo.pcap bar.pcap
wipal-cmp -q -8 in1.pcap.0:in1.pcap.1 in2.pcap
...
B.1.4 Sub-traces
One may extract sub-traces of pcap traces using wipal-extract-subtrace,
wipal-extract-transmitter, or wipal-extract-bssid.
wipal-extract-subtrace takes two dates and a pcap trace as inputs, and produces one output. Unfortunately, it does not currently support any options.
wipal-extract-transmitter takes a MAC address and a pcap trace as inputs, and produces
one output. Its output contains the frames from its input that were transmitted by the
given address. Note that the command looks at transmitters, not originators, e.g., the
transmitter of a data frame that crossed the distribution system is the output access
point, not the original sender. Also note that some frames do not contain information regarding their transmitters (e.g., MAC acknowledgements) and therefore cannot
appear in the output, even if they were actually sent by the given address.
wipal-extract-bssid works as wipal-extract-transmitter, but the MAC address represents a BSSID and the command extracts frames that belong to the corresponding BSS.
Again, note that some frames do not contain information regarding their BSS. These
frames therefore cannot appear in the output, even if they actually belong
to the given BSS.
e.g.,
wipal-extract-subtrace 2007-01-01 2008-01-01 \
in.pcap.0:in.pcap.1 out.pcap
wipal-extract-subtrace \
"2004-Aug-30 16:59:39.789221" "2004-Aug-30 16:59:39.929872" \
kalahari-ath2 subtrace.pcap
wipal-extract-transmitter 71:19:9f:6f:71:33 in.pcap out.pcap
wipal-extract-bssid 9b:d2:d7:7f:aa:63 in.pcap out.pcap
B.1.5 Merging
One may merge two IEEE 802.11 traces into one using the wipal-simple-merge command.
Use the -h option for a description of the command’s syntax. It takes two inputs
and produces one output. When run, the merging process starts by precisely synchronizing
both inputs (see Section B.1.6). Then both traces are merged, with special care taken not to
re-order packets or to count duplicate packets twice in the output (that is, packets that are
present in both traces appear only once in the output).
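The merge step itself (after synchronization) can be pictured as an ordered merge that emits frames captured by both monitors only once. The sketch below is a deliberately simplified illustration, not WiPal's algorithm: it assumes both traces are already on a common timeline, represents frames as hypothetical `(timestamp_us, frame_bytes)` pairs, and only detects a duplicate when the two copies are at the head of both traces at the same time.

```python
def merge_traces(a, b, precision_us=106):
    """Merge two time-sorted traces, keeping shared frames only once.

    `a` and `b` are lists of (timestamp_us, frame_bytes) pairs
    (hypothetical format). Two frames are considered the same frame
    when their bytes match and their timestamps differ by at most
    `precision_us` microseconds.
    """
    merged = []
    i = j = 0
    while i < len(a) and j < len(b):
        (ts_a, frame_a), (ts_b, frame_b) = a[i], b[j]
        if frame_a == frame_b and abs(ts_a - ts_b) <= precision_us:
            merged.append(a[i])  # duplicate: keep a single copy
            i += 1
            j += 1
        elif ts_a <= ts_b:
            merged.append(a[i])  # earliest frame first keeps the order
            i += 1
        else:
            merged.append(b[j])
            j += 1
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged
```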
This command expects pcap traces with either Prism headers, AVS headers, Radiotap
headers, raw IEEE 802.11 frames, or pseudo-Ethernet II frames as link type. The -p and -P
options only work with Prism headers. The following timestamps are used, unless -y is
provided:
IEEE 802.11 frames: pcap timestamps,
Ethernet II frames: pcap timestamps,
Radiotap headers: Radiotap headers’ tsft fields. The command will fail with Radiotap
headers that do not contain such fields,
AVS headers: AVS headers’ mactime fields,
Prism headers: Prism headers’ mactime fields.
e.g.,
wipal-simple-merge a.pcap b.pcap output.pcap
wipal-simple-merge -P -n foo-ath2.0:foo-ath2.1 bar-ath2 foo-bar-ath2
...
Notes regarding traces with Ethernet II frames as link type
See Section B.1.7.
Since version 4.0, WiPal is able to merge traces with Ethernet II frames as link type. This
is useful because some wireless traces use this link type. These traces only contain IP packets
encapsulated into pseudo-Ethernet frames.
Since these traces contain no IEEE 802.11 MAC headers, one cannot use the usual attributes, which rely on these headers, to merge them. Therefore, use the hsh_en2 attributes
when merging Ethernet II traces (see option -a). Using these attributes tells WiPal to decapsulate Ethernet frames and use the following frames as unique frames:
• OLSR packets (IPv4 and IPv6),
• IPv6 router advertisements.
Also note that machines recording pcap traces while emitting packets generally record
imprecise timestamps for emitted packets. In order to solve this issue, you might specify an
IPv4 address for each trace (see Section B.1.1). Frames originating from this address in this
specific trace will be ignored for synchronization.
Finally, remember that Ethernet traces only contain pcap timestamps, and these timestamps are not as precise as PHY-level timestamps. You might want to use option -x to raise
the expected precision above 106 microseconds.
Merging more than two traces
wipal-simple-merge is only able to merge two traces. In order to merge more traces, one
should run successive merges following a given sequence. For instance, merging traces A,
B, and C might involve merging A and B into T first, and then merging T and C. The
wipal-merge command selects a merging sequence and runs the corresponding merge operations in turn.
e.g.,
wipal-merge t1.pcap t2.pcap t3.pcap
wipal-merge -n -P t11.pcap:t12.pcap:t13.pcap t21.pcap:t22.pcap t3.pcap
There is no rule to determine which merging sequence will give the “best” results. We
consider that the two most similar traces should be merged first. This avoids generating anomalies due to a lack of reference frames (see Section B.1.6). In order to compute the
similarity between two traces A and B, WiPal counts the number of reference frames it is
able to extract from these traces, stopping when it reaches B’s 250,000th unique frame (see
Section B.1.7). Despite its issues, this technique has the advantage of being both simple
to implement and fast (determining a merging sequence should not take more time than
actually merging the traces).
wipal-merge computes its merging sequence as follows. Note that it is designed to be
fast rather than to yield an optimal sequence.
1. For each trace, compute its similarity with each other trace.
2. Sort results by similarity.
3. Pick up the most similar result.
• If it involves two non-merged traces, merge them.
• If it involves a trace A that has already been merged into another trace T, consider
merging T instead of A.
• If it involves two traces that were already merged into the same trace, do nothing.
4. Pick up the next result in the list and repeat step 3 until all traces have been merged
into one unique trace.
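The four steps above can be sketched as follows. This is an illustration of the selection logic only: the function name, the similarity dictionary, and the “merge-N” output names are assumptions, and no actual merging is performed.

```python
def merging_sequence(names, similarity):
    """Return the list of merges as (left, right, merged_name) tuples.

    `names` lists the input traces and `similarity` maps pairs of trace
    names to a similarity score (hypothetical input format).
    """
    # For each original trace, the trace that currently contains it.
    current = {name: name for name in names}
    # Step 2: sort results, most similar pair first.
    ranked = sorted(similarity.items(), key=lambda item: item[1], reverse=True)
    operations = []
    for (a, b), _score in ranked:
        # Step 3: an already merged trace is replaced by its container.
        left, right = current[a], current[b]
        if left == right:
            continue  # both already belong to the same merged trace
        merged = f"merge-{len(operations) + 1}"  # illustrative name
        operations.append((left, right, merged))
        for name in names:
            if current[name] in (left, right):
                current[name] = merged
        # Step 4: stop once everything is merged into one trace.
        if len(operations) == len(names) - 1:
            break
    return operations
```

With three traces A, B, and C where A and C are the most similar pair, the sketch first merges A with C, then merges B with the result.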
One may compute the similarity between multiple traces using the wipal-similarity
command. The output is sorted by ascending order of similarity. e.g.,
wipal-similarity t1.pcap t2.pcap
wipal-similarity -P t1.pcap t2.pcap t3.pcap t4.pcap
B.1.6 Synchronization
In order to merge two IEEE 802.11 traces WiPal needs to synchronize them precisely. To
do so, it first identifies frames that appear in both inputs. These are reference
frames. It uses these frames to model clock desynchronization between the traces. It then
updates the first trace’s timestamps so they are synchronized with the second trace.
One may use the wipal-synchronize command to synchronize two traces. It takes two
inputs and produces one output. The output contains the same packets as the first input, but
with synchronized timestamps.
To extract reference frames, WiPal extracts some specific frames called unique frames (see
Section B.1.7) from both input traces and then intersects the two resulting sets. One may use
the wipal-intersect-unique-frames command to get the result of this operation (i.e., the
list of reference frames used for synchronization of two traces).
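The idea of intersecting unique frames and then modelling clock desynchronization can be sketched as follows. The manual does not say which clock model WiPal uses, so the single least-squares line fitted here is an assumption made for illustration, as are the function names and the `{frame_id: timestamp}` input format.

```python
def reference_frames(unique_a, unique_b):
    """Return sorted (timestamp_a, timestamp_b) pairs for shared frames.

    `unique_a` and `unique_b` map a unique-frame identifier (for
    instance a hash) to the frame's timestamp in each trace.
    """
    common = unique_a.keys() & unique_b.keys()
    return sorted((unique_a[key], unique_b[key]) for key in common)


def fit_clock(refs):
    """Least-squares fit of ts_b = slope * ts_a + offset.

    Requires at least two reference frames with distinct timestamps.
    """
    n = len(refs)
    mean_a = sum(a for a, _ in refs) / n
    mean_b = sum(b for _, b in refs) / n
    var_a = sum((a - mean_a) ** 2 for a, _ in refs)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in refs)
    slope = cov / var_a
    return slope, mean_b - slope * mean_a


def resynchronize(ts_a, model):
    """Map a timestamp of the first trace onto the second trace's clock."""
    slope, offset = model
    return slope * ts_a + offset
```

Applying `resynchronize` to every timestamp of the first trace would give a trace with the same packets but timestamps expressed on the second trace's timeline, which is the kind of output wipal-synchronize is described as producing.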
WiPal’s synchronization process synchronizes reference frames before it synchronizes
other frames. One may get the result of this operation using the wipal-synchronize-unique-frames
command.
e.g.,
wipal-intersect-unique-frames -n -P foo.0:foo.1:foo.2 bar.0:bar.1
wipal-synchronize-unique-frames -n -P foo.0:foo.1:foo.2 bar.0:bar.1
wipal-synchronize -n -P foo.0:foo.1:foo.2 bar.0:bar.1 foo-sync
Synchronizing more than two traces
Due to WiPal’s mode of operation, it is not possible to synchronize multiple traces on a
common timeline in a single operation. wipal-merge-and-synchronize however provides a
similar feature. It behaves as follows:
1. First, merge all the traces given on the command line. At this stage, the command
behaves exactly as wipal-merge.
2. Then, synchronize each individual trace from the command line with the timeline of
the previously merged trace. Record each synchronized trace into the files “sync-1”,
“sync-2”, etc.
For instance:
wipal-merge-and-synchronize t1.pcap t2.pcap t3.pcap
will merge “t1.pcap”, “t2.pcap”, and “t3.pcap” into the file “merge-2”. Then each trace
will be synchronized using “merge-2”’s timeline.
B.1.7 Unique frames
A frame is said to be unique when it appears on the air once and only once for the whole
duration of a trace. WiPal’s unique frame extraction process is an important stage of its trace
synchronization process. WiPal programs’ default policy is to consider all beacon frames
and all non-retransmitted probe responses as unique frames.
One may use the wipal-extract-unique-frames command to get a list of the unique
frames that compose a trace. Run wipal-extract-unique-frames -h to get its invocation
syntax.
In practice, WiPal does not extract and load full unique frames into memory. This would
slow the process down and require an excessive amount of memory. The default is to work
on MD5 frame hashes when WiPal was compiled using OpenSSL. When compiled without
OpenSSL, WiPal only extracts a subset of frame fields. We call the pieces of information WiPal extracts to identify unique frames “frame attributes”, or sometimes “frame identifiers”.
You may specify frame attributes to use with the -a option. In practice, the difference
in speed and memory consumption between attributes is negligible. There is an important
difference between attributes, though. With some attributes, different unique frames may
yield identical attributes (collisions). This is of course an undesirable behavior.
One may check that a given trace’s unique frames are really unique w.r.t. unique frame
attributes using the wipal-test-uniqueness command. This command finds collisions inside its input traces. You might specify different frame attributes using the -a option.
e.g.,
wipal-test-uniqueness -P -a tmp foo.pcap.1:foo.pcap.2
wipal-extract-unique-frames -P foo.pcap.1:foo.pcap.2 > foo-unique.txt
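The kind of check wipal-test-uniqueness performs can be illustrated with the sketch below, which hashes each supposedly unique frame with MD5 and reports the hashes shared by several frames. The function name and the `(index, frame_bytes)` input format are hypothetical, and hashing the whole frame is just one possible choice of frame attributes.

```python
import hashlib
from collections import defaultdict


def find_collisions(frames):
    """Return {digest: [frame indices]} for digests seen more than once.

    `frames` is an iterable of (index, frame_bytes) pairs for the frames
    that are expected to be unique within the trace.
    """
    by_digest = defaultdict(list)
    for index, data in frames:
        digest = hashlib.md5(data).hexdigest()
        by_digest[digest].append(index)
    return {d: idx for d, idx in by_digest.items() if len(idx) > 1}
```

An empty result means that, for the chosen attributes, every unique frame maps to a distinct identifier.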
Special attributes
WiPal’s “standard” behavior only considers non-retransmitted IEEE 802.11 probe responses
and beacons to compute unique frames. Two “special” attributes however change this behavior:
hsh_en2 These attributes only work with traces using the EN10MB pcap link type (Ethernet
II). Conversely, “standard” attributes do not work with this link type. Using these attributes tells WiPal to decapsulate Ethernet frames and use the following frames
as unique frames:
• OLSR packets (IPv4 and IPv6),
• IPv6 router advertisements.
Beware: in long traces, OLSR packets’ sequence numbers might wrap.
When this happens, the assumption that OLSR packets are unique no longer holds, and you cannot use WiPal to merge your traces.
hsh_80211_x This attribute works exactly like hsh_80211 but considers more frames as unique
frames. Non-retransmitted IEEE 802.11 probe responses and beacons are still considered unique frames. In addition, WiPal programs decapsulate IEEE 802.11 data frames
and consider the same frames as hsh_en2 (described above) to be unique.
Note that these attributes are disabled by default, because they slow the compilation
down and, for long traces, are less robust than hsh_80211. To enable them, use the
--enable-attributes option of WiPal’s configure script before compiling WiPal.
B.1.8 Duplicate data frames
One may use the wipal-find-data-dups command to search for some invalid data frames. It
looks into traces on a per-sender basis for successive duplicate data frames (it only considers
non-retransmitted frames). Such cases should not occur in theory: since retransmissions are ignored, successive data frames from the same sender should at least differ in their
sequence numbers. Surprisingly, some traces contain such anomalies: identical data frames
that are not retransmissions and are only a few milliseconds apart. We have no explanation for why some datasets exhibit this phenomenon.
e.g.,
wipal-find-data-dups foo.pcap.0:foo.pcap.1:foo.pcap.2
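The per-sender search described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code of wipal-find-data-dups: frames are represented by hypothetical (sender, is_retransmission, frame_bytes) tuples given in capture order.

```python
def find_successive_duplicates(frames):
    """Return indices of frames identical to the previous frame of their sender.

    `frames` is an iterable of (sender, is_retransmission, frame_bytes)
    tuples in capture order. Retransmissions are skipped entirely.
    """
    last_seen = {}  # sender -> content of its last non-retransmitted frame
    duplicates = []
    for index, (sender, is_retransmission, frame_bytes) in enumerate(frames):
        if is_retransmission:
            continue
        if last_seen.get(sender) == frame_bytes:
            duplicates.append(index)
        last_seen[sender] = frame_bytes
    return duplicates


# Made-up trace: the sender "aa" emits the same frame twice without retry flag.
trace = [
    ("aa", False, b"x"),
    ("bb", False, b"x"),
    ("aa", True, b"x"),
    ("aa", False, b"x"),
    ("bb", False, b"y"),
]
duplicate_indices = find_successive_duplicates(trace)
```

Keeping one state per sender is what makes the search "per-sender": the identical frames from "aa" and "bb" at the start of the trace are not reported.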
B.1.9 Statistics
wipal-stats computes several figures concerning its given input pcap traces. It displays
these figures as plain text on the standard output. You might either interpret them directly
or post-process them with some tools, e.g., to generate plots.
Most of the output figures are self-explanatory and therefore will not be mentioned in
this manual. Some others need an explanation though:
frames from expired senders The computation of some figures needs wipal-stats to keep
a state for each sender (e.g., its current sequence number). To avoid some measurement artifacts, each state expires after one minute of inactivity from its sender. This
counter indicates how many frames were received whose sender had expired upon
reception of the frame.
sequence gap too large to make sense A sequence gap occurs every time a frame is received
whose sequence number is greater than its sender’s previous sequence number plus
one. Theoretically, a gap of length N (e.g., receiving frame 42 and then frame 42 +
N + 1) means the sniffer missed N frames. Sometimes however the gap is too large to
make sense (e.g., a gap of 2000 within a window of 500 microseconds). WiPal counts
the number of occurrences of these gaps, but otherwise ignores them (e.g., when estimating the number of missed frames).
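The gap arithmetic can be sketched in Python. This is a simplified illustration, not wipal-stats' actual rule: it assumes one sender, no retransmissions or fragments, handles the wrap of the 12-bit IEEE 802.11 sequence number, and uses a hypothetical fixed threshold (MAX_PLAUSIBLE_GAP) instead of relating the gap to the elapsed time.

```python
SEQ_MODULO = 4096         # IEEE 802.11 sequence numbers are 12 bits wide
MAX_PLAUSIBLE_GAP = 1000  # hypothetical threshold, not WiPal's actual rule


def gap_length(previous_seq, current_seq):
    """Number of frames missing between two successive sequence numbers."""
    return (current_seq - previous_seq - 1) % SEQ_MODULO


def count_missed(sequence_numbers):
    """Return (estimated missed frames, number of implausible gaps)."""
    missed = 0
    implausible = 0
    for previous, current in zip(sequence_numbers, sequence_numbers[1:]):
        gap = gap_length(previous, current)
        if gap > MAX_PLAUSIBLE_GAP:
            implausible += 1  # counted, but ignored in the estimate
        else:
            missed += gap
    return missed, implausible


# Made-up sequence numbers from a single sender.
missed, implausible = count_missed([42, 43, 46, 4095, 1])
```

In this made-up example, the jump from 46 to 4095 exceeds the threshold and is only counted, while the wrap from 4095 to 1 is treated as a gap of one frame.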
gap length frequencies This gives the frequencies of sequence gap lengths (see above). The
data is directly suitable for Gnuplot. Use the wipal-plot-gaplenfreqs script to generate the plot using Gnuplot. e.g.,
wipal-stats foo.pcap > foo.stats
wipal-plot-gaplenfreqs foo.stats freqs.eps "A title"
T-Fi plot This gives data suitable for Gnuplot to generate a T-Fi plot. Use the wipal-plot-tfi
script to generate the plot using Gnuplot. e.g.,
wipal-stats foo.pcap > foo.stats
wipal-plot-tfi foo.stats tfi.eps "A title"
One may find an explanation about T-Fi plots in the following paper: On the fidelity of
802.11 Packet Traces, A. Schulman, D. Levin, and N. Spring, in the proceedings of PAM
2008.
BSS figures This gives a list of all BSSs the trace contains as well as a few other figures (e.g.,
number of distinct BSSs, APs and STAs corresponding to each BSS, etc.) The list is
ordered by number of beacons seen for each BSS.
SSID figures This gives the number of distinct SSIDs the trace contains as well as two lists
of these SSIDs. The first one orders them by frequency, the second one orders them
lexicographically.
activity This gives data that represents quantity of traffic w.r.t. elapsed time. Each line
corresponds to one minute. Columns respectively represent:
1. how many frames were sent (during the corresponding minute),
2. how many bytes were sent,
3. how many bytes from management frames were sent,
4. how many bytes from data frames were sent,
5. how many bytes from access points were sent. When a STA emits a beacon that
does not belong to an independent BSS (i.e., the STA emits an infrastructure mode
beacon), WiPal identifies this STA as an access point. All further frames from this
STA are accounted as access point traffic.
One might use the wipal-plot-activity script to plot traffic rate w.r.t. elapsed time
for the whole trace, only for management frames, or only for access point frames. e.g.,
wipal-stats foo.pcap > foo.stats
wipal-plot-activity foo.stats activity.eps "A title"
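As an illustration of how the five activity columns may be post-processed, here is a small Python sketch. It assumes the rows have already been extracted from the statistics file as tuples of integers; the function name and the derived quantities are hypothetical and not part of WiPal.

```python
def activity_rates(rows):
    """Convert per-minute activity rows into per-second rates.

    Each row is (frames, bytes, management_bytes, data_bytes, ap_bytes),
    following the column order of the "activity" section.
    """
    rates = []
    for minute, (frames, total, _management, _data, ap) in enumerate(rows):
        rates.append({
            "minute": minute,
            "frames_per_s": frames / 60.0,
            "bytes_per_s": total / 60.0,
            # Share of the traffic emitted by access points (0 when idle).
            "ap_share": ap / total if total else 0.0,
        })
    return rates


# Made-up values for two minutes of trace.
rates = activity_rates([(600, 120000, 30000, 90000, 60000), (0, 0, 0, 0, 0)])
```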
Various growths (MAC addr., BSSID, IBSSID, SSID, AP) Each “growth” section
gives the same kind of statistics, but for a different type of element. Elements are:
MAC addr. MAC addresses, without BSSIDs or IBSSIDs. All frames are inspected.
BSSID BSSIDs that are not IBSSIDs. That is, independent BSS frames (i.e., ad hoc
mode frames) are ignored. Only beacon frames are inspected, although other frames also
contain BSSIDs.
IBSSID IBSSIDs. That is, only independent BSS frames (i.e., ad hoc mode
frames) are accounted for. Again, only beacon frames are inspected, although other frames also contain IBSSIDs.
SSID All SSIDs. Only beacon frames are inspected (e.g., probe responses are ignored).
AP Sender MAC addresses from beacons. Both normal BSS frames (infrastructure mode) and independent BSS frames (ad hoc mode) are accounted for.
For a given element type, “growth” data gives statistics about the evolution of the
number of distinct elements. Each row represents a minute of measurement. Columns
respectively represent:
1. The number of new distinct elements seen during the last minute.
2. The total number of distinct elements seen since the beginning of the trace.
3. The number of distinct elements seen during the last minute.
For instance, if a trace contains the following elements:
first minute A B C
second minute A D
third minute A B D
The corresponding rows are:
3 3 3
1 4 2
0 4 3
One might use the wipal-plot-growth script to plot an element growth w.r.t. elapsed
time. e.g.,
wipal-stats foo.pcap > foo.stats
wipal-plot-growth "MAC addr." foo.stats mac-growth.eps "A title"
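The three growth columns can be recomputed from per-minute lists of elements with a few lines of Python. This sketch only mirrors the column definitions given above; the growth_rows function is hypothetical and not part of WiPal.

```python
def growth_rows(minutes):
    """Compute (new, total, current) counts for each minute.

    `minutes` is a list of iterables, one per minute, holding the elements
    (e.g., MAC addresses) seen during that minute.
    """
    seen = set()
    rows = []
    for elements in minutes:
        current = set(elements)
        new = current - seen       # elements never seen before this minute
        seen |= current
        rows.append((len(new), len(seen), len(current)))
    return rows


# The example from the text: A B C, then A D, then A B D.
rows = growth_rows([["A", "B", "C"], ["A", "D"], ["A", "B", "D"]])
# rows == [(3, 3, 3), (1, 4, 2), (0, 4, 3)]
```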
ON/OFF events When a STA emits a frame, wipal-stats considers it as active. A STA’s
state gets back to inactive after three minutes of silence. The ON/OFF events section
lists these state changes. The section is composed of one subsection per STA and per
trace. Within these subsections, each line indicates a state change. A state change line
consists of two columns. The first one indicates the event’s timestamp, and the second
one the STA’s new state after the event (0 for inactive and 1 for active).
For instance:
begin ON/OFF T2 STA 00:00:00:00:00:42
0 1
60000000 0
end ON/OFF T2 STA 00:00:00:00:00:42
indicates that, within the third trace (the first trace is referred to as T0), STA 00:00:00:00:00:42
is active between timestamp 0 and timestamp 60000000.
One might use wipal-plot-onoff to generate a PDF file containing a visual representation of this section. Beware: this script only gets installed if Python is present on your
system, and will only work with a proper Pycairo installation (http://cairographics.org/pycairo/). e.g.,
wipal-stats foo.pcap > foo.stats
wipal-stats bar.pcap > bar.stats
wipal-plot-onoff foo.stats bar.stats > foo-bar-onoff.pdf
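For scripts that need the activity periods rather than a plot, the ON/OFF section can be turned into intervals. The following Python sketch is an assumption-laden illustration, not a WiPal tool: it assumes subsections are delimited by "begin ON/OFF" and "end ON/OFF" lines as in the example above, and that each state change line holds its two columns separated by whitespace.

```python
def parse_onoff(lines):
    """Return {(trace, sta): [(start, end), ...]} from ON/OFF section lines."""
    intervals = {}
    key = None      # (trace, STA) of the subsection being read
    started = None  # timestamp of the pending "active" event, if any
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[:2] == ["begin", "ON/OFF"]:
            key = (fields[2], fields[4])
            intervals[key] = []
            started = None
        elif fields[:2] == ["end", "ON/OFF"]:
            key = None
        elif key is not None and len(fields) == 2:
            timestamp, state = int(fields[0]), int(fields[1])
            if state == 1:
                started = timestamp
            elif started is not None:
                intervals[key].append((started, timestamp))
                started = None
    return intervals


section = """\
begin ON/OFF T2 STA 00:00:00:00:00:42
0 1
60000000 0
end ON/OFF T2 STA 00:00:00:00:00:42
"""
periods = parse_onoff(section.splitlines())
```

An activity period that is still open when a subsection ends is dropped by this sketch; a real script may prefer to close it at the end of the trace.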
per STA counters For each IEEE 802.11 station, wipal-stats maintains various counters.
This section lists these counters. It is composed of several subsections which contain
the same information sorted differently (e.g., by traffic per STA, by activity periods
(“on time”), etc.)
Inside a given subsection, each row contains information about a particular station. Each
row has the following columns:
1. The MAC address of the station the row is about.
2. Total number of emitted bytes. This includes MAC frames and their payloads.
3. Average rate when on. That is, size/time_on where size is the total number of
emitted bytes and time_on the total duration the station is active (“on”) inside
traces. Values are in bytes per microsecond.
4. Total duration the station is active (“on”) inside traces. For instance, if a station
is active for 3 minutes somewhere at the beginning of the trace and then active
for 4 more minutes at another moment in the trace, this column holds 7 minutes.
Values are in microseconds.
5. Proportion of stations that have been printed so far. For instance, if the trace
contains 10 distinct stations, the first row’s value is 0.1, the second 0.2, etc. This is
useful for scripts that compute cumulated distributions.
6. Total number of bytes emitted, cumulated with previous rows. This is useful for
scripts that compute cumulated distributions.
7. Average rate when on, cumulated with previous rows. This is useful for scripts
that compute cumulated distributions.
8. Total number of frames emitted.
9. Average number of frames per microsecond when active. That is, count/time_on
where count is the total number of frames emitted. time_on has the same meaning
as above.
Three scripts use the “per STA counters” section: wipal-plot-t-dist, wipal-plot-t-c-dist,
and wipal-plot-ot-dist.
wipal-plot-t-dist Plots the distribution of traffic (and average rate when on) per STA.
wipal-plot-t-c-dist Plots cumulated distributions of traffic (and average rate when on)
per STA.
wipal-plot-ot-dist Plots the distribution of total activity periods (“on time”).
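The nine columns of a "per STA counters" row can be derived from four raw counters per station. The Python sketch below illustrates this under assumptions: the input tuples, the ordering by emitted bytes, and the interpretation of column 7 as a running sum of the per-station rates are guesses made for the example, not a description of wipal-stats' internals.

```python
def per_sta_rows(stations):
    """Build rows with the nine columns described above.

    `stations` is a list of (mac, size_bytes, time_on_us, frame_count)
    tuples. Rows are sorted by emitted bytes, largest first (one possible
    ordering among those the section provides).
    """
    ordered = sorted(stations, key=lambda station: station[1], reverse=True)
    rows = []
    cumulated_bytes = 0
    cumulated_rate = 0.0
    for rank, (mac, size, time_on, count) in enumerate(ordered, start=1):
        rate = size / time_on if time_on else 0.0
        frame_rate = count / time_on if time_on else 0.0
        cumulated_bytes += size
        cumulated_rate += rate
        rows.append((mac, size, rate, time_on, rank / len(ordered),
                     cumulated_bytes, cumulated_rate, count, frame_rate))
    return rows


# Made-up counters for two stations.
sta_rows = per_sta_rows([
    ("00:00:00:00:00:01", 1000, 2000, 10),
    ("00:00:00:00:00:02", 3000, 1000, 30),
])
```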
Plotting scripts
wipal-plot-all is a wrapper that calls all of WiPal’s plotting scripts. e.g.,
$ wipal-stats foo.pcap > foo.stats
$ wipal-plot-all foo.stats
$ ls
foo.pcap                foo.stats.I-growth.eps  foo.stats.gaplenfreqs.eps
foo.stats               foo.stats.M-growth.eps  foo.stats.tfi.eps
foo.stats.A-growth.eps  foo.stats.S-growth.eps
foo.stats.B-growth.eps  foo.stats.activity.eps
wipal-plot-activity and wipal-plot-growth use PCAP timestamps for the x axis.
Usually, PCAP timestamps use GMT. However, traces are not necessarily recorded in a GMT
zone. You might use the WP_TZ environment variable to fix this. This variable gives
WiPal’s plot scripts a time adjustment in minutes.
e.g., if you recorded a trace in a GMT-4 zone, plot its statistics with:
WP_TZ=$((-4 * 60)) wipal-plot-activity foo.stats
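The effect of such an adjustment can be sketched in Python. This is not the code of WiPal's plot scripts: it assumes the variable is spelled WP_TZ, that it holds an integer number of minutes, and that timestamps are seconds since the Unix epoch.

```python
import os
from datetime import datetime, timedelta, timezone


def adjusted_time(pcap_seconds, environ=os.environ):
    """Shift a GMT pcap timestamp by the number of minutes found in WP_TZ."""
    offset_minutes = int(environ.get("WP_TZ", "0"))
    gmt = datetime.fromtimestamp(pcap_seconds, tz=timezone.utc)
    return gmt + timedelta(minutes=offset_minutes)


# One day after the epoch, shifted for a GMT-4 zone (-4 * 60 minutes).
local = adjusted_time(86400, {"WP_TZ": "-240"})
```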
B.1.10 Anonymization
wipal-anonymize is a program to anonymize IEEE 802.11 traces. It takes one input and one
output: the filename of the trace to anonymize, and the filename of the anonymous trace to
produce. The output contains the same frames as the input with the following modifications:
• NIC specific parts of MAC addresses are anonymized.
• ESSIDs are anonymized with a prefix-preserving scheme. For instance, a valid anonymization could map operator-4251, operator-DODO, and foobar to
abcdefgh*x0yz, abcdefgh*9876, and zxycba. The anonymization scheme also preserves character classes, i.e., alphanumerical characters are anonymized to other alphanumerical characters, printable characters stay printable, and extended ASCII characters (128 to 255) stay extended.
• Data frames are truncated so the output only contains MAC headers.
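To make the two properties of the ESSID scheme concrete (prefix preservation and character class preservation), here is a minimal Python sketch. It is not wipal-anonymize's algorithm, which this manual does not detail: the keyed per-position permutation, the KEY value, and the function name are assumptions chosen for the illustration, and Python's random module is not suitable for real anonymization.

```python
import hashlib
import hmac
import random
import string

ALNUM = (string.ascii_letters + string.digits).encode("ascii")
PRINTABLE = bytes(c for c in range(32, 127) if c not in ALNUM)
EXTENDED = bytes(range(128, 256))
CLASSES = (ALNUM, PRINTABLE, EXTENDED)


def anonymize_essid(essid, key):
    """Map each byte to another byte of its class, depending on its prefix."""
    out = bytearray()
    for position, byte in enumerate(essid):
        alphabet = next((c for c in CLASSES if byte in c), None)
        if alphabet is None:  # control characters are left unchanged
            out.append(byte)
            continue
        # The permutation only depends on the key and on the preceding
        # bytes, so equal prefixes always yield equal anonymized prefixes.
        seed = hmac.new(key, essid[:position], hashlib.sha256).digest()
        shuffled = list(alphabet)
        random.Random(seed).shuffle(shuffled)
        out.append(shuffled[alphabet.index(byte)])
    return bytes(out)


KEY = b"hypothetical secret key"
first = anonymize_essid(b"operator-4251", KEY)
second = anonymize_essid(b"operator-DODO", KEY)
```

Because each position uses a permutation of the character class, two ESSIDs that differ at some position also differ at that position once anonymized.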
wipal-anonymize stores valid-to-anonymous MAC and ESSID mappings into files so
these mappings can be re-used later. wipal-anonymize also reads these files at start-up
when they exist. This enables the creation of distinct anonymous traces with consistent
MAC addresses and ESSIDs. By default these mapping files’ names are “MAC.map” and
“ESSID.map”. Use the -m and -s options to change this. See Section B.1.1.
B.1.11 Miscellaneous programs
wipal-list-frames just lists a trace’s frames. This is a pretty dumb program, yet one may
use it to display a trace’s timestamps. e.g.,
$ wipal-list-frames foo.pcap | head
1
2
3
4
5
6
7
8
9
10
$ wipal-list-frames -C -U foo.pcap | head
foo.pcap
Frame ID  Microseconds
========  ============
1         1258703194
2         1258704299
3         1258704368
4         1258705143
5         1258709302
6         1258709362
7         1258709784
B.1.12 Undocumented programs
WiPal’s configure script has two options --enable-probe-stats and --enable-wit-import.
These options enable the build of several programs, namely wipal-probe-stats,
wit-create-datafiles, wit-create-tables-and-load-data, and wit-import. By default
the build of those programs is disabled.
Those are legacy programs that were useful to somebody once, yet are incomplete and
flawed. They will not be updated, and are not documented here. Build and use at your
own risk!
B.2 The library
WiPal also includes a C++ library, which all WiPal programs use. At a low level it
provides various convenience tools (pcap file input/output, random access to pcap traces,
support for various static C++ techniques, etc.). At a higher level it provides a generic
IEEE 802.11 frame parser that is easy to customize and re-use. Finally, it provides various
mechanisms to synchronize and merge pcap traces directly from C++ code.
The library is called libwipal and its headers are located in $(prefix)/include/wipal.
You should be able to include them as follows:
#include <wipal/pcap/stream.hh>
// ...
You will then need to provide the -lwipal option to your compiling/linking tools.
The main documentation for this library is provided as a Doxygen documentation. It
should be installed into WiPal’s package data directory, into the “doxygen” subdirectory. By
default this gives “/usr/local/share/wipal/doxygen/”. This documentation is however a
bit messy, and lacks some parts. The best way to learn how to use the library is to look
at some of WiPal’s tools’ source code (e.g., “src/misc/wipal-find-data-dups.cc”).
You may also want to have a look at WScout (http://wscout.lip6.fr/), which is another
program that uses WiPal (some versions of WScout embed WiPal under the name tracetools).
B.3 FAQ
What systems does WiPal support?
WiPal was mostly designed using standard C++ and
portable libraries. It does use a few GCC extensions, yet it should run fine on most
systems (e.g., GNU/Linux, WhateverBSD, Mac OS, Windows, ...).
WiPal is however exclusively tested on Debian GNU/Linux (amd64 and, to a lesser
extent, powerpc). You might therefore experience problems on other systems that
the developers are not aware of. In this case, please give them feedback so they can
fix it. In any case, there should be no major obstacle to WiPal’s portability.
What are WiPal’s requirements?
WiPal needs:
• GCC,
• The Boost C++ libraries. More specifically:
– any,
– array,
– conversion/lexical_cast,
– date_time,
– filesystem,
– foreach,
– format,
– multi_array,
– optional,
– preprocessor,
– smart_ptr,
– string_algo,
– tuple.
• The GNU MP Bignum Library,
• OpenSSL.
How do I install WiPal? WiPal’s packaging follows the GNU conventions. An installation
documentation is provided in the “INSTALL” file in the package’s root directory. However,
with a standard system, the following commands should do the trick:
mkdir _build
cd _build
../configure
make
make install-strip
make check
On some systems, you might have to customize the “configure” script’s invocation.
e.g.,
mkdir _build
cd _build
../configure CPPFLAGS=-I/foo/bar/libgmp
make
make install-strip
make check
Are there any options to optimize WiPal when building it?
You might want to compile
WiPal with the NDEBUG preprocessor symbol defined. If you use GCC you might also want
to use its -O3 option. You can do that by running “configure” with the following options:
./configure CPPFLAGS=-DNDEBUG CXXFLAGS=-O3
Gee! WiPal’s compilation takes long and requires a lot of memory!
WiPal heavily uses
static C++ mechanisms and a full build requires instantiating many templates. This results
in a long build process that requires much memory. You may disable some template instantiations to have a faster and lighter build process, at the cost of some features.
To do so, invoke configure with the following options:
--enable-linktypes=LT1:LT2:... will only enable the listed pcap link types when compiling
WiPal. The available link types are:
IEEE802_11 raw IEEE 802.11 frames,
IEEE802_11_RADIO Radiotap headers,
IEEE802_11_RADIO_AVS AVS headers,
PRISM_HEADER Prism headers,
EN10MB Ethernet II.
--enable-attributes=A1:A2:... will only enable the listed unique frame attributes (see Section B.1.7) when compiling WiPal. The list’s first attribute specifier is the default one
(when -a is not provided on the command line). Available attributes are:
• tmp
• seq_tmp
• dst_tmp
• src_tmp
• bss_tmp
• src_bss_tmp
• seq_bss_tmp
• seq_dst_bss_tmp
• seq_src_bss_tmp
• hsh_80211 (requires OpenSSL)
• hsh_80211_x (requires OpenSSL)
• hsh_en2 (requires OpenSSL)
If you know you are going to need only one pcap link type (e.g., Prism headers), and
you do not want to test various attributes, a good choice might be:
./configure --enable-linktypes=PRISM_HEADER --enable-attributes=hsh_80211
which will only instantiate one template configuration for each WiPal utility.
Do WiPal’s tools have a verbose mode to report extra information about their operation?
There is no such option that can be activated at run time. You might however want to
compile WiPal with the WP_ENABLE_INFO preprocessor symbol defined. This will enable the
printing of some extra information in some tools as they run (e.g., number of processed
frames, synchronization error, etc.). Invoke the “configure” script with the following option:
./configure CPPFLAGS=-DWP_ENABLE_INFO
Note however that this may slow some tools down and may require more memory.
You say WiPal is flexible and customizable. Is there a way to customize WiPal’s tools
beyond the options they propose?
Yes! But this requires recompiling WiPal’s tools, and
sometimes modifying a few lines of their source code.
• You may change WiPal’s linear regression window (for trace synchronization) by defining the WP_LRSYNC_WINDOW_SIZE macro symbol. Use the CPPFLAGS environment variable for this. The default value is 3. e.g.,
./configure CPPFLAGS='-DWP_LRSYNC_WINDOW_SIZE=42'
• You may change the windowed merging algorithm’s window size by defining the
WP_WMERGE_WINDOW_SIZE macro symbol. Use the CPPFLAGS environment variable for
this. The default value is 3. e.g.,
./configure CPPFLAGS='-DWP_WMERGE_WINDOW_SIZE=42'
• You may change the frame attributes (i.e., frame identifiers) to use in tools that do not
support the -a option by modifying a few lines of their source code. This generally
needs changing an include and a typedef, e.g.,
-#include <wipal/wifi/unique_id/seqctl_bssid_timestamp.hh>
+#include <wipal/wifi/unique_id/seqctl_source_bssid_timestamp.hh>
 // ...
-typedef wifi::seq_bss_tmp_id     unique_id;
+typedef wifi::seq_src_bss_tmp_id unique_id;
“configure” complains it did not find library X? Either library X is not installed on your
system, or your system is not properly configured, so the library cannot be found.
You may use the CPPFLAGS and LDFLAGS variables to correct this behavior.
e.g., run
./configure CPPFLAGS=-I/custom/path/include LDFLAGS=-L/custom/path/lib
“configure” complains it found library X’s headers, but is unable to link?
Most probably library X is installed but its binaries are in a non-standard place. Use the LDFLAGS
variable as described previously.
“configure” complains library X’s headers are unusable, despite successful linking? Most
probably library X is installed but its headers are in a non-standard place. Use the CPPFLAGS
variable as described previously.
Do you have a list of WiPal’s bugs?
No. We are not aware of any serious bug in WiPal. We
take special care in testing WiPal with an automated test suite. Do not hesitate to report
unknown bugs to the package’s maintainers. We will hunt them.
With some tools, you might however encounter some strange behaviors when providing invalid inputs (e.g., running wipal-find-data-dups a:b with “b” having a link type
different from “a”). Consider that as a “feature”! ;-)
I have found a bug, what should I do?
Report it to [email protected], the package’s maintainer.
I would really love having feature X implemented!
Give feedback to the package’s maintainers about the features you want. We might not have the time to implement them, yet it
is important for us to know when important features are missing.
Regarding features you miss, you are greatly encouraged to contribute to WiPal. Again,
contact the package’s maintainers so they can help you implement new features.
I have a question this document did not answer!
Mail [email protected].
Appendix C
List of publications
C.1 Journals
Thomas Claveirole and Marcelo Dias de Amorim, WiPal: Efficient Offline Merging of IEEE
802.11 Traces, to appear in the ACM Mobile Computing and Communications Review, 2010.
Thomas Claveirole, Marcelo Dias de Amorim, Michel Abdalla, and Yannis Viniotis, Securing
Wireless Sensor Networks Against Aggregator Compromises, IEEE Communications Magazine,
vol. 46, no. 4, pp. 134-141, April 2008.
C.2 Conferences
Thomas Claveirole, Mathias Boc, and Marcelo Dias de Amorim, An Empirical Analysis of
Wi-Fi Activity in Three Urban Scenarios, IEEE PerCom Workshop on Pervasive Wireless Networking, March 2009.
Thomas Claveirole, Marcelo Dias de Amorim, Michel Abdalla, and Yannis Viniotis, Résistance
contre les attaques par capture dans les réseaux de capteurs, Journées Doctorales en Informatique
et Réseaux, January 2007. Awarded best paper.
Alia Fourati, Khaldoun Al Agha, and Thomas Claveirole, Securing OLSR Routes, Asian Internet Engineering Conference, December 2005.
Thomas Claveirole, Sylvain Lombardy, Sarah O’Connor, Louis-Noël Pouchet, and Jacques
Sakarovitch, Inside Vaucanson, International Conference on Implementation and Application
of Automata, June 2005.
C.3 Demos and posters
Thomas Claveirole, Marcelo Dias de Amorim, and Serge Fdida, Sniffing IEEE 802.11 Mobility, AdHoc-NOW PhD Workshop, September 2008.
Thomas Claveirole and Marcelo Dias de Amorim, WiPal and WScout, Two Hands-on Tools
for Wireless Packet Traces Manipulation and Visualization (demo), ACM Mobicom Workshop
on Wireless Network Testbeds, Experimental Evaluation and Characterization, September
2008.
Thomas Claveirole, Marcelo Dias de Amorim, Michel Abdalla, and Yannis Viniotis, Resisting Against Aggregator Compromises in Sensor Networks (poster), ACM CoNext Student
Workshop, December 2006.
C.4 Software
WiPal, IEEE 802.11 trace manipulation software, http://wipal.lip6.fr/, published on
CRAWDAD [2]: http://www.crawdad.org/meta.php?name=tools/process/pcap/WiPal.
WScout, lightweight pcap trace visualizer, http://wscout.lip6.fr/, published on
CRAWDAD [2]: http://www.crawdad.org/meta.php?name=tools/analyze/pcap/WScout.
C.5 Under review
Thomas Claveirole and Marcelo Dias de Amorim, Manipulating Wi-Fi Packet Traces with WiPal: Design and Experience, submitted to Wiley Software Practice and Experience, 2010.
Thomas Claveirole and Marcelo Dias de Amorim, On the Completeness of Wireless Packet
Sniffing, submitted to IEEE Communications Letters (second round).
Ryad Ben-El-Kezadri, Giovanni Pau, and Thomas Claveirole, TurboSync: Clock Synchronization for Shared Media Networks via Principal Component Analysis with Missing Data, submitted
to WiOpt: Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless
Networks, June 2010.
Bibliography
[1] Boost. http://www.boost.org/. Free peer-reviewed portable C++ source libraries.
[2] CRAWDAD data sets. http://www.crawdad.org/data.php. Community Resource for
Archiving Wireless Data At Dartmouth.
[3] The GNU multiple precision arithmetic library. http://gmplib.org/. Free library for
arbitrary precision arithmetic.
[4] Radiotap. http://www.radiotap.org/. Standard for 802.11 frame injection and reception.
[5] Scapy. http://www.secdev.org/projects/scapy/. Interactive packet manipulation
program.
[6] Wireshark. http://www.wireshark.org/. Network protocol analyzer.
[7] libpcap. http://www.tcpdump.org/. Packet Capture library.
[8] tcpdump. http://www.tcpdump.org/. Protocol packet capture and dumper program.
[9] Mikhail Afanasyev, Tsuwei Chen, Geoffrey M. Voelker, and Alex C. Snoeren. Analysis
of a mixed-use urban WiFi network: When metropolitan becomes neapolitan. In IMC:
ACM SIGCOMM/USENIX Internet Measurement Conference, pages 85–98, Vouliagmeni,
Greece, 2008. ISBN 978-1-60558-334-1.
[10] Ian F. Akyildiz, Xudong Wang, and Weilin Wang. Wireless mesh networks: a survey.
Computer Networks, 47(4):445–487, 2005. ISSN 1389-1286.
[11] Mark Allman and Vern Paxson. Issues and etiquette concerning use of shared measurement data. In IMC: ACM SIGCOMM/USENIX Internet Measurement Conference, pages
135–140, San Diego, California, USA, 2007. ISBN 978-1-59593-908-1.
[12] Paramvir Bahl, Ranveer Chandra, Jitendra Padhye, Lenin Ravindranath, Manpreet
Singh, Alec Wolman, and Brian Zill. Enhancing the security of corporate Wi-Fi networks using DAIR. In MobiSys: ACM/USENIX International Conference on Mobile Systems, Applications, and Services, Uppsala, Sweden, June 2006.
[13] Magdalena Balazinska and Paul Castro. Characterizing mobility and network usage
in a corporate wireless local-area network. In MobiSys: ACM/USENIX International
Conference on Mobile Systems, Applications, and Services, San Francisco, California, USA,
May 2003.
[14] Kemal Bicakci and Bulent Tavli. Denial-of-service attacks and countermeasures in IEEE
802.11 wireless networks. Computer Standards & Interfaces, 31(5):931–941, 2009. ISSN
0920-5489.
[15] Mathias Michael Boc. Profile of Mobility: User-centric Networking. PhD thesis, Université
Pierre et Marie Curie, Paris, France, November 2009.
[16] Nicolas Burrus, Alexandre Duret-Lutz, Thierry Géraud, David Lesage, and Raphaël
Poss. A static C++ object-oriented programming (SCOOP) paradigm mixing benefits
of traditional OOP and generic programming. In Workshop on Multiple Paradigm with OO
Languages (MPOOL’03), Anaheim, California, USA, October 2003.
[17] Frank Buschmann, Regine Meunier, Hans Rohnert, Peter Sommerlad, and Michael Stal.
Pattern-Oriented Software Architecture: A System of Patterns, volume 1. John Wiley and
Sons Ltd., 1996.
[18] C++0x. ISO Working Draft, Standard for Programming Language C++ , September 2009.
Document number: N3000=09-0190.
[19] A. Chaintreau, P. Hui, C. Diot, R. Gass, and J. Scott. Impact of human mobility on
opportunistic forwarding algorithms. IEEE Trans. Mobile Comput., 6(6):606–620, June
2007.
[20] Ranveer Chandra, Jitendra Padhye, Alec Wolman, and Brian Zill. A location-based
management system for enterprise wireless LANs. In NSDI: ACM/USENIX Symposium on Networked Systems Design and Implementation, pages 115–130, Cambridge, Massachusetts, USA, 2007.
[21] Yu-Chung Cheng, Yatin Chawathe, Anthony LaMarca, and John Krumm. Accuracy
characterization for metropolitan-scale Wi-Fi localization. In MobiSys: ACM/USENIX
International Conference on Mobile Systems, Applications, and Services, pages 233–245, Seattle, Washington, USA, 2005. ISBN 1-931971-31-5.
[22] Yu-Chung Cheng, John Bellardo, Péter Benkö, Alex C. Snoeren, Geoffrey M. Voelker,
and Stefan Savage. Jigsaw: Solving the puzzle of enterprise 802.11 analysis. In ACM
SIGCOMM Conference, pages 39–50, Pisa, Italy, September 2006. ISBN 1-59593-308-5.
[23] Yu-Chung Cheng, Mikhail Afanasyev, Patrick Verkaik, Péter Benkö, Jennifer Chiang,
Alex C. Snoeren, Stefan Savage, and Geoffrey M. Voelker. Automating cross-layer diagnosis of enterprise wireless networks. In ACM SIGCOMM Conference, pages 25–36,
Kyoto, Japan, August 2007. ISBN 978-1-59593-713-1.
[24] Thomas Claveirole. WScout. http://wscout.lip6.fr/. Lightweight pcap trace visualizer.
[25] Thomas Claveirole. WiPal. http://wipal.lip6.fr/. IEEE 802.11 Trace Manipulation
Software.
[26] Thomas Claveirole and Marcelo Dias de Amorim. WiPal: Efficient offline merging of
IEEE 802.11 traces. To appear in the ACM Mobile Computing and Communications Review,
2010. Draft available at http://www.citebase.org/abstract?id=oai:arXiv.org:
0806.4526.
[27] Falko Dressler. A study of self-organization mechanisms in ad hoc and sensor networks. Computer Communications, 31(13):3018–3029, 2008. ISSN 0140-3664.
[28] Diego Dujovne. WisMon: A wireless network statistical tool. Technical report, INRIA
Sophia Antipolis, October 2006.
[29] Diego Dujovne, Thierry Turletti, and Fethi Filali. A taxonomy of IEEE 802.11 wireless
parameters and open source measurement tools. IEEE Commun. Surveys Tuts., 2009.
[30] David Eckhardt and Peter Steenkiste. Measurement and analysis of the error characteristics of an in-building wireless network. In ACM SIGCOMM Conference, pages
243–254, Palo Alto, California, United States, 1996. ISBN 0-89791-790-1.
[31] Jeremy Elson, Lewis Girod, and Deborah Estrin. Fine-grained network time synchronization using reference broadcasts. In NSDI: ACM/USENIX Symposium on Networked
Systems Design and Implementation, pages 147–163, Boston, Massachusetts, USA, 2002.
[32] Erich Gamma, Richard Helm, Ralph Johnson, and John M. Vlissides. Design Patterns:
Elements of Reusable Object-Oriented Software. Addison-Wesley Professional, 1994. ISBN
0-201-63361-2.
[33] Marta C. González, César A. Hidalgo, and Albert-László Barabási. Understanding individual human mobility patterns. Nature, (453):779–782, June 2008.
[34] K. N. Gopinath, Pravin Bhagwat, and K. Gopinath. An empirical analysis of heterogeneity in IEEE 802.11 MAC protocol implementations and its implications. In WiNTECH: ACM SIGMOBILE International Workshop on Wireless Network Testbeds, Experimental Evaluation and CHaracterization, Los Angeles, California, USA, September 2006.
[35] Tristan Henderson, David Kotz, and Ilya Abyzov. The changing usage of a mature
campus-wide wireless network. Computer Networks, 52(14):2690–2712, October 2008.
ISSN 1389-1286.
[36] Pan Hui, Augustin Chaintreau, James Scott, Richard Gass, Jon Crowcroft, and
Christophe Diot. Pocket switched networks and human mobility in conference environments. In WDTN: ACM SIGCOMM Workshop on Delay Tolerant Networking and
Related Topics, pages 244–251, Philadelphia, Pennsylvania, USA, August 2005. ISBN
1-59593-026-4.
[37] IEEE 802.11. IEEE Standard for Information Technology — Telecommunications and Information Exchange Between Systems — Local and Metropolitan Area Networks — Specific requirements — Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY)
Specifications, June 2007. IEEE Std 802.11-2007 (Revision of IEEE Std 802.11-1999).
[38] Amit P. Jardosh, Krishna N. Ramachandran, Kevin C. Almeroth, and Elizabeth M.
Belding-Royer. Understanding congestion in IEEE 802.11b wireless networks. In IMC:
ACM SIGCOMM/USENIX Internet Measurement Conference, Berkeley, California, USA,
October 2005.
[39] Amit P. Jardosh, Krishna N. Ramachandran, Kevin C. Almeroth, and Elizabeth M.
Belding-Royer. Understanding link-layer behavior in highly congested IEEE 802.11b
wireless networks. In E-WIND: ACM SIGCOMM Workshop on Experimental Approaches
to Wireless Network Design and Analysis, pages 11–16, Philadelphia, Pennsylvania, USA,
August 2005. ISBN 1-59593-026-4.
[40] Stephen C. Johnson. Yacc: Yet another compiler-compiler. Technical report, AT&T Bell
Laboratories, 1979.
[41] T. Karagiannis, J.Y. Le Boudec, and M. Vojnović. Power law and exponential decay
of inter contact times between mobile devices. In MobiCom: ACM SIGMOBILE Annual
International Conference on Mobile Computing and Networking, Montréal, Québec, Canada,
September 2007.
[42] Minkyong Kim, David Kotz, and Songkuk Kim. Extracting a mobility model from
real user traces. In INFOCOM: IEEE Conference on Computer Communications, Barcelona,
Spain, April 2006.
[43] Ratul Mahajan, Maya Rodrig, David Wetherall, and John Zahorjan. Analyzing the
MAC-level behavior of wireless networks in the wild. In ACM SIGCOMM Conference,
pages 75–86, Pisa, Italy, September 2006. ISBN 1-59593-308-5.
[44] Ratul Mahajan, Maya Rodrig, and John Zahorjan. CRAWDAD tool tools/analyze/802.11/wit (v. 2006-09-29). Downloaded from
http://crawdad.cs.dartmouth.edu/tools/analyze/802.11/Wit, September 2006.
[45] Mohammad Hossein Manshaei, Mathieu Lacage, Ceilidh Hoffmann, and Thierry
Turletti. On selecting the best transmission mode for wifi devices. Wirel. Commun.
Mob. Comput., 9(7):959–975, 2009. ISSN 1530-8669.
[46] Salim Nahle and Naceur Malouch. Graph-based approach for enhancing capacity and
fairness in wireless mesh networks. In GLOBECOM: IEEE Global Communications Conference, Honolulu, Hawaii, USA, November 2009.
[47] Ruoming Pang, Vern Paxson, Robin Sommer, and Larry Peterson. binpac: a yacc for
writing application protocol parsers. In IMC: ACM SIGCOMM/USENIX Internet Measurement Conference, pages 289–300, Rio de Janeiro, Brazil, 2006. ISBN 1-59593-561-4.
[48] Maxim Raya, Imad Aad, Jean-Pierre Hubaux, and Alaeddine El Fawal. DOMINO: Detecting MAC layer greedy behavior in IEEE 802.11 hotspots. IEEE Trans. Mobile Comput.,
5:1691–1705, 2006. ISSN 1536-1233.
[49] Maya Rodrig, Charles Reis, Ratul Mahajan, David Wetherall, and John Zahorjan.
Measurement-based characterization of 802.11 in a hotspot setting. In E-WIND: ACM
SIGCOMM Workshop on Experimental Approaches to Wireless Network Design and Analysis,
pages 5–10, Philadelphia, Pennsylvania, USA, August 2005. ISBN 1-59593-026-4.
[50] Maya Rodrig, Charles Reis, Ratul Mahajan, David Wetherall, John Zahorjan, and Ed Lazowska. CRAWDAD data set uw/sigcomm2004 (v. 2006-10-17). Downloaded from
http://crawdad.cs.dartmouth.edu/uw/sigcomm2004, October 2006.
[51] Bilel Ben Romdhanne, Diego Dujovne, and Thierry Turletti. Efficient and scalable merging algorithms for wireless traces. ROADS: Workshop on Real Overlays and Distributed
Systems, October 2009.
[52] Björn Scheuermann, Wolfgang Kiess, Magnus Roos, Florian Jarre, and Martin Mauve.
On the time synchronization of distributed log files in networks with local broadcast
media. IEEE/ACM Trans. Netw., 17(2):431–444, April 2009. ISSN 1063-6692.
[53] Aaron Schulman, Dave Levin, and Neil Spring. On the fidelity of 802.11 packet traces.
In PAM: Passive and Active Measurement Conference, pages 132–141, Cleveland, Ohio,
USA, April 2008.
[54] Pablo Serrano, Michael Zink, and Jim Kurose. Assessing the fidelity of COTS 802.11
sniffers. In INFOCOM: IEEE Conference on Computer Communications, pages 1089–1097,
April 2009.
[55] Douglas C. Sicker, Paul Ohm, and Dirk Grunwald. Legal issues surrounding monitoring during network research. In IMC: ACM SIGCOMM/USENIX Internet Measurement
Conference, pages 141–148, San Diego, California, USA, 2007. ISBN 978-1-59593-908-1.
[56] Libo Song, David Kotz, Ravi Jain, and Xiaoning He. Evaluating location predictors
with extensive Wi-Fi mobility data. In INFOCOM: IEEE Conference on Computer Communications, Hong Kong, China, March 2004.
[57] VeriWave. WaveTest multi client traffic generator / performance analyzer. http://
veriwave.com/products/wavetest_90_20.asp, 2004.
[58] Mark Weiser. The computer for the twenty-first century. Scientific American, pages
94–104, September 1991.
[59] Jihwang Yeo, Moustafa Youssef, and Ashok Agrawala. A framework for wireless LAN
monitoring and its applications. In WiSe: IEEE International Workshop on Wireless &
Internet Services, Philadelphia, Pennsylvania, USA, October 2004.
[60] Jennifer Yick, Biswanath Mukherjee, and Dipak Ghosal. Wireless sensor network survey. Computer Networks, 52(12):2292–2330, 2008. ISSN 1389-1286.
[61] Moustafa Youssef, Matthew Mah, and Ashok Agrawala. Challenges: Device-free passive localization for wireless environments. In MobiCom: ACM SIGMOBILE Annual
International Conference on Mobile Computing and Networking, pages 222–229, Montréal,
Québec, Canada, 2007. ISBN 978-1-59593-681-3.
List of Figures
1.1 Wireless sniffing: passive monitors listen to the wireless activity inside the measurement area.
2.1 WiPal’s structure and modules.
2.2 A complex filter example. This figure shows how WiPal uses filters to synchronize and merge two IEEE 802.11 traces. Each box represents a filter and arrows show pipes. Pipes convey different types of data.
2.3 A screenshot of WScout [24]. WScout uses WiPal’s random access feature to open packet traces that do not fit in memory.
2.4 A simple processing pipeline using two filters (represented as white boxes). Listing 2.5 displays the code implementing this pipeline.
2.5 Mean execution time for a hundred runs of the various test programs. Note that most 95% confidence intervals are too small to be distinguished clearly.
3.1 Merging two traces $T_1$ and $T_2$.
3.2 The structure of a merge process in WiPal.
3.3 Synchronization difference w.r.t. linear regression window size. The upper curve represents average, minimum, and maximum values for seven of the eight merges. The lower curve represents the result for the other one, and is plotted separately because it has a singular shape. We think this is related to the timestamping accuracy of the input traces for this merge.
3.4 Number of frames detected as shared by both input traces w.r.t. linear regression window size. The curve represents the average, minimum, and maximum values for eight merge operations. For each merge operation, this number is normalized using 1 as the number of frames from the window size that gives the highest value.
4.1 ASUS EeePC 700 with three Netgear WG111v3, as used for trace collection.
4.2 Number of MAC addresses each merged trace contains from its beginning to a given time. Contrary to table 5.1, which only accounts for MAC addresses from frames’ sender fields, all fields containing valid MAC addresses are used.
4.3 “Score” of a single merge operation. $m_N$ is the last merge, i.e., the one that includes $N$ sniffers. Note that when $k > 2$, $m_{k-1}$ features frames from at least two distinct sniffer traces and thus it is expected that $o_2 > o_1$. Therefore in most cases $\mathrm{score}(m_k) = o_1 / |m_N|$.
4.4 Successive computations of $M_k$ for $N = 4$. An arrow from x to y symbolizes the x ? y merge operation.
4.5 Evolution of scores w.r.t. the number of monitors.
4.6 Scores w.r.t. number of monitors and dataset. Each column represents a given channel of a specific dataset. Each row $M_k$ represents the set of sub-datasets of size $k$. Each cell contains a box whose size is proportional to the average number of packets inside the corresponding sub-datasets. Red (dark) parts of boxes represent average values of $o_1$ (see Figure 4.3). Pink (medium grey) parts represent average values of $o_2$. Numbers below boxes are average scores (in percent) with 95% confidence intervals.
5.1 Distributions of cumulated activity durations.
5.2 Number of distinct MAC addresses each trace contains from its beginning to a given time.
5.3 CCDFs of aggregated inter-activity times of all devices for the three traces. The distributions are well fitted by truncated power laws with exponential decays. The parameters of the distributions are presented in the text.
5.4 Proportion of users that are active in each time interval, relative to the first time (interval) they appeared, for the three traces. In these traces, we observe a clear periodicity of 24 hours with some variations that are characteristic of the social meaning of each environment.
5.5 Sniffer locations regarding the collection of traces inside the Parc Monceau. The subsequent trace analysis is currently in progress. (Background from Google Maps.)
A.1 Wireless sniffing: passive monitors listen to the radio activity inside the measurement area.
A.2 WiPal’s architecture and modules.
A.3 Merging two traces $T_1$ and $T_2$.
A.4 The architecture of WiPal’s merge process.
A.5 An ASUS EeePC 700 with three Netgear WG111v3 USB Wi-Fi adapters, as used for collecting our traces.
A.6 Distribution of cumulated activity durations, for each trace and for each station.
A.7 Number of distinct MAC addresses each trace contains between the beginning of the measurement and a given time.
A.8 Monitor locations for the collection of traces in the Parc Monceau. The trace analysis is in progress. (Background: Google Maps.)
List of Tables
3.1 Characteristics of the traces used for testing merge operations. Id. relates to the identification number of the merge operations.
4.1 Quantitative characteristics of the 2008-12-01 and 2008-12-19 datasets.
5.1 Quantitative characteristics of the Office, Residential sparse, and Residential dense traces.
Listings
2.1 A sample program using WiPal. This program prints the type of every IEEE 802.11 frame included in file.pcap.
2.2 The program of listing 2.1 with support for multiple PHY headers.
2.3 A typical example of packet processing code. The code is error-prone, depends on the whole protocol stack, and does not handle truncated frames.
2.4 A program using WiPal’s IEEE 802.11 parser. It uses the same main() function as listing 2.2.
2.5 An example of advanced trace processing using filters. This program uses the same main() function as listing 2.2.
A.1 A sample program that uses the WiPal library. This program prints the type of every IEEE 802.11 frame included in file.pcap.