Name: BELLISSIME Ferdinand
Department: SRT / SSI
UTT academic supervisor:
Year: 2014
Semester: A14
Internship title
INTERNSHIP AT THE BUNDESKRIMINALAMT: TAKING PART IN THE CONSTRUCTION OF MALANET,
AN AUTOMATED MALWARE ANALYSIS SYSTEM
Abstract
This internship took place in the KI-42 unit of the Bundeskriminalamt (the German federal criminal police) in
Berlin. Mainly focused on network forensics, the unit also provides support in malware analysis. In the current
context of growing computer crime and increasing use of malware, it is working on an automated analysis
system based on simulations. Two sub-projects were researched and developed during this internship: the
deployment and integration of virtual machines to complement the pool of real targets, and the design of a
generator of non-offensive malware to test the proper functioning and the limits of the system. While the
virtual machines were mainly a matter of development and deployment, the generator required a more
theoretical approach to design an intermediate language between programming and description, followed by
the development of a prototype. Beyond the technical work, a deeper reflection on crime, technology and
society accompanied the internship and is recorded in the report.
Company: Bundeskriminalamt
Location: Berlin, Germany
Supervisor: Herr Thomas Schwarz
Keywords
Applied research, development
Public service
Computer science
Systems security
Acknowledgements
I would like to express my sincere gratitude and appreciation to the people who provided me
with the opportunities and means to advance in life, whether the help came in social, technical,
philosophical, economic or casual matters. They include, but are not limited to, the following:
• the Bundeskriminalamt,
for the opportunity to work amongst its people,
• the Université de technologie de Troyes,
for the opportunity to practice my capacities through internships,
• my agency-side supervisor, Kriminaloberkommissar Thomas Schwarz,
for his patience and support,
• my university-side supervisor, Associate Professor Patrick Lallement,
for his motivated and thorough work,
• the Leitender Kriminaldirektor Helmut Ujen,
for offering me the opportunity to work with the BKA in the first place,
• the UTT Relations Formation-Entreprises service,
for its help and understanding,
• the whole KI-42 unit in Berlin, and our lovely neighbour, the KI-22 unit,
for the work environment, for tolerating my unusual behaviour and ideas,
• the giants on whose shoulders I am standing,
those who worked on the potential of information machines and systems,
those who played with self-reproducing agents and parasitic software,
those who theorized simulation-based analysis,
those who provided virtualization technologies,
• the flow of information on the Internet,
from the humblest answer to the biggest aggregation of knowledge,
for providing tools and means for self-teaching,
for acting as a live repository of human knowledge and thoughts,
• Thanks for the coffee, Dieter.
• Love you Diane.
Opening
Introduction
This paper is written in the context of the final internship of both an engineering degree (TN10)
and a Master’s degree (TN30) at the UTT, focused on information security (engineering department
Systèmes, Réseaux et Télécommunications, specialisation Sécurité des Systèmes et des Communications;
master Sciences, Technologies et Santé, mention Sciences et Technologies de l’Information
et de la Communication, specialisation Sécurité des Systèmes d’Information).
The internship took place at the Bundeskriminalamt office in Berlin, in the KI-42 unit, from
1 September 2014 to 27 February 2015. It was carried out by the intern Roland
Ferdinand (usual name) Loup Bellissime, under the supervision of Kriminaloberkommissar
Thomas Schwarz and Associate Professor Patrick Lallement.
This final paper presents, for examination purposes, the research and development done on
computer crime, simulation-based analysis of malware and introspection-oriented analysis through
test samples, in the context of contributing to the continuous evolution
of the in-house MALANET experiment.
List of abbreviations
• AV: Anti-Virus
• BKA: Bundeskriminalamt
• C&C: Command & Control
• DLL: Dynamic Link Library
• IT: Information Technology
• KI: Kriminalistisches Institut
• LKA: Landeskriminalamt
• R&D: Research & Development
• RE: Reverse-Engineering
• UTT: Université de Technologie de Troyes
Disclaimer
Because the analysis system is under continuous R&D, including the sample factory and the
homebrew pseudo-language (both of which may become open-source one day), the code examples
and design decisions presented here may not be up to date. Due to the sensitive nature of some
operational work, the help provided on low-sensitivity cases and casual operational work is not
described thoroughly in this report.
"Man verdirbt einen Jüngling am sichersten, wenn man ihn anleitet,
den Gleichdenkenden höher zu achten, als den Andersdenkenden."
Friedrich Nietzsche
Contents
1 Pardon my French: résumé du rapport
2 The Bundeskriminalamt and the rise of computer crimes
2.1 BKA: a federal agency
2.2 From handling communications to computer crimes
2.3 KI-42: network forensics, wiretap analysis, malwares
3 Malware forensics
3.1 Know your enemy
3.2 Know your options
3.3 Know your story
4 MALANET: an analysis environment, tools included
4.1 Assisting the human with an automated system
4.2 Components of an analysis system
4.3 The MALANET solution
5 Analysis environment: deploying virtual targets
5.1 Going virtual
5.2 Implementing Cuckoo
5.3 Virtual targets inside MALANET
6 System introspection: crafting a test sample factory
6.1 Analysing an analysis system
6.2 Crafting a sample factory
6.3 Sharing the good stuff
7 Law and technological (dis)Order
7.1 Computer crime and computer police, a brave new world
7.2 Ex machina: the assistant in the machine
7.3 Viruses, beyond good and evil
7.4 Technological evolution and societal evolution
A Extracts from the Cuckoo guide
B Cuckoo API and Python : examples
C Decoding botnet message : examples
D The Architect : examples
Bibliography
Chapter 1
Pardon my French: résumé du rapport
The Bundeskriminalamt and the rise of computer crimes
BKA: a federal agency. Germany, a federal country built around a set of Länder, has a criminal
police agency at the federal level, the Bundeskriminalamt (BKA). While acting as a relay between
the regional agencies (the LKA) and with other national or supranational agencies, the BKA has, in
theory, jurisdiction over crimes committed across several regions or having a national impact. In
practice, because its resources are greater than those of the LKA, the BKA handles several
particular types of crime: forensic analysis of exceptional cases, criminal intelligence,
counter-terrorism, and the majority of computer crime. To cover the whole of German territory, the
BKA has three main offices, located in Wiesbaden, in Berlin and near Bonn. For historical reasons,
the headquarters is in Wiesbaden.
From handling communications to computer crimes. The BKA began its activities in information
technology through the telecommunications used during the Cold War by foreign agents on West
German soil. The appearance and then the spread of personal computers changed not only the
communication methods of foreign agents but also crime in general (as much to ease communication,
logistics and planning as to adapt crimes to the context of computerized systems). While the first
people involved were enthusiasts pushing limits that were still poorly understood, today’s
situation includes the various elements of ordinary crime as well as unprecedented cases.
To respond to this evolution, investigation units on the one hand adopted new tools and new
methods to adapt to practical changes in the field and in the evidence, while on the other hand
specialized support units were set up. Support means bringing technical expertise to
investigators, such as mastery of a particular technology (e.g. wiretaps), but it also involves
units oriented towards technical investigation and forensic analysis. While the former serve to
acquire more information by taking part directly in the investigation process, the latter focus on
reconstructing events and searching for evidence within the acquired information.
KI-42: network forensics, wiretap analysis, malwares. The KI-42 unit stems from a former computer
investigation unit, as does KI-22, a unit oriented towards the analysis of computer hardware and
systems. KI-42 was given network forensics as well as the analysis of data recovered from
wiretaps, since the two share a large number of skills. The unit based in Berlin was also given
malware analysis, a task that used to be assigned to it and is now an additional mission.
This choice of direction is in line with the recent evolution of computer crime. Although the
overall subject has become more obvious to everyone, the general public is more aware of cases of
intrusion, denial of service or direct attack. Yet the presence of malware is rising sharply, and
the spread of automated tools together with the emergence of unofficial markets has made
low-complexity malware commonplace while also leading to custom-made programs, generally of much
higher complexity.
Malware forensics
Know your enemy. The term malware, recent compared with terms such as virus or worm, is not
directly tied to technical features: it is the possibly malicious use of the program that gives it
its name, and therefore, in the end, the use a person makes of it. It remains easier to apply the
notion of malice to a human being than to a computer program. It is nevertheless possible to
observe that malware generally shares a common range of components:
• the payload, or active element of the malware (which may contain bombs, eavesdropping probes,
functions for manipulating the environment, graphical interfaces, etc.);
• the self-replication engine, which makes copies of the program, either parasitically (virus) or
not (worm), and is generally built around a system for searching and filtering targets;
• stealth measures, whether in the form of data modification (e.g. cryptography), deployment of
embedded data (e.g. packing) or on-the-fly modification during self-replication (e.g.
polymorphism);
• defence routines, whether in the form of detection of virtual machines and decompilers or of a
more proactive defence that disables the local anti-virus;
• a network communications engine, which enables propagation over the network, data exfiltration
or the reception of orders.
A familiar vocabulary exists nonetheless, naming malware according to its behaviour and impact on
the local system, and generally drawing on rather colourful imagery. These names fall into three
broad categories. Among delivery systems are viruses (parasitic reproduction, memory-oriented),
worms (non-parasitic reproduction, network-oriented) and Trojan horses (no self-replication). A
backdoor (which bypasses the local authentication and access system), a rootkit (which places
itself out of reach of the user and the local system, generally upstream) and a keylogger (a
concept that can extend to inputs other than a keyboard) are the payloads best known to the
public. As for applications, which unambiguously give malware its names, they include adware,
ransomware and bots (or zombie computers, often part of a structure in its own right, a botnet). A
sharp mind will notice that the technical functions needed to produce such malware also serve more
legally commendable purposes (remote access to a server, communications security, data sharing,
propagation of an update, etc.).
Know your options. Public awareness certainly comes from the recent media coverage of computer
crime, but also from the existence of software that protects against and detects malware.
Anti-virus products, along with various anti-malware tools and firewalls, act in practice as a
vector for raising public awareness. However, although the tools used by anti-virus (and similar)
products and by forensic investigation are technically similar, their goals and some of their
methods differ greatly. This is explained by their position in the security provided to others:
protection software serves to detect and counter malware, and therefore tries to distinguish a
danger from a harmless element (mainly with signatures, whether static or heuristic), whereas
forensic analysis tries to understand the various elements and to extract information from them
(mainly through observation and information analysis).
Know your story. The existence of malware is nothing new, although computer crime is in fact
older. It is notable, however, that the concepts of self-replication, viruses and worms (and the
corresponding terms) date back to the early days of computing, long before the adjective malicious
was used to describe code. Self-replication even predates computers in their modern definition,
lying at the heart of some of John von Neumann’s work (which led to the formalization of the
science of cellular automata, and to Conway’s Game of Life). Fred Cohen, building on his work as
an enthusiast and then as an academic, formalized the existence of viruses. Approaching the
subject with an open mind, he considered not only harmful uses of this type of program but also
possible positive contributions (examples illustrating each case appeared in the following years,
though ultimately more in the form of worms than of viruses). However, since computing became
widespread and was absorbed into criminal activity, the existence of malware is undeniable and
cases such as viruses are associated with it. While the strong pressure exerted by the anti-virus
effort acted as an evolutionary force that produced particularly complex cases, the
commoditization of certain tools drove the number of low-complexity cases, attributable to a large
number of different actors, much higher.
MALANET: an analysis environment, tools included
Assisting the human with an automated system. The contributions of modern information technology
to society have certainly benefited criminals, but also lawful actors. Although basic tools exist,
such as those from forensic analysis, they can be surpassed by more complex systems built on top
of them. Aggregating different sources of information and comparing their content gives a finer
understanding of the data and saves time on tasks that can be automated, and therefore potentially
allows more effective exploitation. To meet the needs of malware analysis, and to provide
structured and conclusive information, the system can build its analysis on simulations. A
controlled simulation gives unequalled access to the behaviour and interactions of the malware,
within the limits of the robustness of the simulation.
Components of an analysis system. An analysis system is built around several components that must
coexist and cooperate. The technical core consists of the controlled targets for the simulation,
observation tools (e.g. probes) and quick-analysis tools (e.g. filters), the latter interpreting
the information gathered by the former in the targets. A reporting system, which collects and
presents the information, and a user interface ensure interaction with the user and give the user
access to the results of the simulation-based analysis. To cover the regular needs of different
users, a system offers basic services: an archiving system, internal mechanisms, administration
tools and documentation. By taking advantage of the existence of a system beyond the simulation
platform, external algorithms can bring intelligence into the system itself and exploit its
potential more fully. By combining such capabilities with a memory, through the archiving system,
a correlation system can be built around a profile creation engine and a profile identification
engine. Finally, the level of quality and exhaustiveness required by the judicial process,
specific to law enforcement and forensic analysis, can be reached through variety. By offering a
real anthology of observation and analysis tools together with a set of targets under varied
conditions (real or virtual, different virtualization systems, different software environments,
etc.), variety gives access to a higher-quality analysis.
The MALANET solution. Having adopted an automated, simulation-based analysis system as an answer
to the recent problems of computer crime where malware is concerned, the KI-42 unit is
progressively building MALANET, an experiment developed in parallel with its daily duties.
However, noting that the quality of a simulation lies not only in the credibility of the technical
implementation but also in the scope covered by the simulation (the only possibly perfect
simulation of the universe is a simulation of the whole universe, hence the universe itself),
KI-42 has sought to push back the limits of the simulation. To cover more situations and thus
unlock more useful information, MALANET is built around a simulation of targets (real or virtual),
but also around a simulation of a connection to the Internet. This lures the malware into starting
its usual network exchanges, which may yield other elements of the malicious system.
Analysis environment: deploying virtual targets
Going virtual. A virtual machine, and more specifically a so-called sandbox, is a controlled
system within a virtualized environment, isolated from the underlying system and able to return to
a stable default state. Choosing virtual machines lowers the cost, in both resources and time, of
setup (snapshots, clones) and of operation (use of the hypervisor). This enables fast, low-cost
simulation-based analyses, although they are necessarily less effective than using real targets (a
virtual machine can be detected, through behavioural differences from a real machine or through an
undocumented aspect of a system). However, just as a simulation has limits, so does a detection
routine, and low-complexity malware can easily be fooled. Virtual machines therefore provide a
practical answer to the problem of widespread access to malware.
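As an illustration of how simple a low-complexity detection routine can be, the sketch below guesses
a hypervisor from the vendor prefix of a network adapter's MAC address. It is not taken from
MALANET or from any sample: the function name and the table are assumptions made for this example,
and the prefixes listed are the commonly documented defaults for each product.

```python
from typing import Optional

# Default vendor prefixes (first three bytes of the MAC address) commonly
# assigned to virtual network adapters. Illustrative, not exhaustive.
VIRTUAL_OUIS = {
    "08:00:27": "VirtualBox",
    "52:54:00": "QEMU/KVM",
    "00:0c:29": "VMware",
    "00:50:56": "VMware",
}


def guess_hypervisor(mac: str) -> Optional[str]:
    """Return the hypervisor suggested by a MAC address, or None if unknown."""
    normalized = mac.strip().lower().replace("-", ":")
    return VIRTUAL_OUIS.get(normalized[:8])


if __name__ == "__main__":
    for address in ("08-00-27-AA-BB-CC", "52:54:00:12:34:56", "00:1A:2B:3C:4D:5E"):
        print(address, "->", guess_hypervisor(address))
```

Such a check is defeated by assigning the virtual adapter a MAC address outside these ranges, which
is one example of a detection routine having limits of its own.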
To choose the solutions to implement, the existing products on the market were considered against
several criteria. Strong criteria, called exclusion criteria, produced a shortlist based on cost,
the possibility of local deployment and the automation of the system. To sort the results more
finely, weak criteria, called selection criteria, were established: access to the source code,
capacity for automation and integration into MALANET, credibility of the underlying
virtualization, development status and stability of the product, content and formats of the
reports, etc. The shortlist is therefore as follows:
• Buster Sandbox Analyzer (rejected because it was recently discontinued, though formerly
recommended);
• Zerowine (built around Wine and without recent updates, which calls for additional hardening and
possibly reworking the virtual machine itself; the idea of exploiting Wine is nevertheless worth
keeping);
• Minibis (local version of Anubis produced by cert.at, promising but long on stand-by; a new
version should be available shortly, making it an interesting option for the future);
• Cuckoo (free software project backed by a foundation, adapts to different virtualization
technologies, reports available in different formats, appreciated by the community and already
offering modular extensions).
Implementing Cuckoo. To quickly provide a new type of target in MALANET and a basic, fast analysis
tool, the work focused on implementing Cuckoo. Cuckoo is a system built around a set of Python
scripts that relies on an observation DLL (cuckoomon) sent to the virtual machine along with the
malware, on a browser-based application as the user interface (built on Django), and on a back-end
database for easier archiving. The underlying system runs on Debian, with a recent version of
Python, a Mongo database, and a small web server for Django using nginx and uwsgi. For
virtualization, two options were retained: VirtualBox and QEMU-KVM. Both have strengths and
weaknesses: VirtualBox is less credible as a simulation, slightly slower in execution, and its
code is a mix of free and proprietary software, but QEMU-KVM is more complex to deploy and
operate, less easily controlled automatically, and less stable in its integration with plug-ins.
The possibility of using both should not be overlooked, QEMU being preferred for its greater
credibility as a simulation and its use of hardware-assisted virtualization. A virtual network
links the virtual machines and the main script. Various Windows systems (mainly XP, Vista and 7,
in 32 and 64 bits), with different possible software environments, serve as targets.
Although the tool provided is functional, several reinforcements can be put in place. Among the
existing extensions, several are worth considering: the signatures provided by the community
(anecdotal, yet adding value to the automated analysis); automated interaction with Volatility, a
memory analysis tool (unstable with QEMU-KVM because of memory dump formats); Tomer Teller’s
extension, which allows memory dumps on specific events instead of a single dump at the end of
execution; malwasm from malware.lu, which provides reverse-engineering capabilities; zer0m0n, the
kernel-level observation DLL. For now, the need for a working tool has postponed the integration
of extensions because of instability and incompatibility (generally with QEMU-KVM). A reworked
version of the cuckoomon observation DLL was also produced: since the DLL sits in the middle to
listen to the malware’s requests, the information passed along can be intercepted and modified on
the fly, and the analysed program’s perception of its environment can thus be manipulated. A
prototype able to counter low-complexity cases has been developed and deployed.
Using Cuckoo on programs taken from download sites led to a focus on botnet communications.
Allowing a connection to the Internet, which is possible for samples obtained in the wild,
provides an opportunity to capture and observe the protocols and communication methods used by
bots, revealing the presence of information encoding and communications security (encryption,
integrity checks, etc.). Once the protocol is understood, the communications can be decoded
automatically (the encryption key can be extracted from other elements of the analysis or guessed
by trying default keys), which also raises the possibility of using protocol patterns and
automated decoding to identify the type of malware, allowing fake control servers to be included
dynamically in the simulated Internet connection. Although unique, specific cases exist, on many
occasions the material is a set of programs (from a contaminated network, a production server,
etc.). This created the need for a set of scripts to handle groups of malware, which opens up the
possibility of using other reports as a secondary source of information for an ongoing analysis
and of building correlation systems (avoiding useless analyses of identical programs within a
group, highlighting possible common points between different programs, etc.).
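The idea of testing default keys against captured traffic can be sketched as follows. This is only
an illustration under stated assumptions: a repeating-key XOR stands in for whatever encoding a
given botnet actually uses, and the key list, the `CMD|` marker and the function names are
hypothetical, not taken from any analysed sample.

```python
from itertools import cycle
from typing import Iterable, Optional, Tuple


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def try_default_keys(
    payload: bytes, default_keys: Iterable[bytes], marker: bytes
) -> Optional[Tuple[bytes, bytes]]:
    """Return (key, decoded) for the first key whose output starts with marker."""
    for key in default_keys:
        decoded = xor_bytes(payload, key)
        if decoded.startswith(marker):
            return key, decoded
    return None


if __name__ == "__main__":
    # Hypothetical captured message, encoded here only to have test data.
    captured = xor_bytes(b"CMD|sleep|60", b"k3y")
    print(try_default_keys(captured, [b"admin", b"k3y"], b"CMD|"))
```

A known plaintext marker is what makes the trial automatic; once a protocol pattern is identified,
the same check can double as a way to recognise the malware family.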
Virtual targets inside MALANET. Although a functional tool in its own right, Cuckoo was deployed
in order to be integrated into MALANET. This is achieved both through the API, to submit programs
for analysis (with the corresponding options and arguments) and extract the reports, and through
the integration of those reports as an additional source of information in the overall analysis.
Moreover, having an environment that contains Cuckoo makes it possible to draw on the potential of
the system as a whole, rather than of each tool taken individually, to build functions for
handling unusual options (switching from the standard DLL to the hardened version, changing the
virtualization system), to integrate the in-house extensions properly (handling of groups of
programs, decoding of botnet communications), and to use the archives as a secondary source of
information and thus as a memory (for a correlation system, built on a profiling engine and a
comparison engine, allowing a more advanced automatic analysis).
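A minimal sketch of what a profiling engine and a comparison engine could look like is given
below. It assumes that a profile is simply a set of observed features and that similarity is
measured with the Jaccard index; both choices, like the feature strings and function names, are
assumptions made for this example and do not describe the MALANET implementation.

```python
import hashlib
from typing import Dict, List, Set


def sha256_of(content: bytes) -> str:
    """Hex digest used as the identity of a sample."""
    return hashlib.sha256(content).hexdigest()


def group_identical(samples: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Group sample names by content hash so each binary is analysed once."""
    groups: Dict[str, List[str]] = {}
    for name, content in samples.items():
        groups.setdefault(sha256_of(content), []).append(name)
    return groups


def similarity(profile_a: Set[str], profile_b: Set[str]) -> float:
    """Jaccard index of two feature sets: shared features over all features."""
    union = profile_a | profile_b
    if not union:
        return 0.0
    return len(profile_a & profile_b) / len(union)


if __name__ == "__main__":
    batch = {"a.exe": b"MZ\x90", "b.exe": b"MZ\x90", "c.exe": b"other"}
    print(list(group_identical(batch).values()))
    print(similarity({"dns:example.test", "mutex:abc"}, {"dns:example.test"}))
```

Grouping by hash avoids redundant analyses inside a batch, while the similarity score highlights
samples that differ as files but share behaviour.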
However, one should not be misled by Cuckoo’s ease of use and speed of execution: a user must
always remain critical of the results, as much because of the quality of the observation and
analysis tools as because of the weaknesses inherent in virtual machines. To provide a
higher-quality tool, as required for investigations and legal action, one must be able to find the
flaws in the virtual machines while trying to push the limits of the system.
System introspection: crafting a test sample factory
Analysing an analysis system. Analysing an analysis system requires not only a controlled source
of information but also an introspective approach that exercises the system from the inside. This
leads to crafting test malware. Although similar in many respects to ordinary malware, these
samples carry no payload and are focused on environment awareness (for detecting virtual machines)
and defensive routines (for increasing stealth). To provide a tool that the various actors can
use, while covering the different situations and adapting to the evolution of the system, the goal
becomes a sample factory rather than samples alone.
Crafting a sample factory. The first step is the design phase: understanding the need in order to
build a suitable solution. The difficulties of creating samples lie in the need for familiarity
with a programming language, a deeper knowledge of the targeted operating systems, and a sometimes
unusual way of thinking. Added to this is the automatic integration of frequently used routines to
reduce the time invested in design. The factory is therefore built around a library of pre-written
functions (the .pieces), a description and programming language that is easier to handle than a
common language (the .schematics), scripts that assemble these pieces of information to produce an
executable (the architect), and fairly exhaustive documentation to ease the use of this new tool.
The pseudo-language, like any language, needs a set of rules to make the information
understandable while providing various functions and subtleties. Each refinement and each new
function requires reserving markers, which reduces the room for manoeuvre when using the language.
The goal is to find a balance between functions and markers, along with tricks to reduce the
number of reserved markers. The detailed reasoning is given in the English body of the report; it
led to: placing one instruction per line of code, with input and output arguments; markers to
distinguish neutral values, character strings and instructions as argument types; a system of
lists of values as arguments to ease the repetition of similar instructions; the import and
injection of routines written in other files; aliases that replace a set of instructions and
accept input and output arguments; a marker for adding comments in the schematics; and a few
internal mechanisms.
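As an illustration of the rules listed above, the following Python sketch parses a single
schematic line. The concrete syntax is an assumption made for this example only (double quotes for
strings, @ for an instruction passed as an argument, [a,b,c] for a list of values, > between
inputs and outputs, # for comments), and so are the instruction names mutex_create and sleep; the
markers actually chosen are described in the English body of the report and may differ. Imports
and aliases are left out.

```python
import re
from dataclasses import dataclass

# Hypothetical markers, chosen for this sketch only:
#   "text"   string            @name  instruction used as an argument
#   [a,b,c]  list of values    >      separates inputs from outputs
#   #        comment until the end of the line
TOKEN = re.compile(r'"(?P<string>[^"]*)"|(?P<comment>#.*)|(?P<word>\S+)')


@dataclass(frozen=True)
class Instruction:
    name: str
    inputs: tuple
    outputs: tuple


def tokenize(line):
    """Yield (kind, value) pairs for one schematic line, dropping comments."""
    for match in TOKEN.finditer(line):
        if match.group("comment") is not None:
            return  # everything after the comment marker is ignored
        if match.group("string") is not None:
            yield ("string", match.group("string"))
            continue
        word = match.group("word")
        if word == ">":
            yield ("sep", word)
        elif word.startswith("@"):
            yield ("instruction", word[1:])
        elif word.startswith("[") and word.endswith("]"):
            yield ("list", tuple(word[1:-1].split(",")))
        else:
            yield ("neutral", word)


def parse_line(line):
    """Parse one line; a list argument repeats the instruction per value."""
    tokens = list(tokenize(line))
    if not tokens:
        return []  # blank line or comment-only line
    kind, name = tokens[0]
    if kind != "neutral":
        raise ValueError(f"line must start with an instruction name: {line!r}")
    inputs, outputs = [], []
    target = inputs
    for kind, value in tokens[1:]:
        if kind == "sep":
            target = outputs
        else:
            target.append((kind, value))
    # Expand the first list argument found: one instruction per listed value.
    for index, (kind, value) in enumerate(inputs):
        if kind == "list":
            return [
                Instruction(
                    name,
                    tuple(inputs[:index] + [("neutral", item)] + inputs[index + 1:]),
                    tuple(outputs),
                )
                for item in value
            ]
    return [Instruction(name, tuple(inputs), tuple(outputs))]


if __name__ == "__main__":
    for parsed in parse_line('sleep [1,5,10]  # three delays from one line'):
        print(parsed)
```

Keeping one instruction per line means the parser never needs to track nesting, which is one way
to keep the number of reserved markers low.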
The technical implementation involves several points. The pieces, in C, are manipulated by the
Architect, in Python, using the schematics, in pseudo-language. The elements are distributed in a
folder architecture (pieces, schemas, resultats), complemented by several utilities (available:
documentation generator, XOR of hexadecimal values; in development: automatic injection of junk
code, automatic replacement of variable names with random values), by the various pieces
(available categories: file, registry, mutex, process, system variables, data manipulation,
network communications, console interface, conditions, loop, direct; in development: random, junk,
graphical interface), by the various aliases and routines (available: aliases for communication
over HTTP, routines for self-deletion of the malicious program; in development: routines for
self-replication and deployment of embedded programs), and finally by the various schematics
(imitations of pafish and of Andromeda's communications; Anderson and Neo for basic tests and for
exploring the limits of the system, respectively; theKid for exploring the limits of the factory;
Smith for studying parasitic propagation).
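To give an idea of what the "XOR of hexadecimal values" utility mentioned above could look like,
here is a minimal Python sketch. The function name xor_hex and the repeating-key behaviour are
assumptions made for this example; the report does not describe the actual interface of the
utility.

```python
from itertools import cycle


def xor_hex(data_hex: str, key_hex: str) -> str:
    """XOR two hexadecimal strings, repeating the key over the data."""
    data = bytes.fromhex(data_hex)
    key = bytes.fromhex(key_hex)
    if not key:
        raise ValueError("key must contain at least one byte")
    return bytes(b ^ k for b, k in zip(data, cycle(key))).hex()


if __name__ == "__main__":
    masked = xor_hex("48656c6c6f", "2a")
    print(masked)                   # 624f464645
    print(xor_hex(masked, "2a"))    # 48656c6c6f
```

Since XOR is its own inverse, applying the same key twice returns the original value, which is why
a single function covers both directions.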
Note that development was mainly split into two phases. Although exploring the limits of the
system triggers deep improvement work on the system, the more urgent need for a test element to
validate the basic functions takes precedence. An emblematic schematic is built for each stage,
serving as a temporary goal: for basic validation, an element unaware of its environment,
Anderson; for probing the limits and getting the upper hand over the system, an environment-aware
element, Neo. This allows not only a management of requirements that follows the rhythm of the
code, but also a progressive development of the functions of the pseudo-language and the factory.
Passing the code around. Although the request may seem odd, the Architect could be released
openly on the Internet via GitHub. Weighing gains against drawbacks mainly sets, on one side, the
increased possibilities for future development and integration into other tools and, on the other,
the danger of openly releasing code related to malicious programs. However, the Architect is a
factory and not a malicious program as such, and it contains no pieces of code corresponding to
malicious payloads. The skill required to implement one's own malicious payloads is sufficient to
build a low-complexity malware from scratch: spreading the Architect therefore causes only a very
small increase in the potential for computer crime, while providing a tool to scientific research
communities, security practitioners and enthusiasts. On more pragmatic matters, no sensitive
information about internal projects related to MALANET or KI-42 is found in the Architect, while
the question of funding (through the developer's pay) raises the question of whether the public
money invested should benefit only the BKA or a larger number of people. The argument retained,
however, is the contribution to colleagues working in the security field, but also in related
scientific fields such as cellular automata, viral propagation and environment-aware programs, as
well as the light shed on the difference between a virus and a malware.
Technological law and (dis)order
Computer crime and computer police, a brave new world. Technological advances benefit human
society by pushing back the limits of biological evolution. Although the cost is sometimes too
high, the more common problem is that of the harmful uses of a technology (e.g.: dual-use
technologies, civilian and military). When a technology becomes widespread in society, it becomes
accessible to law enforcement but also to criminal elements. Although modern information
technologies saw their first malicious uses in the hands of enthusiasts seeking to explore a new
territory and push limits, they are now employed primarily by criminals (for communication as much
as for direct action). In response, law enforcement has also adopted a more open approach towards
information and communication technologies, not only to stand at a level similar to that of
criminals, but also to strengthen its own capabilities through the potential of automated systems.
Ex machina: the assistant in the machine. Just as the mechanical automaton began replacing
human beings in industrial activities, the digital automaton has begun supporting human beings in
various sectors. While the first analysis systems were relatively static and primitive, they
quickly evolved to adapt to more dynamic elements. This is the context of simulation-based
analysis, which proves to be an excellent answer to the spread of computer crime through the
massive use of malware such as viruses and Trojan horses. The MALANET project is thus part of the
current fight on two fronts, against the many low-complexity cases and against more delicate
situations. The strength of the project comes not only from its tools but also from exploiting the
synergy of the overall system, as well as from extending the scope of the simulation to a
simulated Internet connection. However, between the flaws and gaps of some observation and
analysis tools and the intrinsic weaknesses of simulations, two conclusions stand out: one must
remain vigilant and critical of the results, and, to proactively build a better system, one must
be able to analyse the system and push its limits from the inside.
Virus, beyond good and evil. To strengthen an analysis system against the ability of some
malicious programs to detect a simulated environment, using non-offensive malicious programs is a
logical continuation. By providing access to a set of malleable programs, a precise introspective
analysis of the system can be set up. However, while the legitimacy of malicious programs can be
raised here, it stands in opposition to illegitimate uses. Opening up to considerations on the
possible uses of this type of code makes it possible to get past the malicious label, which has
more to do with the choices of human actors than with the intrinsic value of the code. It thus
becomes possible to place the work done on viruses and other ambiguous programs in a broader
scientific and engineering context. Notable are the directly related fields (cellular automata,
the study of parasitism, etc.), but also more innovative advances, whether in research (on
artificial sapience, Darwinian evolution and environment-awareness) or in development (on adaptive
code, or self-replication in the service of data propagation and synchronisation).
Evolution of technology and evolution of society. Although the original considerations of this
document concerned questions that were at first sight applied and/or scientific, they developed a
social and human sensitivity through contact with the question of what makes a program malicious
and, more broadly, of the integration of technologies into society. The environment of law
enforcement and computer crime naturally encourages this reflection, given their respective
presence in and impact on society. Handling similar technological elements in two opposite
settings raised the question of the consequences of banishing these technologies for safety
reasons (and of using the vocabulary of malice, by default, for sets of algorithms). The cost,
conditioned by the public's perception of the subject, appears to be a loss in terms of innovation
and the shrinking of certain R&D fields, neglected because they are poorly regarded or simply
unknown. The answer is not to consider the legality of certain technologies, a question that leads
to the degradation of scientific and technological potential and to the spread of fear and
ignorance, but to address the more fundamental issues that hinder the integration of these
technologies into society: while these specific cases raise questions about our level of control
over computer programs and fears related to automata, overall it is more a matter of confronting
society in its human component, which is harder and more complex than harming technological
innovation through taboo.
Chapter 2
The Bundeskriminalamt and the rise of
computer crimes
"Another one got caught today, it’s all over the papers." - The Mentor
2.1 BKA: a federal agency
As a country, Germany is a federation of Länder, born from the history of the German regions.
Though each state possesses a criminal police office (the LKA, Landeskriminalamt), a federal
criminal police office also exists, the BKA. The Bundeskriminalamt serves as a relay and focal
point between the various LKA but also as an active office in itself, managing cases spanning
multiple states or the national level. Due to the distribution of budget and manpower, the BKA
usually handles more complex cases and bigger threats. A close comparison would be the FBI inside
the US law and order system. Due to its capacities and missions, its areas of focus include
forensics for special events, criminal intelligence, witness protection, counter-terrorism, and
also computer crimes.
In order to act as a proper relay between the various LKA and other law enforcement agencies,
while also acting on an international level (cooperating with Europol, Interpol, etc.), the BKA
has three major offices in Germany. Though, for historical and practical reasons dating from the
Cold War, the headquarters are located in Wiesbaden, the BKA is also installed in Berlin and near
Bonn. The various departments and units of the BKA are spread over the three places, with most of
the units having personnel on each site (although each site may have a different focus).
2.2 From handling communications to computer crimes
The history of handling IT-related crimes finds its roots in the Cold War, when a part of the
BKA was focused on managing foreign agents, mainly from East Germany, on West German soil.
The nascent telecommunication systems, toughened by the world wars, were massively employed
by agents in the field and by local intelligence services. With the evolution of technology and the
democratisation of computers, not only did the tools of foreign agents change but also those of
criminals. If the first people concerned by computer crime were computer enthusiasts exploring
a new realm, the focus rapidly shifted to regular criminals starting to integrate computers and
telecommunications in their designs. This phenomenon led to the current state of the trendy
cybersecurity area, where computer enthusiasts, regular criminals and state agencies find
themselves with similar tools in the same playground.
Nevertheless, if the world changed and followed technological evolution, both the criminal and
the law and order elements of society adapted. To provide up-to-date investigation, police agencies,
including the BKA, have to follow two parallel paths in the integration of IT in police work. The
first point of focus is to adapt investigation in order to be able to answer properly to computer
crimes or casual integration of IT in criminal works, which is mainly about furnishing investigators
with technical knowledge and tailored tools. The second point of focus is not found in the direct
investigation part but in the support of those investigation units. The complexity and the diversity
of technology ask for specialised units providing help and expertise to investigation units, in the
same way experts are available for ballistic, chemical or psychological questions.
This support splits into two different kinds. On one hand, active and direct technical support
can be found, providing more complex tools and expert use to investigators; wiretapping operations
are one example. On the other hand, a more analytical, intelligence-oriented support can be
provided, as a way of offering computer/IT forensics to the investigators. Where the first is more
about taking direct action in the investigation to provide more information (to collect fresh
information, to keep an eye on a criminal element, etc.), the second is more about reconstructing
events and finding evidence in a collection of information.
2.3 KI-42: network forensics, wiretap analysis, malwares
In the case of the BKA, those various areas of action and expertise (computer crime investigation,
wiretapping, computer forensics, etc.) are divided into various units. The KI-42, part of the
forensics institute (one of the nine departments), is one of them, mainly oriented towards network
forensics and data analysis. The unit in Berlin also extended its work to malware forensics due to
its history: the split of an older computer forensics unit gave birth to this unit and to the more
hardware-oriented/system-oriented forensics unit KI-22; afterwards it was complemented with some
wiretapping work and also took back the malware side as a supplementary mission.
Though malware was not their original focus, the integration of this area in computer crime
forensics is a wise move. If the democratisation of computers and telecommunications changed
regular crime by adding cellphones, smartphones, mails and so forth to the equation, the criminal
use of technology goes further. If there were some ambiguities in past years, nowadays more and
more people are sensitive to this topic: who hasn’t heard of some penetration of government
databases or some corruption of company infrastructures? If regular, witty hacking has grown
into a range of penetration and exploitation tools, another kind of technological approach also
exists: malwares. And this trend is also growing.
Chapter 3
Malware forensics
"If you know others and know yourself,
you will not be imperilled in a hundred battles." - Sun-Tzu
Malware, standing for malicious software, covers a variety of applications. It is used indifferently
to describe various IT components, from a single program to a combination of executables or an
entire system. The malicious designation is linked to the potential of those elements due to their
inherent functionalities and capacities. Ultimately, it is the behaviour and the use of those
elements that will categorise them as malware.
In order to clear the air, approaching malware as an object of scientific study and classification
is a first step. Providing the basic possible components and a practical taxonomy, detailing the
commonly used vocabulary, will provide the necessary knowledge. Although this is needed, more
and more users are already aware of the existence of somewhat dangerous software, mainly through
the market of anti-virus and security software, or through an unfortunate opening of a mail
attachment. This raises the question of the differences between the way an anti-virus analyses
malwares and the way a forensics-oriented analysis does. A more general view of the subject can
merge the various details in a logical story that will provide a background for today’s
considerations about malicious software.
3.1 Know your enemy
Although malwares are various in nature and grouped together because of their potential and
behaviours, these criteria still lie on top of a set of common functionalities. Though each
functionality may not be present in each malware sample, together they paint a picture of what a
malware can be and what it can achieve or be used for. Those capacities are usually combined to
produce specific kinds of malware, which usually get a denomination from the security community
and/or the general public. A basic taxonomy of those common kinds of malwares will complete the
theoretical approach of the common functionalities.
Malware main components
The main component of a malware, the one that justifies the name, is the payload. The payload is
the main unwanted functionality of the malware, its goal on the local system. A malware acts as
a wrapper for the payload, which can be malicious and can be offensive, both to some extent. A
payload may be a so-called bomb (logic bombs that will crash the local system, fork bombs that will
fill up local memory, etc.), hooks into specific processes or drivers (e.g.: a keylogger will
probably hook at some point into the keyboard processing), modification of local data (to modify
security measures for example), a graphical interface (to display unwanted messages for example),
and so forth.
Other components may or may not actually exist in a particular sample, but can be expected.
A self-replication engine will be present in viruses and similar malwares, in order to propagate
copies of the malicious piece of code. The self-replication may be parasitic, by adding the code
to other executables, or not, by copying a stand-alone file (that can be kickstarted in different
ways: alternate data stream, debugger options, deceptive appearance, etc.). Functionalities that
can be commonly found in a self-replication engine include a crawler algorithm (whether it is to go
through a file system or a local network) and a targeting system (which may filter potential targets
according to local system variables, previous infection, etc.).
Various stealth measures can be implemented to optimise the discretion of the malware,
increasing its chance of survival. Stealth, or enhanced discretion, can take several forms: data
manipulation (like encryption), management of embedded data (like packing and unpacking),
injection of code into live processes, or manipulation of its own code before self-replication
(like polymorphism or metamorphism). Similarly, to increase its chance of survival in different
ways, more defensive routines may be implemented. Protection can be achieved through
environment-awareness, to detect the use of a debugger (and in consequence limit
reverse-engineering of the sample) or the use of a virtual machine (and in consequence limit
behavioural analysis). It can also be achieved through more pro-active or offensive defence
(through various timer systems to break the flow of analysis and outlast an anti-virus, through a
fake payload as an answer to a positive detection, through ways of damaging or shutting down the
anti-virus or the RE software).
Finally, though it may be rarer for some families of malware (like stand-alone samples or melting
samples), a network communication engine may be found. The ability to interact with a local
network connection, with or without access to the Internet, can provide support for the payload or
for the malware itself. The network can be used in order to propagate outside of the local station,
to exfiltrate data (whether stolen from files or monitored with hooks), to provide the attacker with
a remote access (whether it is some sort of distant shell or just an algorithm answering orders), and
also to download updates for the local malware or new pieces of malicious code.
Malware basic taxonomy
As stated before, the term malware covers elements of various natures, from a particular
functionality to a complex system. This description is due to the use made of the code, and the
commonly expected underlying functions linked to those behaviours provide a better understanding
of what can be called a malware. Beyond this single term, a whole vocabulary, usually based
on a particular characteristic of the sample (sometimes in a poetic way), has emerged in the past
years.
In order to further clarify the situation, a small taxonomy built on this vocabulary will provide
a more structured understanding of the subject. Note that, in the same way the exact definition
of malware is delicate, no precise and fixed generally-accepted definition of the vocabulary exists,
although a commonly agreed upon definition can be crafted from the various understandings.
Delivery system
• A virus is a piece of self-reproducing software. It may use different vectors of propagation
and infect different kinds of elements. It will usually hook itself into another component or
program to be kickstarted.
• A worm is similar to a virus, being also self-reproducing software. The main differences are
in the propagation behaviour (a worm is more focused on network spreading, hence the name,
where a virus is usually more focused on memory drives) and the parasitic aspect (while a
virus is usually a parasitic program, a worm is more likely to be a stand-alone component).
• A Trojan horse will also act as a way to infect a machine, using exploits similar to those of
viruses and worms, but it will not try to self-reproduce (stand-alone).
Payload
• A backdoor is a bypass of the local authentication and rights attribution system. It is
usually linked to an unwanted remote access, though it can be an offline backdoor (to bypass
login or escalate privileges) or a mathematical backdoor (to solve the mathematical equations
of a cryptosystem without knowing the key used during the encryption [1]).
• A rootkit is a piece or set of software that tries to evade detection by the user and the
station by placing itself beyond the reach of the local system, usually before the system itself.
While increasing discretion, it usually also provides deeper control and monitoring of the
local station.
• A keylogger is a piece of code listening to the user inputs, usually hooking at least into the
keyboard [2]. Aimed at monitoring or data stealing, the keylogger part is usually the payload of
another malware (though it can also be installed manually). The information collected
can be saved locally and/or pushed over the network.
[1] e.g.: the plausible backdoor in Dual_EC_DRBG.
[2] Although newer versions of monitoring/data stealing can hook into the mouse, the webcam, and so forth.
Application
• An adware will discreetly take over the local station to inject unwanted advertisements,
through control of regular ad display (e.g.: in Internet browsers) or additional graphical
interfaces (e.g.: pop-ups).
• A ransomware will take over the local station and try to lock out the user or limit their
actions, while offering a way to stop this behaviour. Usually the way out implies paying for it,
hence the ransom part of the name.
• A bot, short for robot, is a program that answers commands by interacting with its
environment. In a computer crime context, a bot will usually be a component of a botnet, or
network of zombie computers. Though botnets can be voluntary (e.g.: to provide computing
power), more and more infected stations turn out to unwillingly provide computing power
(e.g.: to mine Bitcoin), network nodes (e.g.: proxy for an attack) or pawns (e.g.: for your
casual DDoS).
Studying malware can be tricky due to the relativity of the definition and the fast evolution
of malicious code. Nevertheless, in order to produce an analysis of quality, a real understanding
is needed, at least to coordinate the various pieces of information monitored and place them
within an investigation. Indeed the names are based on technical behaviours, which leads to the
common components behind these behaviours, the whole usual vocabulary and the practical taxonomy.
It still raises the question of an intrinsic definition of a malware instead of a potential
definition confirmed by the user’s desires. One should feel the complexity of the situation by
noting that the underlying functions are almost all dual-use algorithms (propagation systems can
be used for patching or sharing data, crawling systems are a fundamental component of search
engines, hooks allow for fine tuning of driver behaviours, cryptography is used for secured
communication, remote access for servers, etc.).
3.2 Know your options
For the general public, the world of malwares and viruses is first seen through the lens of
detection and protection, which, setting aside all the system-embedded security measures, means
anti-virus. A legitimate question is what distinguishes the work of anti-virus software from the
work of a malware forensics agent. After all, to the untrained eye, they both fight against
malwares, mainly by trying to recognize them, and both provide security to users. But their
respective works differ on three points: the position in the process of assuring security for
society, which leads to different goals, and in consequence to different ways of analysing the
information extracted from a malware sample.
An anti-virus, and other anti-malware software denominations (anti-rootkit, anti-spyware, etc.),
should provide security to a machine, whether it is an embedded device, a local station or a piece of
a more complex infrastructure. The aim is protection through proactive security (by enforcing
healthy habits like software update, by furnishing a default firewall configuration, etc.) but also
through recognition and neutralisation of a threat (the "regular" part of an anti-virus). The
goal of an AV is to distinguish between benign and malicious software, if possible even when faced
with previously unknown software.
The recognition process can use various approaches, from primitive but stable ones to more
complex but not error-proof ones. The oldest way is to find and target a signature, a static
element that is common to a maximum of variations of a precise malware but fairly rare
elsewhere. A more complex way to avoid unwanted results while maximizing recognition
is to build a profile as a signature, with various variables (including multiple static elements),
which requires some kind of threat level to be fixed for decision making. Going further
down that road, AVs built a capacity to analyse unknown software, looking for information that
could be used for profiling: static code, function calls, meta-information, communication, overall
behaviour. Nowadays AVs enhance this capacity by playing on other variables, like the decentralised
distribution of positive recognitions to other AVs (which means the existence of a fluid database
for unknown/recent software [3]).
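The difference between a single static signature and a profile with a fixed threat level can be
sketched in a few lines of Python. This is only an illustration under stated assumptions: the
Indicator structure, the chosen byte patterns, the integer weights and the threshold are all
hypothetical, and real anti-virus engines rely on far richer information than substring matches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    name: str
    pattern: bytes  # static element searched for in the sample
    weight: int     # contribution to the threat level when present


def matches_signature(sample: bytes, signature: bytes) -> bool:
    """Oldest approach: one static byte sequence identifies the sample."""
    return signature in sample


def threat_level(sample: bytes, profile: list) -> int:
    """Profile approach: sum the weights of every indicator found."""
    return sum(ind.weight for ind in profile if ind.pattern in sample)


def is_suspicious(sample: bytes, profile: list, threshold: int) -> bool:
    """Decision making: flag the sample once the fixed level is reached."""
    return threat_level(sample, profile) >= threshold


# Hypothetical profile: patterns and weights are illustrative only.
PROFILE = [
    Indicator("remote thread API", b"CreateRemoteThread", 5),
    Indicator("debugger check", b"IsDebuggerPresent", 3),
    Indicator("packer marker", b"UPX!", 3),
]

if __name__ == "__main__":
    sample = b"MZ..IsDebuggerPresent..CreateRemoteThread.."
    print(threat_level(sample, PROFILE))          # 8
    print(is_suspicious(sample, PROFILE, 6))      # True
    print(is_suspicious(b"hello", PROFILE, 6))    # False
```

A single indicator rarely reaches the threshold on its own here, which mirrors the idea that a
profile trades the simplicity of one signature for fewer unwanted results.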
But isn’t malware forensics also about analysing the code, listening to the communication, and
observing the overall behaviour? It is, indeed, and so most of the tools of AV, computer forensics
and research are similar in nature. What is interesting is the purpose of the information, a
point that can be understood by looking at the position of each actor in the workflow.
While an anti-virus tries to protect the user from incoming infections and attacks (attacks
if we extend AV to other kinds of anti-malware and security software), and plays mainly on the
capacity to distinguish danger, the forensic agent is looking to understand the situation
and extract information for investigation purposes. The forensic agent does not always enter
after the incident (malwares can be found in a case of infection, but they can also come from a
source in the wild, from a development server, etc.), but their role is to go further than the
malware: to use the malware to fish out, if possible, the author behind it, the criminal using it,
the goals of the infection, the position of the act in a bigger plan.
So the recognition is here mostly to be sure about the nature of the sample, the static analysis
and the communications are here to provide hints about the people behind the infection, and the
behaviour will serve as evidence in a possible legal case. Though the basic information is similar,
hence the similarities in the tools, the uses of the information differ, because the roles differ.
3.3 Know your story
Nevertheless, one should realise that malwares are nothing new. Though the terminology (malware)
is quite recent, even compared to the existence of computers, self-replication and viruses are
older terms and concepts. Self-replication actually predates the existence of computers themselves
(in their modern definition) and was one of the topics studied by John von Neumann [4], which led
to the science of cellular automata (and, later, to Conway’s famous Game of Life).
[3] which could lead to an attack based on pushing a lot of false-negative reports to protect a particular piece of
code from a particular AV.
[4] von Neumann also brought various mathematical theories, one of the first steps in quantum thinking, a standardisation of the game theory field, and indeed various innovations in the computing field.
Viruses, as self-replicating algorithms, emerged in the late ’70s and the ’80s, along with the
first worms and the use of the word worm itself, thanks to cyberpunk literature. The term virus is
linked to Fred Cohen’s work on parasitic programs (enthusiast work that led to a thesis on the
subject). At that point, though most of the viruses and similar programs were practical pranks
(yet sometimes offensive), the concept of malicious software was not associated with them. Fred
Cohen himself viewed viruses as neutral, with the potential to be good infectious programs. The
first worms were made for computer management reasons (e.g.: shutting down unused machines) or IT
experiments (e.g.: checking how wide the Internet was [5]), and since the ’80s a handful of worms
have been crafted to hunt other worms and clean up after them (going as far as to apply Windows
patches).
But the situation did change in favour of more intrusive, then more offensive, uses of these
kinds of technology, up until today, where regular crimes are managed through IT (internal
communication, planning, logistics, etc.) and some crimes are taking new forms (data theft, data
corruption, system termination, intrusion, etc.). Though a community of borderline code
enthusiasts still exists, with delinquents and criminals using the same tools and similar
techniques, it is harder to distinguish one from the other. Plus, though it also served society and
science, the propagation and democratisation of technical knowledge and turnkey tools initiated a
continued increase in the quantity of low-complexity malwares [6] and the appearance of
highly-tailored samples [7].
Handling the rare, complex, precisely designed malware while managing
the rise of low-complexity cases and casual uses is one of the new challenges for law and order
agencies in the field of computer crime. A focus on IT, not only to adapt to the level of technology used
but also to manage the variety and quantity of cases, provides the know-how
and systems now needed by investigation and support units.
5 Which did not turn out as well as expected, cf. the Morris worm.
6 One could speculate that the rise will stabilise sooner rather than later.
7 One could speculate that the ones made by governmental agencies do not count as a consequence.
Chapter 4
MALANET: an analysis environment, tools included
"A process cannot be understood by stopping it.
Understanding must move with the flow of the process,
must join it and flow with it." - Dune
4.1 Assisting the human with an automated system
Information technology brought potential to society, including better telecommunications and
smarter tools, but also dual-use technologies that ended up in criminal activities. Nevertheless, in the
same way that IT was used to update some malicious behaviours, it is also used to update the management
of and response to these behaviours. Basic tools, like monitoring tools for forensics, are superseded by
complex systems bringing various tools together and integrating routine manipulations. From these
various sources of information, the systems derive helpful outputs, providing some automated
intelligence to the users. Common examples are information aggregation, information parsing,
information correlation and profile building.
Considering this point, constructing a system around the usual forensics tools to respond to the increase in computer crime cases (and especially the spread of low-complexity malware) seems
like a strategic move. A system is both a collection of tools and an adapted environment, including
all the pieces needed to provide the whole service, in this case analysis. The design and the
implementation of the system determine its quality, which can be seen through
its thoroughness, processing speed, adaptability and stability.
4.2 Components of an analysis system
An analysis can draw on different kinds of data: one could think of post-mortem forensics
and its static analysis. Malware, however, is living software and can be kickstarted
in a controlled environment. Building the environment around a simulation makes it possible to extract
information not only from static data but also from live behaviours. This provides a better
understanding of the piece of code, with a variety of insights (direct information sources ranging from
static strings to network communications, without even considering post-processing and the intelligence
in the system itself).
The technical core
So, if an analysis system is built with a view to deploying forensics tools, it starts with the working
environment. An analysis based on simulation requires managed targets as a major component
of the system. The targets make the simulation possible; the next step is the capacity to keep an eye
on the simulation and extract information. Two complementary kinds of tools become the next
component of the system: monitoring tools and analytical tools. The monitoring tools
provide access to the information, from grabbing and parsing the code to hooking into live processes,
and the analytical tools provide some intelligence in the management of the information, from
selecting particular information in the pool of data to building timelines and profiles.
Interactions with the user
Once the technical layer to run, monitor and analyse samples is done, the next steps are the system
itself and its interaction with the user. Though the information has been collected and parsed, a
component is needed to format and distribute the information to the user: the reporting system.
Producing documents or structured data of any kind lets the human user benefit from the work
of the system. Beyond that, the user needs not only to access these reports but also to manage and
manipulate the system, hence the existence of a user interface to complement the internal mechanisms.
User services
After the technical core and the interactions with the user, the next components of the system
provide services to use and manage the system in the long run. An archiving system
provides a repository for information, used both by the user (to safely store a report for
future retrieval) and by the system (as an auxiliary source of information, a memory so to speak).
Internal mechanisms and administration tools are services provided to a special kind of user,
those taking care of the system (these can be ways to restore corrupted targets, clean the archive,
safely restart the system, etc.). Finally, documentation is needed to provide the different users with an understanding of the
system and basic guidelines; though it is not a modern technology, it is still IT (pen-and-paper IT).
System intelligence
Nevertheless, designing a system based on those components provides a working solution without
exploiting the full potential of having automated processes, information pre- and post-processing
and some kind of memory (the archives). By crafting supplementary and complementary processes,
some kind of intelligence can be found not only in the forensics tools but in the system itself.
This idea is the next step in the crafting process: trying to use the potential of automated information
manipulation while playing on the synergy between the different components and the
environment. The keystone is the existence of the archives, which provide older analyses as references
for the current analysis. This allows for the integration of a correlation system, which would first
need a profiling engine (profiles being extracted from the analysis reports) and then a correlation
engine (that would wisely parse and select information inside the profiles/reports and follow
various rules of comparison, mainly equality and similarity). Of course, other post-processing steps could
be implemented, but the ability to see resemblances and to find links between various elements is
one of the cornerstones of the investigation process (along with the understanding of each element and
the capacity to see the big picture).
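The profiling and correlation idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not a description of MALANET: the `Profile` structure, the feature strings, the use of a hash for the equality rule and of the Jaccard index for the similarity rule, and the 0.5 threshold.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical profile extracted from an analysis report."""
    sample_id: str
    sha256: str
    features: set = field(default_factory=set)


def jaccard(a, b):
    """Similarity of two feature sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def correlate(current, archive, threshold=0.5):
    """Return (sample_id, rule, score) for archived profiles related to `current`."""
    matches = []
    for old in archive:
        # Equality rule: identical hash means the very same sample was seen before.
        if old.sha256 == current.sha256:
            matches.append((old.sample_id, "equality", 1.0))
            continue
        # Similarity rule: enough shared observable features.
        score = jaccard(current.features, old.features)
        if score >= threshold:
            matches.append((old.sample_id, "similarity", score))
    return sorted(matches, key=lambda m: m[2], reverse=True)
```

In such a design, the archive supplies the older profiles and the correlation engine only has to decide which comparison rule applies; richer rules (weighted features, fuzzy hashes) would slot into the same loop.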
Quality through various combinations
Official forensics work that may be used for law and order actions requires a high level of quality
and thoroughness. These criteria, one of the biggest differences between an enthusiast/AV analysis
of malware and malware forensics, naturally impact the design of the system. To provide the best
analysis possible according to these criteria, the exploitation of various sources of information, the
existence of the corresponding forensics tools and the deployment of various targets with
various environments are a huge part of the solution.
As mentioned before, these combinations are mainly composed of two things. The first is the combination of different approaches built on various sources of information (static code analysis,
automated reverse-engineering, target system analysis, network communication analysis, live behaviours, etc.) with the matching monitoring and analytical tools (to manage the various sources
of information and extract as much information as possible while being able to understand it and
parse it). The second is the existence of various kinds of targets with different local systems and different
hardware and software environments, in order to provide a more exhaustive collection of simulations.
4.3 The MALANET solution
At the BKA, the KI-42 unit in Berlin runs, in order to adapt to the evolution of computer crime,
a sideline experiment to build and use an automated analysis system: MALANET. This system, in
continuous R&D, provides the services described above, following similar designs (regarding the
components, the question of thoroughness and quality, etc.). The implementation relies on
various standard and in-house technologies, although the details given here about the implementation are
limited due to the sensitive nature of this information.
One interesting point that was considered during the design of MALANET, and that gives it an edge
in some parts of the analysis, is the limit of the simulation. The quality of a simulation is indeed
linked to the intrinsic quality of each simulation component (e.g. the virtualization system
for the virtual targets) but also to the extent of the simulation. If one were to ask what the
best simulation of the universe is, the answer is actually the universe itself. But, fortunately for
law and order, there is no need to go this far for malware forensics. Yet MALANET goes further
than the simulation of targets by considering a simulation of the Internet.
The main components of a MALANET-like system
The rise of malicious software using network communication as part of its regular behaviour (in order
to leak data, to propagate orders inside a botnet, to provide remote access to a malicious actor,
to act as a proxy or a repository for another malware, etc.) and the objectives of law and
order forensics (oriented towards finding other pieces of a malicious system, including other compromised
machines and the human actors of the system, rather than just estimating the damage on the
local system) place network communication at the centre of malware forensics. To address this,
a lookalike environment with a simulated connection to the Internet, providing all kinds of communication with as
much fidelity as possible, would allow the malware to communicate (thus
providing a better source of information for the forensics process) while trying to trick any
system-awareness and anti-virtualization/anti-RE measures in the malicious software.
Chapter 5
Analysis environment: deploying virtual targets
"Now I do not know whether it was then I dreamt I was a butterfly,
or whether I am now a butterfly dreaming I am a man." - Zhuangzi
5.1 Going virtual
Sandboxes: opting for virtual targets
Sandboxes are systems running inside a virtual environment, elements that can be managed from
the outside by the simulation supervisor. Being partitioned from the underlying local system, they
provide an isolated running environment (the quality of this isolation being one of the main criteria
for sandbox evaluation). Due to this property, they excel in providing development environments,
test environments and also simulation environments. The anti-virus community rapidly exploited
sandboxes to provide runtime behavioural analysis, limiting the impact of various anti-AV functions in malware (like metamorphic/polymorphic engines that automatically modify the
executable when self-replicating, counteracting signature-based detection).
In the case of a more forensics-oriented analysis, they provide the same access to an information
source (live behaviour) as regular targets do, while being more easily deployed,
customized and restored, all thanks to the supervision level of the virtualization environment. This
also allows for an easier automation of the system, both for sending data (commands,
files, etc.) and for retrieving the monitored information. There are also the obvious advantages of
virtualization, like cost limitation, resource sharing, fast deployment and cloning, managed
snapshots, and so forth.
Nevertheless, sandboxes are no panacea. As said before, the quality of a sandbox depends mainly
on the isolation from the local system but also on the fidelity of the simulated system. The
resemblance between a virtualised system and a real system only goes so far; a perfect simulation
has yet to be achieved, for various reasons including undocumented behaviours of processors, bugs
linked to a particular driver or component, the default state of some stacks at some precise moment,
and so forth. Moreover, the presence of the sandbox may be revealed by leaked data, specialized
drivers, dedicated behaviours of the OS, etc. These anomalies are potential indicators of a virtual
environment, hence the existence of environment-aware malware.
The good thing is that, if the authenticity and the credibility of a virtual system have limitations,
so do the detection measures and the environment-awareness¹, though hardening the virtualization system may be needed to avoid some basic anomalies. Even with those flaws taken into
account, the deployment and use of virtual targets inside MALANET will still provide a useful
service. Low-complexity malware will not be environment-aware or will easily be fooled and, with
the rise of low-complexity cases (mainly due to the democratisation of some code and to the rise of
malware markets selling samples, crafters, packers, botnet accesses, etc.), virtual targets may provide a faster yet efficient enough means of automated analysis. Moreover, the existence of a supervisor and
the possibility, to some extent, of virtualised hardware allow for easier automation, integration
into a bigger system, manipulation of the environment, and so forth.
Sandboxes: selecting the solutions
The existing solutions can be categorized using different criteria. Some sandboxes are meant
for manual study (as in regular reverse engineering) and others for automated study. In this case,
the efforts are focused on finding an already automated analysis solution. Apart from that,
the market can also be divided according to two criteria: whether the solution is a web service or
a local service, and whether the solution is free of charge or not. Regarding the cost, we should
favour free-of-charge solutions for obvious reasons. Regarding the type of solution, we should
avoid web services for two reasons. First, the malware may leak data about the original target and,
being sometimes highly tailored, it may spread beyond the web service: because of the sensitive and
sometimes technically complex nature of computer crime policework, there is a trust issue regarding
the confidentiality and security of online sandboxes. Second, we lack information on the workflow of web
services and control over their functions: we cannot afford to use a black-box solution, nor to lose the
possibility to tune it and harden it. That is why we should look for a local solution.
Knowing this, the number of eligible solutions is quite limited. Though many web services exist
(like Anubis or Comodo Instant Malware Analysis), only a few open-source/free solutions are available.
One of the best-known solutions is Cuckoo, but there are also Buster Sandbox Analyzer, Zerowine
and Minibis. The capacity to run as a local service and the cost are strong
criteria, used for exclusion in the selection process. To compare the solutions in the remaining
group, the selection criteria are the following: access to the source, targeted systems, automation
capacity, ease of detection as a sandbox, accuracy of the results, hardening possibilities, latest
release and development activity, and content and format of the results.
Buster Sandbox Analyzer. BSA is a one-man project that started in 2009 as an extension
to Sandboxie. Aimed at analyzing Windows malware (32 and 64 bits), it can run different file
formats (exe, pdf, bat, url, etc.) and can be launched from the command line (making automation
easier to craft). Reports are written in a human-friendly way, categorized according to the type
of changes (registry, network, etc.). Nevertheless, BSA has a few drawbacks. Sandboxie, the main
dependency, is licensed software with an annual fee. Neither Sandboxie nor BSA is open source
(giving less understanding of and control over the process). And, since the last version of Sandboxie
and its acquisition by another company, BSA has been discontinued (since 2013). This, alas, clearly rules
out BSA as a serious long-term solution.
1 Yet another case of a cat-and-mouse game in security.
Zerowine. Zerowine is an automated analysis sandbox based on Wine. It is delivered as source
code or as a prebuilt QEMU image. It can be automated (Python being used to
manage it) and produces human-friendly reports with subsections (strings, headers, signatures).
The latest version offered a hardened version of Wine and Zerowine to further avoid detection and
improve unpacking capacities. One point should be taken into account: Zerowine is not a real sandbox
in itself, but rather a controlled emulation environment. Having to manage detection of Wine, detection
of the monitoring tools and detection of the underlying sandbox (deployed to turn it into a proper virtual target)
makes this solution a little more fragile, at least out of the box. Another drawback is the lack
of updates for years (the last release dates from around 2010-2011). The main developer is still active (through his
site and blog about malware analysis²) but no longer on this project. A variation, Zerowine
Tryouts, was made by a Korean researcher to patch bugs and add some functionalities, but it has not
been updated in years either. Yet it may be worth testing the solution (the vanilla one or the Tryouts
one) and maybe restarting development or producing an extension.
Minibis. Minibis is a solution offered by cert.at, the Austrian national CERT. It is a suite of
different tools and scripts provided for malware analysis with automation capacities. It started as a
local service inspired by Anubis, to provide more control over the process and the reports. It appeared
as an early, promising local solution for automated malware analysis, before Cuckoo and others.
The available version has not been updated since 2011, but cert.at announced on Twitter last year (2014-08-08) a
new version of Minibis (v3). Though using the old version of Minibis
may not provide an edge compared to an updated Zerowine, one should keep an eye on the next
release of Minibis (and maybe contact cert.at for more information and an early version).
Cuckoo. Cuckoo is a recent but well-known sandbox made for automated malware analysis.
Started as a one-man project for the 2010 Google Summer of Code, it is now supported by a
team of developers and backed by a foundation (a sign of stability and future releases). An open-source project, mainly in Python, it offers a supervisor tool and analytical scripts. Though the
solution is made to manage a sandbox, it is technologically independent and can be used on top
of various virtualization solutions (VMware, VirtualBox, QEMU-KVM, etc.). It targets Windows
malware analysis while running on a Linux platform. The results are provided according to various
categories (static analysis, network communication, dumped files, etc.), while the report can be
encoded in various formats (JSON reports that will be archived in the backend database, HTML
versions, MAEC profiles³). Its popularity in the world of information security and virus enthusiasts
has produced not only accessible documentation but also a few extensions and reworks. As an example, not only did the Volatility team provide a plug-in to ease the interaction between Cuckoo and
their memory analysis system, but Tomer Teller went further by working on a system of dynamic
memory analysis which uses triggers (API calls, heavy mathematical functions, etc.) to capture
memory dumps (as opposed to a single end-of-run memory dump).
2 joxeankoret.com
3 Malware Attribute Enumeration and Characterization, or MAEC, a structured representation language to characterize malwares.
Nevertheless, though three sandboxes are seen as potential solutions, each needing a different
amount of work for implementation, other possibilities for virtual targets inside MALANET exist.
The exclusion criteria could be reworked to consider paid solutions, though this may be out of the
question for budget reasons. Beyond that, an in-house development of a Wine solution (or
a similar Windows emulation solution) may provide an alternative. In any case, since the
analysis system needed at least one of them to illustrate the use of virtual targets, the focus was
temporarily put on Cuckoo (although most of the work can be adapted to other solutions).
5.2 Implementing Cuckoo
In order to integrate Cuckoo inside MALANET and provide additional virtual targets to the
analysis system, the sandboxing solution must first be looked at in order to understand the various
pieces and deploy a correct working environment. Once the pieces are working together in a designed
environment, further steps can be taken in order to exploit its potential and strengthen it (which
will include a study of the existing plug-ins, an in-house reinforced version of some components,
and supplementary scripts written when confronted with wild samples).
[Figure: part of the official Cuckoo ASCII art]
Cuckoo’s components and environment
Cuckoo is a collection of Python scripts working together to manage a virtualization system, start
and process an automated malware analysis, and collect and format the results, plus a few utilities. The
internal workflow has been constructed so that each big step (the machinery management, the processing
of the analysis and the analysis data, the reporting, etc.) can be configured and extended with new
modules. Configuration files mainly provide the practical information about the virtual machines
and the optional behaviours and variables of Cuckoo (analysis timeout, rescheduling of failed processes,
etc.).
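As a rough illustration of how such INI-style settings can be read, here is a small Python sketch using the standard `configparser` module. The section and key names (`cuckoo`, `machinery`, `reschedule`, `timeouts`, `default`) and their values are assumptions chosen for the example; they should be checked against the configuration files shipped with the installed Cuckoo version.

```python
import configparser

# Illustrative content only: section and key names are assumptions,
# not a verified copy of a real Cuckoo configuration file.
SAMPLE_CONF = """
[cuckoo]
machinery = virtualbox
reschedule = off

[timeouts]
default = 120
"""


def load_settings(text):
    """Parse an INI-style configuration string into a plain dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        # Which virtualization back end manages the virtual targets.
        "machinery": parser.get("cuckoo", "machinery"),
        # Whether failed analyses are queued again (on/off parsed as a boolean).
        "reschedule": parser.getboolean("cuckoo", "reschedule"),
        # Analysis timeout, in seconds.
        "analysis_timeout": parser.getint("timeouts", "default"),
    }
```

Calling `load_settings(SAMPLE_CONF)` yields a dictionary that a supervisor script could consult before starting an analysis.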
This main script works with the help of various components, including: a monitoring DLL (cuckoomon.dll) that is pushed inside the virtual target with the sample (the DLL is used to hook
into Windows processes, monitoring the sample and collecting the data); a Django web application
as a user interface; and a backend database to manage previous analyses and archives.
With all these components, the Cuckoo system has a few precise dependencies and requires a particular environment. The host server for both the analysis system
and the virtualization system is a Debian-based system, deployed with a recent version of Python
and a Mongo database as the backend storage database (mainly for the JSON-formatted reports).
The Django-based web application is used in conjunction with a solid web stack made from nginx
and uwsgi. Various supervising daemons and services ensure the proper functioning of every
component. Regarding the virtualization system, two choices were kept: VirtualBox and QEMU-KVM. In either case, a virtual network connects the host with the guests in order to collect the
monitored information, the network being run by the regular virtual network utilities of the chosen
virtualization system.
[Figure: Cuckoo’s components workflow]
Although various virtualization systems exist, VirtualBox and QEMU-KVM are among
the few that match the criteria used to shortlist the available sandboxing solutions. Though
VMware is known for its stability and capacities, it quickly leads into a commercial logic that should
be avoided. VirtualBox and QEMU-KVM still cover the basic needs of a virtualization
system for Cuckoo while having different profiles (which, in consequence, makes deploying both of
them a strength). VirtualBox is a mix of open and closed components while QEMU-KVM is fully
open-source, or almost. The former is a casual yet robust and portable virtualization solution
while the latter uses the potential of hardware-assisted virtualization, requiring
specific processors and kernels. Studies and experiments made by security enthusiasts revealed
that, in recent versions, VirtualBox was easily detected due to a large amount of leaked data and
glitches in the guests, while QEMU-KVM allows a close copy of a real system. VirtualBox can be
easily configured to present a fake hardware profile and fake information while QEMU-KVM, even if it offers
similar possibilities, is more sensitive (requiring a more precise configuration). Both have
pros and cons, and a wise solution is, if possible, to implement both (while favouring one of them
for default analysis and keeping the other one for supplementary analysis, in order to limit the
complexity of using the system for a regular user).
Each deployed virtualization system provides various targets running under different systems
and/or with different software environments. A variety of systems is needed, for malware targets
various systems and each version of Windows reacts differently; a variety of software is needed, for
different malware may target different software suites. Accordingly, a basic bundle of software is installed
on every target to provide a working and credible environment (an office suite, a PDF reader, a
browser, Flash plug-ins, etc.). All the targets run some version of Windows; the
main targets are under Windows XP (which still has a huge market share, mainly in low-income
countries and inside old automatons), Windows Vista (which is still in use and has its own deficiencies and particularities, even with a kernel similar to other Windows versions) and Windows 7
(whose registry and folder architectures differ from older Windows versions). In
the list of priorities, Windows 8 is next, while server versions of Windows should also be considered.
Both 32-bit and 64-bit versions are represented, which may be useful even for 32-bit-only malware⁴.
Strengthening and extending Cuckoo
Once Cuckoo is properly implemented, with a functioning environment at its disposal, comes
the question of where the system can be extended and reinforced. Though the tool is nicely
done and accomplishes its purpose, it does not provide extensive post-processing of information,
nor counter-measures against system-aware malware, and so forth. Building on the knowledge
acquired through testing Cuckoo on wild samples from some parts of the Internet, a few points
became priorities in order to provide a more efficient and responsive Cuckoo system for MALANET
(note that, though similar considerations would apply to other sandboxing solutions, the implementations may differ widely). These points include: properly considering Cuckoo’s existing plug-ins,
strengthening the counter-detection measures through an in-house version of the monitoring DLL,
decoding canonical botnet communication, and managing a batch of samples.
Cuckoo’s plug-ins. The modular, (almost) fully configurable construction of Cuckoo allows for
easy implementation of plug-ins and complementary algorithms (whether they are for the monitoring part, the analysis part, the report part, or all of the above). And, although Cuckoo is quite recent as a
stable tool, its popularity has led to the appearance of a few interesting plug-ins. A handful of basic
plug-ins are distributed by the community and can be downloaded using the community
utility (most of today’s plug-ins are signatures related to particular malicious behaviours: checking
VBox registry keys, a particular botnet sample, etc.). Other, more developed extensions are made
by actors of the security scene, especially malware researchers and enthusiasts. These include, but are not
limited to: an extension to ease the interaction with Volatility (it should work with VirtualBox, but
will have memory dump format issues with QEMU-KVM, although future releases of Volatility and
KVM should fix that); a trigger-based memory dump system by Tomer Teller, replacing the
end-of-analysis memory dump (allowing temporary artifacts in memory to be captured and
a better timeline of the events to be built; it may not be stable enough with VirtualBox and is not stable enough
with QEMU-KVM); malwasm, the reverse-engineering plug-in based on pintools, brought by the
malware.lu people; and zer0m0n, a kernel-level monitoring component provided by Conix Security.
In-house version of the cuckoomon DLL. The monitoring DLL, cuckoomon, allows Cuckoo to collect information from hooks into Windows functions. Though the system may be bypassed by some
malware, it can be considered a reliable and resourceful source of information. Because malware
will use system information to check for the presence of a debugger or a virtual environment, the
existence of hooks provides another opportunity: to modify on the fly the system information given
to the malware, in order to hide the presence of the virtual environment. The monitoring DLL can thus also act
as an informational man-in-the-middle, manipulating the calls caught in the monitoring process.
4 e.g. 32-bit executables run on a 64-bit Windows will have a slightly different flow, with different processes,
and this was already used to produce privilege escalation.
[Figure: Cuckoomon hooking system]
To implement this idea in the local Cuckoo system, a second monitoring DLL has been crafted⁵
and placed in the architecture. This version not only logs the information caught by the various
hooks, but also checks the data for particular strings (replacing them on the fly with inoffensive
strings, e.g. VBox by Acer, or crafting a distinct answer to the sample’s request, e.g. an error
answer for obvious virtualization-related registry keys). This interception of the data transferred
to the sample allows for a stronger simulation by hiding certain markers from environment-aware
detection measures. It does not provide the same level of environmental data manipulation as a
rootkit-like monitoring would (kernel-level and so forth), but it will easily counteract the basic
detection routines found in low-complexity malware.
This counter-measure has been built as a proof of concept, after observing other uses of the monitoring DLL to feed false data to the sample in order to avoid detection. Going further than denying
the existence of some registry keys, by allowing manipulation of the intercepted information (man-in-the-middle indeed), opens a world of possibilities, while keeping in mind that it will not counter every
detection possibility. The intercept, check and modify behaviour could be extended to the code
of the hooks themselves, allowing for a global censoring of some keywords in the interactions with the
system (virtualization-linked keywords for example), and could be fed lists of sensitive keywords
(with default answers, profiles of answers matching brands of hardware, etc.).
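The intercept, check and modify logic can be modelled in a few lines of Python. This is only a sketch of the decision logic: the actual monitoring DLL is C code, and the keyword table, the denied key fragments and the function names below are hypothetical, chosen for illustration.

```python
import re

# Hypothetical keyword list with default answers (here, a hardware brand).
REPLACEMENTS = {"vbox": "Acer", "virtualbox": "Acer", "qemu": "Acer"}
# Hypothetical fragments of registry paths whose existence is simply denied.
DENIED_KEY_FRAGMENTS = ("virtualbox guest additions", "vboxservice")

# Longest keywords first, so that a longer keyword wins over a shorter one.
_PATTERN = re.compile(
    "|".join(sorted(map(re.escape, REPLACEMENTS), key=len, reverse=True)),
    re.IGNORECASE,
)


def censor(value):
    """Replace virtualization-linked keywords with an inoffensive string."""
    return _PATTERN.sub(lambda m: REPLACEMENTS[m.group(0).lower()], value)


def filter_registry_query(key_path, real_lookup):
    """Sit between the sample and the system for one registry query.

    `real_lookup` stands for the genuine system call; None models the
    "key not found" error answer handed back to the sample.
    """
    lowered = key_path.lower()
    if any(fragment in lowered for fragment in DENIED_KEY_FRAGMENTS):
        return None  # deny the existence of obvious virtualization keys
    value = real_lookup(key_path)
    return censor(value) if isinstance(value, str) else value
```

For instance, `censor("VBOX HARDDISK")` gives `"Acer HARDDISK"`. Feeding the tables from external lists, or from answer profiles matching a given hardware brand, would only change how `REPLACEMENTS` and `DENIED_KEY_FRAGMENTS` are populated.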
Decoding botnet communications. Samples released in the wild are less likely to grow suspicious if they are able to communicate with their servers. So, for some of the tests of Cuckoo, parts
of the sample communication were forwarded to the Internet in order to precisely identify the
malware and study the options for botnet communication analysis. For the same reasons that regular
applications use protocols and normalized formatting, botnets hijack regular protocols
(like IRC or P2P file-sharing protocols) or craft homebrew protocols. A homebrew protocol
can be embedded inside a less malicious-looking channel, a common example being botnet communications using HTTP GET and POST requests. Using those elements like a
regular HTTP connection provides some discretion against unaware eyes, while the data formatted
and transmitted can be manipulated according to the needs and goals of the bot.
5 The C code of the DLL being a GitHub project in itself, crafting a variation is child’s play.
More and more, these communications are strengthened using various security concepts, up to the use of cryptography (symmetric, asymmetric, one-time pad, etc.). A common pattern is the encoding of formatted data in hexadecimal, which is then encrypted using RC4 with a known starting key, the resulting data sometimes being checked using a small CRC. Note that every decision taken to secure botnet communication may render the bot more obvious and less likely to survive on a regularly monitored or used station. By compiling knowledge about botnet communication protocols and matching profiles with network analysis, not only can a stronger automated detection of the nature of the sample be achieved, but the communication may also be entirely decrypted. Most of the protocols have regular phases of key exchange, with a root key being used to encrypt those exchanges: the root key can usually be extracted using reverse engineering, static and behavioural analysis, or by checking it against known default values (the democratization of ready-made tools brought variety in the security level of malware).
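As an illustration of the common pattern mentioned above (hexadecimal encoding followed by RC4 with a known key), here is a minimal sketch of a decoder trying a list of candidate keys. RC4 is implemented as the standard algorithm; the acceptance test (the decrypted bytes must be valid hexadecimal text) and the function names are assumptions of this example, and the optional CRC check is left out since its exact form depends on the botnet.

```python
import binascii

def rc4(key, data):
    """Plain RC4: key scheduling followed by keystream generation."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)

def try_decode(ciphertext, candidate_keys):
    """Return (key, plaintext) for the first key giving valid hex, else None."""
    for key in candidate_keys:
        hex_text = rc4(key, ciphertext)
        try:
            return key, binascii.unhexlify(hex_text)
        except (binascii.Error, ValueError):
            continue  # not hexadecimal text: most likely the wrong key
    return None
```

The candidate list would typically hold the default keys of known kits first, followed by any key extracted earlier in the analysis.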
Decoding botnet communication : results and script
These thoughts can be applied as an extension to a malware analysis system like Cuckoo by providing scripts that parse the packet capture and local information. By searching for known link patterns, data encodings and particular sockets, the nature of the botnet can be determined. The exchanges can then be targeted and automatically decoded by trying default keys (or keys extracted from previous parts of the analysis) and known protocols. The data exchanged, if decoded, can confirm the nature of the botnet, but can also define the aim of the bot more precisely and give context to other network communications (e.g. confirming that another network behaviour comes from the malicious sample, as a reaction to an order from a control server).
Managing a batch of samples. A point of interest that appeared while analyzing and studying wild botnet samples is the management of a group of samples. Although a single, precise sample may be fed to the system, both the monitoring of wild sources of malware and the data extracted in the context of a case provide batches of samples. While wild sources will have multiple malicious executables or indirectly infected files, a source from a case may have multiple executables infected with the same malware, or various malware deployed as part of an orchestrated attack.
Although using Cuckoo through the web application is somewhat user-friendly, it may not adapt well to the scale of tens or hundreds of samples coming from the same source. Manually feeding the samples is out of the question, but an automated solution can be crafted around the API. This will, at the same time, provide a helpful script for sample batch analysis (progressively pushing the samples through the network to the Cuckoo system, monitoring the analysis and downloading reports to the local station), initiate the implementation of the integration into MALANET (by properly handling the API and transcribing the various possible requests into Python functions) and open considerations on constructing functionalities around batches of samples instead of unique samples. For, indeed, if the local Cuckoo system is a stable, working tool, its potential has not been fully exploited yet.
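The batch workflow (progressively pushing samples, monitoring the analyses, downloading the reports) could look like the following sketch. The three callables are hypothetical wrappers around the analysis system API and are injected so that the loop stays independent from the transport; the final status name "reported" and the limit on simultaneous tasks are assumptions of this example.

```python
import time

def run_batch(samples, submit, status, fetch_report, max_in_flight=4,
              poll_seconds=0.0):
    """Push samples progressively and collect their reports.

    submit(sample) -> task id, status(task_id) -> str and
    fetch_report(task_id) -> report are hypothetical API wrappers;
    "reported" is the assumed name of the final task status.
    """
    pending = list(samples)
    running = {}   # task id -> sample
    reports = {}   # sample -> report
    while pending or running:
        # Keep the system busy without flooding it.
        while pending and len(running) < max_in_flight:
            sample = pending.pop(0)
            running[submit(sample)] = sample
        # Collect whatever has finished since the last poll.
        for task_id in list(running):
            if status(task_id) == "reported":
                reports[running.pop(task_id)] = fetch_report(task_id)
        if running:
            time.sleep(poll_seconds)
    return reports
```

Keeping the API calls behind plain functions also matches the goal of transcribing the possible requests into Python functions that MALANET can reuse later.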
Batch managing components and processes
5.3 Virtual targets inside MALANET
Although a functional stand-alone Cuckoo system is provided, it must also become a part of
the overall MALANET analysis system. The same options should be offered by the various real
targets and virtual targets existing, but the already-provided analysis made by Cuckoo should
also get integrated into the analytical and reporting processes of MALANET. While the API and
the utilities provide external interaction with the system, the analysis reports must be handled
by MALANET processes and parsed into a secondary source of information. Indeed, the various
options and parameters of a Cuckoo analysis run should be accessible from the local Cuckoo system
and from the global MALANET system. The existence of the supplementary layer provides not
only constraints but also the opportunity to craft a system around and explore the full potential of
the virtualization-based malware analysis system.
The current system can indeed be extended by more thorough work on its various components and further development. The reinforced monitoring DLL providing on-the-fly data manipulation is still more a proof of concept than an actual engineered tool. Further, switching between monitoring DLLs, as well as switching between the various existing virtualization systems, should be managed through scripts to provide a more user-friendly experience. The various scripts developed while experimenting with Cuckoo should be properly integrated into both local Cuckoo and global MALANET options, including the management of a batch of samples or the network analysis of botnet communication protocols. The various plug-ins should stay on a checklist; for example, the memory dump analysis system is on hold for compatibility and stability reasons (reasons that should evolve with future releases of Volatility and QEMU-KVM).
Nevertheless, all these upgrades are mostly updates of supplementary parts brought to Cuckoo. The potential of the analysis system, coming from both the various elements and the combination of the various layers, can be enhanced by constructing intelligence around the memory capacity of the system. While the archiving system of samples and analyses provides a logistical answer to various issues, it can also serve as a source of information for future analyses. Using a profiling engine fed on older analyses and a data correlation engine for the analytical processes, a correlation system can be built to provide various new abilities: regarding a single sample, an automated comparison of the analyses on various systems (and, further, on various virtualization systems and, in the long run, between real and virtual targets) can reveal the host systems the sample expects and its sensitivity to and awareness of its environment; regarding a batch of samples, correlation between samples can rid further analyses of duplicate samples embedded in different elements; regarding any analysis, correlation may help investigators spot common elements between various samples by letting them use older reports as a source of information for their current analysis. Existing normalisation efforts for malware, mainly solutions like MAEC, can be considered to build intermediate profiles lighter than full-analysis reports.
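The simplest form of correlation inside a batch, the removal of exact duplicates embedded in different elements, can be sketched with a content hash. This is only the most basic case and an assumption about how such a step could start; a real correlation engine would compare much richer profiles than a SHA-256 digest.

```python
import hashlib
from collections import defaultdict

def group_duplicates(samples):
    """Group sample names by the SHA-256 of their content.

    `samples` maps a name (path, archive member, ...) to raw bytes.
    """
    groups = defaultdict(list)
    for name, content in samples.items():
        groups[hashlib.sha256(content).hexdigest()].append(name)
    return dict(groups)

def analysis_queue(samples):
    """Keep one representative name per distinct content."""
    return sorted(min(names) for names in group_duplicates(samples).values())
```

The groups themselves are worth keeping in the archive, since they record in which elements the same executable was found.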
While working on integrating Cuckoo, refining its complementary components and extending its potential through profiling, the simple botnet communication analysis can also grow. Having started mainly as promptly crafted scripts automating the parsing of the pcap file and the decoding of the exchanged data, it can benefit from a database of specific botnet communication patterns and known protocols. Besides providing the necessary components to try to decode possible botnet communications and to maintain the system, this may also lead to a deeper integration into MALANET by providing patterns and use cases to the fake Internet connection. This provides the unique opportunity, if the sample communicates according to a recognized botnet protocol, to answer back and check whether the malware has a further conditional payload and possible commands. In a nutshell, to provide a fake C&C server using the detection and decoding system6.
6 And, thanks to low-complexity default-key samples in the wild, this can have results for regular basic cases.
Although using Cuckoo through the web interface or the API provides a somewhat user-friendly solution for basic functions, and full integration into MALANET should definitely ensure that, its use should stay nuanced. The technical user-friendliness of the solution, plus the online and in-house documentation (the latter with troubleshooting adapted to the local system deployment), may mislead an investigator. In the same manner as an automated botnet analysis may misunderstand communication for multiple reasons, the analysis produced by Cuckoo is limited by its possible weaknesses. Although a positive, profile-matching analysis may be right, the lack of a positive result is in no way proof of the absence of malware. Between options to bypass Cuckoo’s monitoring and environment-aware defensive routines, a malware may hide during the analysis or even produce a fake payload to mislead entirely. This uncertainty rests not only on metaphysical considerations about simulation-based analysis and data monitoring, but also on the cat-and-mouse game between the detection of a malware and its chances of survival. Like many subjects in the security field, virus analysis, whether for antivirus companies or for forensics, is in a constant state of evolution.
In order to end up on the understanding, and maybe winning (or regarded as such), side of this game, one has not only to adapt to those evolutions (which calls for modular tools under continued development) but also to proactively look for weaknesses in one's own analysis system in order to outsmart the opponent. This raises the question of how to properly and thoroughly analyze an analysis system while hunting for its limits.
Chapter 6
System introspection: crafting a test sample factory
"They say a police is only as good as his informants." - The Wire
6.1 Analysing an analysis system
The building of a system depends on the capacity to debug and test it, while its continuous improvement depends on the capacity to probe its limits. Though the analysis system is live-tested on real targets, the feedback is limited to the little information caught in those processes. To achieve a better system, with the knowledge of its flaws and limits, a more manageable feedback is needed, which can be obtained through the use of test samples.
Being controlled samples that imitate possible behaviours of real malicious samples, they will provide a reliable and manageable source of information. Because the most offensive or destructive behaviours are not needed to actually probe MALANET, the focus is put on environment-aware behaviours (e.g. virtual environment detection) and defensive routines (e.g. stealth measures), while also providing basic inputs and outputs for casual behaviours in order to test the monitoring tools and forensics algorithms.
By adopting such a strategy, and relying on controlled insiders, some kind of system introspection1 will be provided as a source of information for future work on the analysis system. While this allows for a potentially more exhaustive and contextualised analysis, the ability to precisely craft those samples will influence the quality of this introspective analysis. How to provide, in the long run, a solution for crafting those samples while adapting to the discovery of system weaknesses and the potential evolution of the structure may find an answer in the crafting not of samples, but of a sample factory.
1 To be honest, if the test samples were part of the automated intelligence of the system, one could say the
system is actually doing real introspection. It would first require a fuzzing process, the ability to note the presence
of errors/missing pieces in the results, the crafting of an adapted sample and the creation of an introspection report.
6.2 Crafting a sample factory
Sample factory: conception
Because the purpose is more about exploring the full potential of MALANET while hunting small breaches than about basic functional tests, there is a need for crafting specific samples on demand, which can be met by providing a sample factory. The value of this factory would lie in its user-friendliness and in being easier to use than manually crafting new samples. It should be easily manipulated and configured while providing access to most of the functions of a real sample.
In order to properly design the factory, the challenging parts of crafting a sample should be identified and the difficulty of the tasks should be lowered through the mechanisms of the factory. One of the main points is the language itself: though malware can be written in various languages (usually compiled ones, like C, C++, .NET, Visual Basic, etc.), this already requires some familiarity with one of those languages. Beyond this, to achieve interesting functions and complex behaviours (even more so for complex malicious behaviour), a combination of good inside knowledge of the targeted operating system and of a trickster/con-man mindset (the "smart attacker") is needed. Finally, regardless of the actual complexity of these functions, some heavily used routine functions should be easy to add to a sample, saving time through automation (asking only for the functional knowledge, at a higher abstraction level than the technical knowledge).
To achieve these objectives, various components of the factory become clear. First, the factory should have access to a library of already-coded functions, in order to reduce the user's investment in the development process. This library would contain pieces of code, formatted for the needs of the factory algorithm. To manage those pieces, while also reducing the complexity of development, a language that is easier to manipulate for a less experienced user should be provided. This language would achieve user-friendliness for casual users by placing itself nearer to the high-level functional description of the sample than to a technical, implementation-oriented language. The factory itself would be able to process this pseudo-language to assemble and adapt the various pieces of code needed to craft the sample, ending if possible with the compilation of the code into an executable (otherwise, the resulting code should still be accessible for later compilation). Finally, because the factory and its language would be a kind of novelty for any user, a basic but solid documentation should be provided.
Sample factory: design of the pseudo-language
The first part of the implementation is the design of the pseudo-language that will be used to describe the samples, providing some kind of configuration files for the factory. Since this language needs to be understood by the main algorithm, it needs to be parsable2, which means it will rely on some kind of structure and probably on flags or markers (this is a question of information theory and of the ability to distinguish pieces of information in a stream).
2 Though both words parseable and parsable are accepted, it seems parsable is more legit.
Different languages exist to format data, one of the best-known examples being XML. On the other hand, the pseudo-language should have the look and feel of a programming language and may be used to provide functionalities (pre- and post-treatment of the descriptive information). Adapting something like XML so far away from its regular use may lead to a more complex and, above all, less user-friendly result than expected. A solution is to see this as an opportunity to create a homebrew language adapted to the needs. Though it does not ensure the same quality and stability as a regular programming language, the option is plausible because it is only a step in the process of the factory (a configuration file used to switch from a functional description with some elements of programming to an actual low-level programming language).
The pseudo-language will help in manipulating the pieces of code by presenting them as a collection of functions, providing a set of the selected instructions (functions) with their respective input and output arguments (while trying to minimize the exposure to inside mechanisms and temporary input and output variables). For a friendly visualization of the content, each instruction will be written on a new line with its arguments, according to the following pattern.
instruction_name[input arguments][output arguments]
The separation of each line using only the new line markers is quite weak (compared to a more obvious ";") but manageable. The instruction name, due to the huge quantity of pieces and the possibility to categorize them (according to the kind of functions: data manipulation, file manipulation, network communication, checks, etc.), will be split according to the category and the precise function: category.function (e.g.: reg.writekey to write a key in a registry). This is reminiscent of some programming patterns for structures or classes, providing a fast way to understand what kind of behaviour is asked of the sample.
Once functions can be written down through instructions, the question of the arguments arises. Having access to input and output arguments, to provide hardcoded known data (for inputs, e.g.: paths, dumped files, keys) and to redirect data between functions (for outputs, which may become inputs of a following function), is an obvious need. They will be included after the function name between "[" and "]", according to the previous pattern.
Arguments may be of various natures. If some are hardcoded strings of paths or registry values, some may be optional values for API calls, raw code or injected instructions (a function may be the input argument of another function). Distinguishable markers are required to separate the various types and treat them accordingly in the main script.
• Type: optional value / raw data. Marker: nothing. Example: CREATE_NO_WINDOW.
• Type: string. Marker: ", each side. Example: "VMware Tools".
• Type: instruction. Markers: ( and ). Example: (cli.print["Owned"][])3
3 Indeed, injected instructions have their own arguments.
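To make the pattern and the three argument types concrete, the following sketch parses one instruction line into its category, function and typed arguments. It is not the parser of the factory: the function names, the returned dictionary and the instruction file.exists used in the usage note are assumptions made for this example, and lists of values are simply left as raw tokens.

```python
def split_top_level(text, sep=","):
    """Split on `sep`, ignoring separators inside quotes or brackets."""
    parts, current, depth, in_string, i = [], [], 0, False, 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):   # keep a backslashed char as-is
            current.append(text[i:i + 2])
            i += 2
            continue
        if ch == '"':
            in_string = not in_string
        elif not in_string and ch in "([{":
            depth += 1
        elif not in_string and ch in ")]}":
            depth -= 1
        if ch == sep and depth == 0 and not in_string:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail or parts:
        parts.append(tail)
    return parts

def bracket_groups(text):
    """Return the content of each top-level [...] group in `text`."""
    groups, depth, in_string, start, i = [], 0, False, 0, 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2
            continue
        if ch == '"':
            in_string = not in_string
        elif not in_string and ch == "[":
            if depth == 0:
                start = i + 1
            depth += 1
        elif not in_string and ch == "]":
            depth -= 1
            if depth == 0:
                groups.append(text[start:i])
        i += 1
    return groups

def parse_argument(token):
    """Classify one argument: string, injected instruction or raw value."""
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        return ("string", token[1:-1])
    if token.startswith("(") and token.endswith(")"):
        return ("instruction", parse_instruction(token[1:-1]))
    return ("raw", token)

def parse_instruction(line):
    """Parse category.function[inputs][outputs] into a dictionary."""
    name = line[:line.index("[")].strip()
    category, _, function = name.partition(".")
    inputs, outputs = bracket_groups(line)   # exactly two groups expected
    return {
        "category": category,
        "function": function,
        "inputs": [parse_argument(t) for t in split_top_level(inputs)],
        "outputs": [parse_argument(t) for t in split_top_level(outputs)],
    }
```

For instance, parsing `file.exists["a,b.txt",(cli.print["Owned"][])][found]` yields one string input (the comma inside the quotes is not a separator), one injected instruction parsed recursively, and one raw output name.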
Though this covers the basic ground needed for a pseudo-language, with only a few markers (the line end between instructions, and the five for data types), new functionalities can be implemented in the pseudo-language with the addition of supplementary markers or keywords. One of them is the ability to repeat an instruction while changing the content of an argument. Due to the fair amount of value verification inside environment detection and defensive routines, some instructions may be repeated a few times at the same point in the overall sample behaviour. An interesting functionality would be the capacity to fuse those instructions into one, with a list of the values of the argument4. This requires new markers, for the list of values itself and to split the list into those values. Note that the other local arguments would be repeated according to a modulo, allowing lists of values of various sizes as different arguments of the same instruction.
• Use: outside borders of the list of values. Markers: { and }
• Use: inside borders of the list of values. Marker: ,.
• Example: {"sandbox","vm","malware","forensics","simulation",xoredsecretdata}5
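The repetition rule can be illustrated as follows. This sketch assumes that "repeated according to a modulo" means that the instruction is repeated as many times as the longest list, shorter lists wrapping around and plain arguments being repeated unchanged; the function name and the use of Python lists to stand for {...} lists are choices made for the example.

```python
def expand_lists(arguments):
    """Expand list-valued arguments into one argument tuple per repetition.

    A Python list stands for a {...} list of values; anything else is a
    plain argument. Shorter lists wrap around (index modulo their length).
    """
    lengths = [len(a) for a in arguments if isinstance(a, list)]
    if 0 in lengths:
        raise ValueError("empty list of values")
    repetitions = max(lengths) if lengths else 1
    expanded = []
    for i in range(repetitions):
        expanded.append(tuple(
            a[i % len(a)] if isinstance(a, list) else a
            for a in arguments
        ))
    return expanded
```

With a list of three values, a list of two values and one plain argument, the instruction is emitted three times and the second list restarts from its first value on the third repetition.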
Another interesting property for the pseudo-language would be the ability to import already written instructions from another source (another configuration file). On one hand, as stated before, some routine instructions are quite common in malicious software and answer basic needs (e.g.: a melting routine using a batch file to delete the executable file, and then the batch file itself). On the other hand, although the language tries to stay at a functional level, some processes are still split into a handful of functions to allow the user to precisely craft the behaviour they want. But, for most of the users, the ability to manipulate the inside instructions is not required.
The property of importing an already crafted list of instructions turns out to be two different properties: importing an already crafted list without modification (routines), and importing a group of instructions along with its first input and last output arguments (aliases). Routines are instructions without arguments that will be matched to other instruction files and imported on the fly. They will be indicated in the usual instruction pattern using routine as the category name and the file name as the function name (so the instruction becomes routine.filename). Aliases are instruction batches manipulated as regular instructions. A list of the enabled aliases and of the matching batches of instructions will be kept in a file (the aliases file6), and they will be called using their alias name (which follows the usual instruction naming process). Depending on the priority of canonical instructions over aliases, aliases may also be a way to overwrite definitions.
• Example of a routine: routine.melting[][] 7
• Example of an alias: netcom.HTTPGetFromHostname[srv,rsc,ua][buf,bufsize], replacing a
succession of netcom.HTTPGetRequest, netcom.sockethostname, netcom.HTTPsend, netcom.HTTPrecv
and netcom.closesocket.
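The import mechanism for routines and aliases can be sketched as a simple expansion pass over the lines of a schematic. The storage format of the aliases file is not specified here, so plain dictionaries are assumed; the argument lists of the inner instructions are left empty because they are not given in the text, and the binding of the alias arguments to the inner instructions is deliberately left out. In this sketch aliases take priority over canonical instructions.

```python
# Assumed in-memory form of the aliases file: alias name -> instruction lines.
# Inner argument lists are left empty on purpose (not given in the text).
ALIASES = {
    "netcom.HTTPGetFromHostname": [
        "netcom.HTTPGetRequest[][]",
        "netcom.sockethostname[][]",
        "netcom.HTTPsend[][]",
        "netcom.HTTPrecv[][]",
        "netcom.closesocket[][]",
    ],
}

def expand(lines, aliases, routines, depth=0):
    """Replace routine.* and alias instructions by the lines they stand for.

    `routines` maps a routine file name to its list of instruction lines.
    """
    if depth > 10:
        raise RecursionError("aliases or routines nested too deeply")
    result = []
    for line in lines:
        name = line.split("[", 1)[0].strip()
        if name.startswith("routine."):
            body = routines[name[len("routine."):]]
        elif name in aliases:
            body = aliases[name]
        else:
            result.append(line)
            continue
        # Imported lines may themselves contain routines or aliases.
        result.extend(expand(body, aliases, routines, depth + 1))
    return result
```

The depth limit is only a guard against an alias that would, directly or not, refer to itself.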
4 Somewhat similar to factorization in mathematics, somewhat.
5 Note that a list does not imply any condition on the type of arguments inside, nor on a similar type for each value.
6 Although, as an R&D thought, importing from secondary aliases files could be a configuration option.
7 Even if the routines use no outside arguments, they still follow the main instruction writing pattern, hence the empty [].
Finally, as an anecdotal but useful functionality, comments should be allowed inside schematic files (the list of instructions8). Comments allow indication and documentation inside the code, which paves the way for sane coding habits. They also provide a fast on-off switch for an instruction during a development phase. A new marker is needed to indicate that a line is a comment and not an actual instruction: ## will be used.
• Type: comment. Markers: ##. Example: ## Ceci n’est pas un commentaire.
While the pseudo-language used to describe the sample behaviour is the main concern, the pieces of code manipulated by the factory are also formatted. The precise markers will be adapted to the design of the factory algorithm, but a structure of markers starting with # will be used to parse the pieces. The main markers are #DOC for the documentation header of each piece (which will be used for documentation generation), #INVAR and #OTVAR for the inside mechanisms replacing the local arguments inside the pieces by the arguments from the schematic, and the combination of #HEADER, #MAIN and #FOOTER to split the actual code between what will and what will not be repeated when a list of values is given as an argument. Those markers should not conflict with the pseudo-language but may be an issue for the code inside the pieces (although such a conflict has a really low natural probability of happening).
One point inside the pieces that may provoke a conflict with the schematic, since this is where both sets of information come into contact, is the temporary names of the local arguments. In order to inject and extract arguments according to a schematic, those arguments need a temporary name inside the pieces. An arbitrary naming of IVAR and OVAR has been chosen, with each temporary name being assigned a number corresponding to the position in the argument list of the instruction inside the schematic (e.g.: IVAR1 is the first input argument of the instruction). Both IVAR and OVAR, and their numbered variations, should be regarded as reserved keywords. In the future, things may grow in complexity with the addition of default values for arguments (yet to be properly designed and implemented).
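The interplay between the section markers and the temporary names can be sketched as below. The exact layout of a piece is not given in the text, so this example assumes one marker per line followed by the content of the section, and assumes that #HEADER and #FOOTER are emitted once while #MAIN is repeated for each set of arguments; all function names are hypothetical.

```python
import re

# Assumed layout: each marker alone on its line, content on the next lines.
SECTION = re.compile(r"^#(DOC|INVAR|OTVAR|HEADER|MAIN|FOOTER)[ \t]*$", re.MULTILINE)

def split_piece(text):
    """Split a piece into {marker name: content}."""
    parts = SECTION.split(text)
    # parts = [preamble, name1, body1, name2, body2, ...]
    return {parts[i]: parts[i + 1].strip("\n") for i in range(1, len(parts), 2)}

def bind(code, inputs, outputs):
    """Replace IVARn / OVARn by the n-th input / output argument (1-based)."""
    def repl(match):
        pool = inputs if match.group(1) == "IVAR" else outputs
        return pool[int(match.group(2)) - 1]
    # Word boundaries keep IVAR1 from matching inside IVAR10.
    return re.sub(r"\b(IVAR|OVAR)(\d+)\b", repl, code)

def render(piece_text, calls):
    """Emit HEADER once, MAIN once per (inputs, outputs) pair, FOOTER once."""
    piece = split_piece(piece_text)
    first_in, first_out = calls[0]
    chunks = [bind(piece.get("HEADER", ""), first_in, first_out)]
    chunks += [bind(piece["MAIN"], i, o) for i, o in calls]
    chunks.append(bind(piece.get("FOOTER", ""), first_in, first_out))
    return "\n".join(c for c in chunks if c)
```

Since the section pattern only matches the listed marker names, C preprocessor lines such as #include inside a piece are left untouched.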
The pseudo-language, although it may evolve in future development, can be considered sufficient
now. The resulting list of reserved markers, whether chars or keywords, is the following.
• Commentary in the schematic: ## le commentary
• Argument lists: [le arguments]
• String as an argument: "le string"
• Instruction as an argument: (le instruction)
• List of values as an argument: {le first value, le second value}
• Routine as a category name: routine.lefunction
• IVAR and OVAR as internal mechanism: le IVAR1, le IVAR2, le OVAR1
• Backslashing reserved chars: "\[le string \]"
8 A schematic, a list of instructions to manipulate and assemble pieces, makes sense for a factory.
Those markers were chosen to provide an efficient description language with basic programming logic and functionalities, while keeping it small and simple. In case of a special char being used in a variable name or a string, the usual backslashing solution is enabled (e.g.: a string containing a path would be "SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\Image File Execution Options"), avoiding too many limitations in the writing of new schematics. A trick to limit the quantity of reserved keywords is, for example, the use of an outside file for aliases while including them naturally in the usual pattern (transferring the information intelligence from the data itself to the algorithm managing the data).
Sample factory: technical implementation
Once the pseudo-language has been established, the development of the factory is more about the technical implementation and the engineering questions of translating the concepts into an efficient tool. While the pieces of code are in C, in order to provide a compiled language for the sample (to produce, in the end, an executable with a minimum of dependencies), the main scripts are in Python (for ease of development and personal reasons).
The factory is built around a main Python script, the Architect. The Architect is the core component of the factory, able to read the schematics (lists of instructions in the pseudo-language acting as a compromise between a configuration file and a code file), pick and manipulate the pieces according to the schematics, craft a .c code file and ask a compiler to produce the executable. The Architect has its own configuration file, which contains the names of the folders (for the various components), the command and the options for the compiler, the pre- and post-treatments of the information (whether it is the schematic, the code itself, etc.), and so forth.
The Architect components and workflow
The core of the Architect is a parser, built according to the structure of the pseudo-language and following a workflow of successive inclusions to avoid conflicts. Strings are detected first and searched for reserved chars, which are temporarily rewritten as an explicit hexadecimal value of themselves (in extended ASCII) to avoid them being caught by the next steps of the parser. The parsing then proceeds from strings to lists, then to arguments and instructions. Some steps of the parsing are self-invoking to manage the possibility of embedded arguments inside an instruction used as an argument, and so on ad libitum.
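The first step of this workflow, hiding reserved chars found inside strings from the later parsing steps, can be sketched as follows. The set of reserved chars and the \xNN notation are assumptions of this example, not necessarily those of the Architect.

```python
import re

# Assumed set of chars that the later parsing steps react to.
RESERVED = "[](){},#"

def protect_strings(line):
    """Rewrite reserved chars inside "..." as \\xNN so later steps skip them."""
    def encode(match):
        body = match.group(1)
        return '"' + "".join(
            "\\x%02x" % ord(c) if c in RESERVED else c for c in body
        ) + '"'
    # A string runs between two quotes; backslashed chars are skipped as pairs.
    return re.sub(r'"((?:\\.|[^"\\])*)"', encode, line)

def restore_strings(line):
    """Inverse of protect_strings, applied once parsing is done."""
    return re.sub(r"\\x([0-9a-f]{2})", lambda m: chr(int(m.group(1), 16)), line)
```

One limitation of this simple version is that a string already containing a literal backslash followed by x and two hexadecimal digits would be altered by the restore step, so a real implementation would need a less ambiguous placeholder.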
Other components of the factory include the pieces folder, the schematic folder, the result folder, the configuration file for the Architect itself and some utilities. The pieces folder is the library of available pieces, formatted according to the pre-defined structure (#DOC, #INVAR, etc.). The files are named using the category name and function name system, although according to the pattern category_function (with an underscore instead of a dot as a separator). The extension is .piece. The aliases file can also be found under the pieces folder. The schematic folder contains the various schematics, written in the pseudo-language, with a .schematic extension. Finally, the result folder is the home of the various C code files produced, and of the corresponding executables if compilation succeeded.
Finally, the Architect also has a few utilities at its disposal (and at its users' disposal). To answer the need for documentation of the new pseudo-language and of the existing pieces, a documentation generator utility (docgen.py) builds a text file with the various #DOC headers of the pieces and a few coding guidelines for the pseudo-language. To ease the crafting of new samples and the use of the xOR instruction (datamanip.xORarray), a script producing the hexadecimal value of an xOR between a given key and a given value exists (xorarray.py)9.
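What such a helper computes can be shown in a few lines. This is not the actual xorarray.py, whose interface and output format are not described here; the function names and the choice of repeating the key over the value are assumptions of this sketch.

```python
def xor_bytes(value, key):
    """XOR `value` with `key`, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(value))

def xor_hex(value, key):
    """Hexadecimal text of the xOR between a value and a key."""
    return xor_bytes(value, key).hex()
```

Applying the same key a second time gives the original value back, which is what allows the sample to recover its hardcoded data at run time.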
These utilities are examples. Others may be developed to keep providing the user of the Architect with an easy sample crafting tool full of options. Beyond utilities, the Architect will also provide pre- and post-treatments of information. The use of a factory in Python and the manipulation of the pseudo-language allow for various automated processes that would provide better samples without asking more effort from the user. For example, junk code can be automatically added to the sample (whether directly in the C code or through junk instructions), or variable names in the schematics may be automatically randomized, all of this in order to provide a sample that is at least more efficient, faster, more subtle or more compact. The automated treatments would be options indicated in the schematics, checked by the Architect before parsing the instruction list.
In the end, after the definition of the language and the construction of the Architect system, the pieces folder has to be populated to provide at least basic functions for test samples. Basic behaviours are related to data manipulation and simple input/output functions for common types of data, functions that can serve as basic elements for more evolved behaviours. Although the test samples are not delivered with a malicious payload, which places some functions outside the range of pieces development, they should still provide other malware components like a self-replication engine, a network communication engine and various stealth and defensive routines (including environment-aware functions, crucial to test a simulation-based analysis).
9 It allows the use of xored hardcoded values in the sample, limiting the information available from a simple static analysis.
Although the development of pieces is a work in progress, basic categories have already been defined and populated. To manipulate various types of data (read, write, expand, delete, etc.), elementary categories have been created: file, reg for registry keys (including the difference between a key and a value), mutex, process (various injection techniques yet to be properly implemented), sysvar (environment variables, Windows version, local executable full path, etc.). Particular data manipulation is covered by datamanip (xOR, concatenate strings, split HTTP header and content, etc.), network connection and socket functions by netcom (socket using IP address or hostname, opening, sending, receiving, closing, GET and POST request formatting), basic interface manipulation by cli (show, hide, print).
For more conditional behaviours, in order to build reactions around environment-awareness
capacities, the if category is being developed. It is crude for the moment: there is one instruction
per kind of check (value of a registry key, username, presence of a debugger, mouse movement,
existence of a file, etc.), reactions are passed as input arguments, and multiple checks require plain
repetition. It nevertheless already provides opportunities for multipath behaviours. To keep every
possible behaviour available while still offering properly crafted instructions, a categoryless
instruction, direct, allows raw C code to be pushed through the pseudo-language. In a similar
fashion, a categoryless loop instruction allows raw code or regular arguments (including other
instructions) to be pushed inside a for loop.
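The shape of such a check, with its two reactions passed as arguments, can be sketched in Python as follows. This is only an illustration of the idea, modelled on the if.username instruction used in the theKid schematic of Appendix D; the function and parameter names are assumptions, and the real pieces generate C code instead of calling Python functions.

```python
def if_username(current_user, watched_names, match_reaction, other_reaction):
    """Run match_reaction if the user name is in the watched list, else other_reaction."""
    watched = {name.upper() for name in watched_names}
    if current_user.upper() in watched:
        return match_reaction()
    return other_reaction()


# Hypothetical reactions: a real sample would exit or sleep instead.
result = if_username(
    "sandbox",
    ["SANDBOX", "VM", "MALWARE"],
    match_reaction=lambda: "exit",
    other_reaction=lambda: "continue",
)
```

Each additional kind of check (registry value, debugger, file) would need its own function of the same shape, which is the repetition described above.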
Regarding work-in-progress categories, three main categories will supply crucial elements of
malicious software that are still missing. The random category will provide basic random functions
(integer, boolean, uid, etc.) for fuzzing, stealth and cryptographic functions (and also for the
interesting strategic move of launching the real payload only once in X executions). The gui
category will provide interface instructions beyond the simple cli, building on Windows visual
interface functions or directly on OpenGL. The junk category will provide neutral instructions that
do not modify the core behaviour but add useless code to disturb reverse-engineering and
behavioural analysis (automatically injecting junk code may be a future processing option offered in
the Architect configuration). RC4 will serve as a foundation for the crypto category, a category that
will work in coordination with the datamanip category (XOR manipulation may move to the crypto
category, and CRC instructions will join one of the two).
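The "once in X executions" behaviour can be sketched with a single random draw. The following Python snippet is a minimal illustration under the assumption that the decision is purely probabilistic (no counter is kept between runs); the function name and the injectable generator are hypothetical.

```python
import random


def should_launch(x, rng=random):
    """Return True with probability 1/x, i.e. roughly once in x executions."""
    if x < 1:
        raise ValueError("x must be at least 1")
    return rng.randrange(x) == 0


# Seeded generator so that the example is reproducible.
rng = random.Random(0)
hits = sum(should_launch(4, rng) for _ in range(10000))
```

For a test sample, such a draw makes a single run of the analysis system insufficient to observe the full behaviour, which is precisely the kind of limit the introspection aims to reveal.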
Some aliases and routines have already been written, starting the production of more complex
yet common elements of malware. Two aliases have been made to cover two regular network
operations, getting data from and posting data to a URL (netcom.HTTPGetFromHostname and
netcom.HTTPPostToHostname respectively). They spare the schematic writer any socket
considerations, covering the whole connection and the crafting of the HTTP request. Depending on
the evolution of the if category and its checks, new aliases may cover the new structure of
conditional verifications and reactions. Future aliases may also cover encoding/ciphering and
decoding/deciphering operations.
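To make explicit what such an alias hides from the schematic writer, the sketch below crafts a raw GET request and splits a raw response into header and content, without opening any socket. It is written in Python for readability; the function names are assumptions, and treating the third argument as a User-Agent string is a guess, not something confirmed by the pieces themselves.

```python
def build_get_request(hostname, path, user_agent):
    """Craft the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        "GET {} HTTP/1.1".format(path),
        "Host: {}".format(hostname),
        "User-Agent: {}".format(user_agent),
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def split_http_response(raw):
    """Split a raw HTTP response into (header, content) at the first blank line."""
    head, _separator, body = raw.partition(b"\r\n\r\n")
    return head, body


request = build_get_request("www.example.com", "/index.html", "demo-agent")
head, body = split_http_response(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi")
```

An alias chains these steps with the socket opening, sending, receiving and closing instructions, so that a schematic only has to state the hostname, the path and the output variables.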
Regarding routines, one common routine is already implemented: melting. Melting is the
self-deletion of the malware executable (and of dropped files in the case of advanced melting), to
clean up the malicious tracks. Since a running .exe is not allowed to delete its own file, a regular
trick is to write a .bat file and use it to delete both files, since a .bat file is allowed to delete
itself. The routine covers the writing of the file and its launch in a windowless console. Routines in
development include a simple self-copy one and another, which may instead turn into a combination
of aliases, for the unpacking of an XORed embedded file (and maybe its launch, reproducing the full
unpacking routine of a malicious payload).
Various schematics are being produced, evolving (in quality and quantity) along with the factory
prototype and the available pieces. A few schematics try to imitate existing code or an extract of
code, to illustrate the capacities of the factory while providing emblematic samples. They include an
imitation of pafish, the paranoid fish, an environment-awareness test executable aimed at detecting
sandboxes, and a copy of the first botnet communication of a recent version of Andromeda.
Following the Matrix-themed naming of this project (hence the Architect), and along the guidelines
of the two-step development idea (see next subsection), Anderson the unaware sample and Neo
the aware sample are being produced (the first providing basic I/O tests while the second is more
focused on gaming the system). TheKid is an intermediate sample, used not to test the analysis
system but to explore the possibilities of the sample factory (finding himself between the unaware
state of Anderson and the fully aware state of Neo, hence the Kid). Smith is a work-in-progress
sample focused on propagation with an aggressive parasitic approach (hence Smith) to provide test
samples for infection analysis systems (another paradigm in malware research than AV or forensics).
Sample factory: the two-step development
The introspection process into MALANET can be kickstarted once a few samples have been
crafted using the factory. Nevertheless, taking into account the needs and the current deadline leads
to developing simple samples before achieving the first stable, fully functional prototype of
the factory. Hence, two steps have been determined, using the capacities of a sample as the current
objective. The first step is to provide a non-malicious sample that has no notion of
system-awareness. With basic capacities for data manipulation and regular behaviours, it will serve
for debugging and fine-tuning purposes. Being unaware of the system around him, he is Anderson.
Once basic introspection has been brought to the system through this sample, the focus can shift
to upgrading the factory and the pieces library to answer the next step. The second archetypal
sample is aware of the system around him and, building on basic functionalities, will try to hide
from the system and exploit its weaknesses. Being aware of the system around him, he is
Neo.
These two steps will help build the factory, providing a somewhat fluid development with
increasing complexity, switching from simple instructions to more complex routines over time.
The two different goals should also raise all the questions the pseudo-language should answer
(creation of routines, conditional behaviour, global variables shared between instructions, etc.).
Beyond R&D support, they will also provide two solid starting points for users to craft new samples
according to their needs (whether functional tests or weakness hunting).
Further development starts with a new archetypal sample, Smith, focused on local and network
propagation and file infection. Though it may be more useful for other simulations than for the
introspection of the analysis system, it will also provide new challenges to polish the Architect.
Beyond this phase of development, once the prototype of the Architect is efficient enough to
build interesting versions of Anderson and Neo, the focus can be put on the Architect itself. The tool
can be extended to provide a visualization of schematics linked to the knowledge of the various steps
and of the transmission of variables, allowing the user to easily see the workflow of the sample. A
GUI can be constructed to provide user-friendly configuration, including the choice of the current
schematic. It may be extended to also provide a sort of IDE for writing schematics, aliases and
pieces. The construction of samples can also be updated to provide an automated report dropped by
the sample once a run is done (no more instructions, crash, timeout, etc.) that could be analyzed
automatically to point out weaknesses10. An interesting approach to continue the R&D of the factory
system, without burdening the local agency, and while distributing the benefits to more people (from
other agencies, AV, security researchers, enthusiasts, etc.), is to turn the Architect into an
open-source project.
6.3 Sharing the good stuff
Though this may be bold, an unusual request about the Architect should be considered: the
factory system could be open-sourced and published on GitHub. A few points should be taken into
consideration beforehand.
Pragmatically speaking, open-sourcing it is as much about the promise of future development
as it is about the consequences of releasing a tool into the wild. Regarding development, open
source is obviously a strategic choice: it offers a small probability of further development of the
tool itself (while keeping easy access to the updated versions) and a bigger probability of forks and
of the development of new tools or integration into other systems. Regarding the release of a tool
into the wild, the Architect in itself is a factory and not a malicious program. In the same way that
everything can have multiple uses, and that one incriminates the user and not the tool, the Architect
is neither an incentive to produce malicious code nor a complete and finished virus generator. For
example, it also has the potential to help other actors needing non-malicious test samples or to
provide a proof-of-concept to programming enthusiasts.
Admittedly, as an open-source factory it will provide a tool to script-kiddies (yet without providing
them with a fully operational and malicious program), but its real power lies in expanding it or
crafting on top of it. Someone with the skill and motivation to do this work would have an
above-average level compared to script-kiddies and would probably not need the factory to craft
their own malicious program. So instead of focusing on the script-kiddie issue, one could focus on
what it brings to other researchers (including other agencies, private companies and enthusiasts).
It provides more than a factory: it provides a humble proof-of-concept of a higher-level language
for development and an argument for user-friendly programming, while expanding the potential of
autonomous and system-aware agents (which, from cellular automata to artificial sapiens, is always
an interesting domain to dig into as far as science goes).
Regarding security and sensitive issues, first, it does not provide confidential insights into the
MALANET system or other in-house work. Though it is used for testing purposes, withholding
precisely designed and targeted schematics will avoid disclosing sensitive information about the
local system (schematics can still be supplied, as some anti-virtualization routines can easily be
found on the Internet and mimicking already existing malware is an option). This also leads
to the second point: while the core of the factory is released, a precise choice can be made about
which components should be released or not (hence, some pieces may be left out of the open-sourcing
process). This decreases the danger carried by the tool while still providing an interesting
proof-of-concept and a starting point for any enthusiast.
10 With a supplementary step of the factory crafting a new schematic and a matching sample according to the spotted
weaknesses, the introspection could be not only automated but also self-focusing and self-improving.
An interested party could raise the question of intellectual property and money, for the
crafting of this tool was indirectly paid for through its developer. It should be noted that the
intelligence lies not only in the tool but also in the schematics and in the work on the
introspection process of MALANET. Beyond this, there is the question of the pros and cons of
potential further development versus intellectual property, and there is the question of enthusiasm
for the scientific field. Indeed, though it can be useful to others and it is up to the agency to
decide whether its governmental funds should serve only itself or also benefit others, this is also a
way to finance decentralized scientific research (in its smallest and humblest form) and to pay
respect to the technical field (as viruses are more than malicious programs).
Finally, as a more personal argument, the factory is a point of interest that will surely be worked
on again. Beyond the artisan's challenge of crafting and caring for such a system, its links to
information questions (how to properly craft a language, how to manipulate pieces to create an
ensemble) and to autonomous agents (here the virus is seen, studied and used more as a
system-aware agent than as a malicious agent) make it a worthy experiment. Though it remains a
humble proof-of-concept, it could still provide the foundation or the basic idea for a more complex
system.
Chapter 7
Law and technological (dis)Order
The caterpillar said,
"One side will make you grow bigger and the other side will make you grow smaller"
"One side of what? The other side of what?" thought Alice to herself.
"Of the mushroom," said the caterpillar.
Alice looked at the mushroom, trying to make out which were the two sides of it,
as it was perfectly round.
Alice’s Adventures in Wonderland
7.1 Computer crime and computer police, a brave new world
Technology is one of the strengths of humankind, allowing us to go further than the limits of
our natural evolution by crafting theories to understand our environment and tools to manipulate it.
If its cost may sometimes render it unwanted, a bigger problem lies in versatile technologies
with potentially nefarious impacts (like dual-use technologies). Although IT was never out of
consideration, due to the links between communications and the transmission of ideas, it became
central in the era of modern IT and computer-supported crimes. In a similar manner to applied
physics bringing stone bullets and atomic bombs, IT brought data-stealing spyware and malicious
parasitic viruses.
Having to continuously serve and protect society, law and order agencies had to cope with
fast technological evolution and with the world of possibilities opened by long-distance
interconnected networks and automated data-processing machines. If the first cases of computer
crime, when the laws themselves had not yet addressed those problems, were linked to enthusiasts
pushing the limits further, no one can deny any more the presence of criminal elements. Since early
adopters of practical innovations are easily found among criminals, they were amongst the first to
be affected by technological democratisation.
Nevertheless, if modern IT changed the tooling and behaviours of criminals, offering new malicious
possibilities, it also provided law enforcement agencies with new tools to approach those new
behaviours. The same piece of code is part of the adaptation of both sides to a new environment.
With the potential to go further than matching the components of the other side, building on top of
IT technologies allows for a more insightful and intelligent comprehension and investigation of crime
for technology-related elements.
7.2 Ex machina: the assistant in the machine
Since the advent of the computer, the broad comparison between modern IT and the mechanical
revolution of the late industrial phase has gained weight: where mechanical automata once started
to replace humans, digital automata can now be found. Providing an unexpected extension of the
tools available to society, automated digital systems proved themselves useful for data monitoring
and manipulation. Evolving from managing static data with calculations to observing live data
with forensics, automated analysis started to integrate simulation-based solutions: by providing a
controlled environment, the behaviour of executables can be examined more thoroughly.
When delinquents and criminals started to use live data for their purposes, leading to the concept
of malware, the help provided by automated systems and the quality provided by simulation-based
analysis proved to be crucial elements for law enforcement agencies. With the recent rise in
low-complexity cases, due to underground markets and the democratisation of dual-use tools,
IT-assisted policing is one of the answers requiring minimal cost and dedicated workforce. It is in
line with this trend that the KI-42 unit of the BKA came up with the MALANET experiment. This
evolution through script-based automation of regular forensics and investigation tasks provided
additional potential through synergy between the various components and correlation of various
information sources. By seeing the whole system when manipulating even the humblest component,
it is possible to tap into this potential to bring intelligent design to the system.
Nevertheless, if the integration into the framework of modern law and order agencies seems perfect,
these systems need to be handled with care when delivering results. Every automated interpretation
may miss an obscure point, which calls for staying critical towards forensics tools, and a
simulation-based analysis is bound to be imperfect. Simulations will never achieve perfection due to
their limited extent, but also due to a combination of undocumented behaviours and errors, leaving
breaches and glitches to be found. Since its advantages remain valuable, the automated system needs
to be analyzed and tested by fire to reinforce it. In a mindset similar to pentesting or fuzzing,
introspective analysis based on controlled test samples may provide the necessary information.
7.3 Viruses, beyond good and evil
Since local automated analysis systems are oriented towards malware and must confront potentially
environment-aware software, which endangers the simulation-based analysis through its flaws, an
inside analysis can be constructed on top of controlled test malware. By developing non-destructive
malware that can be aimed at specific points of interest, an adapted analysis can reveal
the practical limitations and breaches in a system.
Though the same code was once called malicious, this case is a concrete example of the relativity
of this definition. Employed for a different goal, even with similar technical behaviours,
malware focused on stealth routines and environment-awareness becomes a tool for law enforcement's
continuous update of IT forensics solutions. Nevertheless, this specific need for malware is linked
to its use for criminal activities in the first place. But, by embracing the possibilities of
legitimate uses, further applications for scientific research and engineering reveal themselves.
Whether it is for the propagation abilities with or without parasitic capacities, the evolutionary
quest for change and adaptation to optimize survival chances, or the out-of-the-box mindset providing
ways to manipulate a system in an unexpected direction, so-called malware provides topics worth
thinking about. Various sciences of the large IT field can be connected to them, from cellular
automata, regarding reactions to environmental variables for reproduction purposes and probability
of future patterns, to artificial sapiens, regarding awareness and evolution processes. Engineered
applications may arise from self-replication research, regarding data sharing and synchronisation,
or from metamorphic and polymorphic algorithms, regarding practical adaptive code.
7.4 Technological evolution and societal evolution
Although this paper started as a purely scientific and technical one, both the subject (malicious
software) and the work environment (a law enforcement agency) led to considerations about the links
between a behavioural definition and the common approach to the subject and, as a consequence,
about the wasted potential regarding research and development. If automated systems,
simulation-based analyses and the introspective angle gave this computer science paper a humanities
coloration, trying to detach from the practical view of IT, the path led to a consideration of the
integration of a technology into society.
And what better background than the criminal and law enforcement one to speak about society
and technology misuse? Though the label malicious software has been widely adopted, debating
it led to pondering the habit of putting the blame on technology instead of human actors, which
creates intellectual inertia. Once an accepted technology begins to be integrated by society, it will
indeed be accessible to enthusiasts, researchers, law enforcement and criminal elements, for all of
them are parts of society. Nevertheless, the initial technology cannot be considered intrinsically
good or bad.
The current question may seem secondary at best, ill-conceived maybe, but with the advances
of global IT in modern times and its further integration into society's everyday behaviours and
human actions and interactions, it may provide insightful answers. And the questions about
technology are, indeed, nothing new. But some issues should be confronted, issues that may provide
more answers than the question of control and censorship: the level of control we deploy and the
impact of criminalisation and taboo on innovation, the apparent ease of addressing technology
issues through control instead of addressing human issues through communication, digital
organisms whose evolution was once controlled but whose future behaviour is free, and the fear of
losing control of inner processes in the digital world. All of the above lie at an intersection
between sciences and humanities, engineering and ethics. But the final issue may be the question of
developing a mature stance towards technology inside society, not only with regard to innovation,
but also with regard to fearmongering and exclusivity of knowledge (both critical debates nowadays,
with the cybersecurity trend, the terrorism topic and the intellectual property question).
On a lighter and more contextual note, changing the usual considerations and extending
approaches to less orthodox thinking may lead to the proper integration of the enthusiast
community into mainstream scientific R&D (nota bene: integration differs from assimilation). It
would provide a leap forward in knowledge and innovation, by bringing into contact research
carried out with different mindsets and under different conditions, while also opening the sometimes
rigid academic scientific system to new approaches to innovative experimental work and to new
definitions of boundaries and criticisms.
Technology is intrinsically neither good nor bad, and can always be used for destructive purposes.
Since our current stance leads to controlling scientific research and to banning engineering topics,
we are damaging our own innovation and problem-solving capacities. Hence, the question is not
whether we should fear technology, living in the shadows of past nuclear wars and future artificial
gods, for we would then just abandon it1; but how to build a working society that can live with it
without constant pressure and control. And the answer definitely lies more in society itself than in
code.
1 And it would be a bad strategic move for a species that switched from biological evolution to technological
evolution.
Appendix A
Extracts from the Cuckoo guide
Use : webpanel details
Using the webpanel, a sample can be pushed via the "submit" menu and the reports can be accessed
via the "recent" menu (or, for older reports, with /analysis/id, id being the numerical id of the
analysis). The reports are organised according to the various analyses: static, behavioural, network
(plus all the small things like the dropped files, the archive inside mongod, etc.). The first page
of the report is a sort of summary, with the hosts contacted, the registry keys and files
accessed or manipulated, and the mutexes created.
In a report, the "Quick Overview" provides Cuckoo’s log of the analysis (which can be unfolded
using the "Show log" button), a summary of the hosts and domains contacted and a summary of
the files, registry keys and mutexes requested. Note that those objects may have been looked
for, created and/or used (so a file listed under "Files" may merely have been looked for, using an
OpenFile function that failed because the file was not there in the first place). [...]
Maintenance : updating
Updating some or all parts of the system will probably break it. First, because
updating the virtualisation system will probably break the current snapshot and require a new
one (please refer to the troubleshooting section or the cheatsheet appendix for more information).
Second, because the Cuckoo system is a delicate harmony between a virtualisation system, Python
scripts, databases, web services (Django, webserver stack, API) and a proper virtual network
configuration. The same goes for possible plug-ins, extensions, integration into a bigger
system, etc. Overall compatibility, with this high number and level of interactions, is a critical
piece of the puzzle. Updating is still a good idea, but the time needed to reconfigure or
reconstruct the system should be taken into account. [...]
Troubleshooting : cannot load the machine and/or the snapshot
In case Cuckoo cannot load the virtual machine or the snapshot, the source of the error may be
a problem inside the virtualisation system or a problem with the current snapshot. If
the virtualisation system is updated, modified or reinstalled, there is a high chance that, though
the machine is still working, the snapshot no longer is. To check the machine state, one can access
the system using [...] through the network (this requires ssh-askpass on the local station,
connecting to the remote IP of the Cuckoo server, using the regular user account). If the machine is
booting and running without issues, the snapshot is probably at fault. [...]
Troubleshooting : empty analysis, no error in the log
Another source of error is the lack of a proper network connection between the guest
and the host. There are two possible ways to detect this error: Cuckoo will not start the analysis
because it cannot attach to the network interface (for tcpdump purposes, among others), or Cuckoo
will run the analysis but capture no information, and the report will be blank. The
second case, the most common one, happens because the machine is available (so it will be launched)
but the IP/interface is wrong and Cuckoo is not able to communicate with agent.py. This
error can be confirmed by looking at the pcap of the analysis or by watching the local (on the
server) packet capture during the analysis.
If something similar happens, one should access the machine (using [...]) and check the local
IP. If no IP is assigned, an ipconfig /renew may do the trick (in case the DHCP assignment was
broken at some point), and a new snapshot may have to be taken. If an IP is present (and one
may have to check the IP inside the snapshot and not the regular running state in case the DHCP
lease changed, although the probability is small), one should compare it to the IP written in
the configuration file for the machinery (e.g. [...]). There is also the possibility of an IP
problem on the host side: the local IP for the virtual network ([...]) is given in the configuration
files cuckoo.conf and [...]. One should check that they match the actual local IP and network
interface configuration in case the communication is broken. Last but not least, it may be the
virtual network or its interfaces that are no longer working: Cuckoo needs a working network
between the host and the guest. [...]
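The comparison of configured IPs described above can be partly scripted. The sketch below reads an IP from INI-style configuration text and checks that the host and the guest sit on the same network. The section names, the sample addresses and the /24 prefix are assumptions chosen for illustration; they have to be adapted to the actual configuration files of the installation.

```python
import configparser
import ipaddress

# Hypothetical extracts of the host-side and machinery configuration files.
HOST_CONF = "[resultserver]\nip = 192.168.56.1\nport = 2042\n"
MACHINERY_CONF = "[cuckoo1]\nlabel = cuckoo1\nip = 192.168.56.101\n"


def read_ip(ini_text, section, option="ip"):
    """Return the IP address stored under section/option in INI-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return ipaddress.ip_address(parser.get(section, option))


def same_network(host_ip, guest_ip, prefix_len=24):
    """Check that guest_ip belongs to the network of host_ip (assumed prefix length)."""
    network = ipaddress.ip_network("{}/{}".format(host_ip, prefix_len), strict=False)
    return guest_ip in network


host_ip = read_ip(HOST_CONF, "resultserver")
guest_ip = read_ip(MACHINERY_CONF, "cuckoo1")
consistent = same_network(host_ip, guest_ip)
```

The value found in the configuration still has to be compared by hand with the IP actually reported inside the guest, since the script only checks the files against each other.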
Appendix B
Cuckoo API and Python : examples
Pushing a sample
import json
import os
import requests

# Submit the sample file to the Cuckoo API, targeting the machine m
with open(os.path.join(str(samples_folder), str(sample)), 'rb') as sample_file:
    answer = requests.post(str(cuckoo_api) + "/tasks/create/file",
                           data={'machine': m},
                           files={'file': sample_file})
Checking an analysis status
answer = requests.get(str(cuckoo_api) + "/tasks/view/" + str(sample_id))
if answer.status_code == requests.codes.ok:
    tdata = str(answer.json()['task']['status'])
Downloading a report
answer = requests.get(str(cuckoo_api) + "/tasks/report/" + str(sample_id))
if answer.status_code == requests.codes.ok:
    with open(str(reports_folder) + "/" + str(sample_id) + ".report", "w") as tfile:
        json.dump(answer.json(), tfile, indent=4)
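The status check and the report download above are naturally chained by a polling loop. The following sketch shows one possible shape for it; the status fetcher is injected as a function so that the loop can be tried without a running Cuckoo server. The function names and the assumption that a finished analysis is signalled by the "reported" status are illustrative and should be checked against the API of the installed Cuckoo version.

```python
import time


def wait_for_report(fetch_status, sample_id, interval=5.0, max_tries=60,
                    sleep=time.sleep):
    """Poll fetch_status(sample_id) until it returns "reported" or tries run out."""
    for _ in range(max_tries):
        if fetch_status(sample_id) == "reported":
            return True
        sleep(interval)
    return False


# Stand-in for the HTTP status request, for illustration only.
fake_statuses = iter(["pending", "running", "completed", "reported"])
pauses = []
ready = wait_for_report(lambda sid: next(fake_statuses), 42,
                        interval=0, sleep=pauses.append)
```

In practice, fetch_status would wrap the GET request on /tasks/view/ shown above and return the status string extracted from the JSON answer.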
Appendix C
Decoding botnet message : examples
Decoding RC4 encrypted message
def keystream_gen(key, keystream_size):
    KS = []
    S = list(range(256))
    j = 0
    # Key-scheduling: shuffle the state S according to the key
    for i in range(256):
        j = (j + S[i] + ord(key[i % len(key)])) % 256
        S[i], S[j] = S[j], S[i]
    i = 0
    j = 0
    # Keystream generation: one output value per requested byte
    for x in range(keystream_size):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        KS += [S[(S[i] + S[j]) % 256]]
    return KS

def decode_message(key, data):
    odata = []
    for i in data:
        odata += [ord(i)]
    keystream = keystream_gen(key, len(odata))
    decdata = []
    # XOR each byte of the message with the keystream
    for i in range(len(odata)):
        decdata += [chr(odata[i] ^ keystream[i])]
    return ''.join(decdata)
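A quick way to gain confidence in an RC4 routine is to run it against a widely published test vector. The sketch below is a compact, bytes-based variant written for this purpose (it is not the decoder used during the analysis) and checks the classic pair of key "Key" and plaintext "Plaintext".

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data with RC4 (the operation is symmetric)."""
    state = list(range(256))
    j = 0
    # Key-scheduling algorithm
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    # Pseudo-random generation, XORed with the data on the fly
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)


ciphertext = rc4(b"Key", b"Plaintext")
print(ciphertext.hex().upper())  # BBF316E8D940AF0AD3
```

Applying the function a second time with the same key gives the plaintext back, which is why a single routine serves for both encoding and decoding botnet messages.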
Decoding ROT+xOR message
import binascii

def decode(instring, key, rot):
    t2 = []
    t5 = []
    # Undo the rotation and collect the hexadecimal digits of each value
    for c in instring:
        t1 = ord(c) - rot
        t2 += [hex(t1)[2:]]
    t3 = bytearray(binascii.unhexlify("".join(t2)))
    # XOR each resulting byte with the repeating key
    for i, c in enumerate(t3):
        t4 = key[i % len(key)]
        t5 += [chr(c ^ ord(t4))]
    return "".join(t5)
Parsing Andromeda server orders
import binascii

def translate_srvmessage(hexdata):
    recvid = hexdata[:8]
    trecvid = ""
    pos = 8
    moar_cmd = True
    # Reverse the byte order of the received id
    for i in range(len(recvid) // 2):
        trecvid = recvid[i*2:(i+1)*2] + trecvid
    print("Tag_RecvID : " + str(trecvid))
    print("Cmd : " + str(hexdata[pos:pos+2]))
    while moar_cmd:
        if len(hexdata) > pos:
            tid = hexdata[pos+2:pos+10]
            ttid = ""
            for i in range(len(tid) // 2):
                ttid = tid[i*2:(i+1)*2] + ttid
            print("tid : " + str(ttid))
            pos = pos + 10
            if len(hexdata) > pos:
                t = ""
                # Read the null-terminated string, stopping at the end of the data
                while pos < len(hexdata) and hexdata[pos:pos+2] != "00":
                    t += binascii.unhexlify(hexdata[pos:pos+2]).decode("latin-1")
                    pos += 2
                print(t)
                pos += 2
            else:
                moar_cmd = False
        else:
            moar_cmd = False
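The pair-by-pair reversal performed on the ids in the parser above amounts to reading the field in the opposite byte order. The sketch below shows the equivalence with a standard little-endian unpacking; interpreting these ids as little-endian 32-bit integers is an assumption made for the illustration, and the function names are hypothetical.

```python
import binascii
import struct


def reverse_hex_pairs(hex_id):
    """Reverse the byte order of a hexadecimal string, two digits at a time."""
    pairs = [hex_id[i:i + 2] for i in range(0, len(hex_id), 2)]
    return "".join(reversed(pairs))


def le_hex_to_int(hex_id):
    """Read an 8-digit hexadecimal string as a little-endian 32-bit integer."""
    return struct.unpack("<I", binascii.unhexlify(hex_id))[0]


print(reverse_hex_pairs("78563412"))   # 12345678
print(hex(le_hex_to_int("78563412")))  # 0x12345678
```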
Appendix D
The Architect : examples
mutex_create.piece
#DOC
Create a mutex (no error management)
In vars : mutex name (IVAR1)
Out vars : mutex handler (OVAR1)
#INVAR
IVAR1
#OTVAR
OVAR1
#INCLUDES
<windows.h>
#VARS
HANDLE OVAR1;
LPCSTR MName;
#HEADER
#MAIN
MName = IVAR1;
OVAR1 = CreateMutex(NULL, TRUE, MName);
#FOOTER
if_regstr.piece
#DOC
Checking registry key value data
In vars : registry key full path (IVAR1), value to query name (IVAR2),
value to compare (IVAR3), match reaction (IVAR4), other reaction (IVAR5)
#INVAR
IVAR1, IVAR2, IVAR3, IVAR4, IVAR5
#OTVAR
#INCLUDES
<windows.h>
#VARS
HKEY rK, DWORD dwType, DWORD dwDataSize, char *rgdata
#HEADER
rgdata = NULL;
dwType = 0;
dwDataSize = 0;
RegOpenKeyEx(HKEY_LOCAL_MACHINE, TEXT(IVAR1), 0, KEY_QUERY_VALUE, &rK);
RegQueryValueEx(rK, TEXT(IVAR2), NULL, &dwType, NULL, &dwDataSize);
rgdata = (char *) malloc(dwDataSize + 1);
RegQueryValueEx(rK, TEXT(IVAR2), NULL, &dwType, rgdata, &dwDataSize);
rgdata[dwDataSize / sizeof(TCHAR)] = TEXT('\0');
#MAIN
if (strcmp(rgdata, IVAR3) == 0)
{
    IVAR4;
}
else
{
    IVAR5;
}
#FOOTER
free(rgdata);
RegCloseKey(rK);
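To show how a piece file of this shape can be consumed, here is a minimal Python sketch that splits a piece into its sections and binds the input variables of one section. It is only an illustration built from the two examples above: the function names are hypothetical, and the actual Architect may parse and assemble pieces differently.

```python
import re

SECTION_NAMES = ("DOC", "INVAR", "OTVAR", "INCLUDES", "VARS",
                 "HEADER", "MAIN", "FOOTER")

# Shortened copy of the mutex_create piece, used as input for the example.
PIECE = """#DOC
Create a mutex (no error management)
#INVAR
IVAR1
#OTVAR
OVAR1
#INCLUDES
<windows.h>
#VARS
HANDLE OVAR1;
LPCSTR MName;
#HEADER
#MAIN
MName = IVAR1;
OVAR1 = CreateMutex(NULL, TRUE, MName);
#FOOTER
"""


def parse_piece(text):
    """Return a dict mapping each section name to its non-empty lines."""
    sections = {name: [] for name in SECTION_NAMES}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#") and stripped[1:] in SECTION_NAMES:
            current = stripped[1:]
        elif current is not None and stripped:
            sections[current].append(stripped)
    return sections


def bind_inputs(code_lines, arguments):
    """Replace IVAR1, IVAR2, ... with the matching entries of arguments."""
    def substitute(match):
        return arguments[int(match.group(1)) - 1]
    return [re.sub(r"\bIVAR(\d+)\b", substitute, line) for line in code_lines]


sections = parse_piece(PIECE)
main_code = bind_inputs(sections["MAIN"], ['"demo_mutex"'])
```

Output variables (OVAR1 and so on) would need a similar renaming step so that several instances of the same piece can coexist in one generated source file.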
melting.schematic
#SAMPLE
name [melting]
version [beta]
#PIECES
sysvar.exepath [] [lpath]
datamanip.catstr ["del /F /Q ", lpath] [bstr]
datamanip.catstr [bstr, "\ndel test.bat"] [batched]
file.writetxt ["test.bat", batched] []
process.create ["test.bat", NULL, CREATE_NO_WINDOW] [phandle]
theKid.schematic
#SAMPLE
name [TheKid]
version [alpha]
#PIECES
cli.print ["\"THE KID\"\n"] []
file.read ["C:\\foobar"] [filecontent, bfsize]
cli.print [filecontent] []
cli.print [{"\nTHE \[KID\]", "\nAM \{I ALONE\} ?",
"\nNO, YOU ARE NOT \"ALONE\""}] []
if.username [{"SANDBOX", "VM", "MALWARE"}, exit(0), sleep(1)] []
netcom.HTTPGetFromHostname ["www.intothematrix.com",
"/rl_img/anime_gallery_kids_1L.jpg", "theOracle"] [theBuffer, buffsize]
datamanip.splitHTTPheader [theBuffer, buffsize] [httphead, httpdata,
headsize, datasize]
sysvar.getenv ["USERPROFILE"] [usrvar]
datamanip.catstr [usrvar, "\\damnkid.jpg"] [kidfilepath]
file.writeptr [kidfilepath, httpdata, datasize] []
reg.writevalue [HKEY_CURRENT_USER, "Control Panel\\Desktop",
"Wallpaper", kidfilepath] []
routine.melting [] []
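Each line of the #PIECES section follows the same shape: an instruction name, a bracketed list of inputs and a bracketed list of outputs. The sketch below parses single-line instructions of that shape in Python. It is a simplified illustration with hypothetical function names: it does not join instructions spread over several lines and makes no claim about how the Architect itself reads schematics.

```python
def split_arguments(text):
    """Split on top-level commas, ignoring those inside strings, braces or parentheses."""
    args, current = [], ""
    depth, in_string, escaped = 0, False, False
    for ch in text:
        if escaped:
            current += ch
            escaped = False
        elif ch == "\\":
            current += ch
            escaped = True
        elif ch == '"':
            in_string = not in_string
            current += ch
        elif not in_string and ch in "{(":
            depth += 1
            current += ch
        elif not in_string and ch in "})":
            depth -= 1
            current += ch
        elif ch == "," and not in_string and depth == 0:
            args.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        args.append(current.strip())
    return args


def parse_instruction(line):
    """Return (name, inputs, outputs) for a line 'name [inputs] [outputs]'."""
    name, _, rest = line.strip().partition("[")
    inputs, separator, outputs = ("[" + rest).rpartition("] [")
    if not separator or not outputs.endswith("]"):
        raise ValueError("not an instruction line: " + line)
    return name.strip(), split_arguments(inputs[1:]), split_arguments(outputs[:-1])


parsed = parse_instruction('datamanip.catstr ["del /F /Q ", lpath] [bstr]')
```

The parsed triple gives the piece to load (datamanip.catstr), the values to bind to its input variables and the names under which its outputs are made available to the following instructions.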