Surname, first name: BELLISSIME Ferdinand
Programme: SRT / SSI
UTT academic supervisor:
Year: 2014 — Semester: A14
Internship title: INTERNSHIP AT THE BUNDESKRIMINALAMT: TAKING PART IN THE CONSTRUCTION OF MALANET, AN AUTOMATED MALWARE ANALYSIS SYSTEM

Abstract
This internship took place in the KI-42 unit of the Bundeskriminalamt (German federal criminal police) in Berlin. Mainly focused on network forensics, the unit also provides support for malware analysis. Against the current backdrop of growing computer crime and increasing use of malware, it is working on an automated, simulation-based analysis system. Two sub-projects were researched and developed during this internship: the deployment and integration of virtual machines to complement the pool of real targets, and the design of a generator of non-offensive malware samples to test the system's correct operation and its limits. While the virtual machines mostly involved development and deployment work, the generator required a more theoretical effort to design an intermediate language between programming and description, followed by the implementation of a prototype. Beyond the technical work, a broader reflection on crime, technology and society accompanied the internship and is transcribed in the report.

Company: Bundeskriminalamt
Location: Berlin, Germany
Supervisor: Herr Thomas Schwarz
Keywords: Applied research, development; Civil service; Computing; Systems security

Acknowledgements
I would like to express my sincere gratitude and appreciation to the people who provided me with the opportunities and means to advance in life, whether the help was found in social, technical, philosophical, economic or casual matters.
Including, but not limited to, the following persons:
• the Bundeskriminalamt, for the opportunity to work amongst its people,
• the Université de technologie de Troyes, for the opportunity to practice my capacities through internships,
• my agency-side supervisor, the Kriminaloberkommissar Thomas Schwarz, for his patience and support,
• my university-side supervisor, the Associate Professor Patrick Lallement, for his motivated and thorough work,
• the Leitender Kriminaldirektor Helmut Ujen, for offering me the opportunity to work with the BKA in the first place,
• the UTT Relations Formation-Entreprises service, for its help and understanding,
• the whole KI-42 unit in Berlin, and our lovely neighbour, the KI-22 unit, for the work environment and for tolerating my unusual behaviour and ideas,
• the giants on whose shoulders I am standing: those who worked on the potential of information machines and systems, those who played with self-reproducing agents and parasitic software, those who theorized simulation-based analysis, those who provided virtualization technologies,
• the flow of information on the Internet, from the humblest answer to the biggest aggregation of knowledge, for providing tools and means for self-teaching, and for acting as a live repository of human knowledge and thoughts.
Thanks for the coffee, Dieter.
Love you, Diane.

Opening

Introduction
This paper is written in the context of a final internship concluding both an engineering degree (TN10) and a Master's degree (TN30) at the UTT, focused on information security (engineering department Systèmes, Réseaux et Télécommunications, specialisation Sécurité des Systèmes et des Communications; master Sciences, Technologies et Santé, mention Sciences et Technologies de l'Information et de la Communication, specialisation Sécurité des Systèmes d'Information). The internship took place at the Bundeskriminalamt office in Berlin, in the KI-42 unit, from September 1st, 2014 to February 27th, 2015.
It was conducted by the intern Roland Ferdinand (usual name) Loup Bellissime, under the supervision of the Kriminaloberkommissar Thomas Schwarz and the Associate Professor Patrick Lallement. This final paper serves as a presentation, for examination purposes, of the research and development done regarding computer crime, simulation-based malware analysis and introspection-oriented analysis through test samples, in the context of participating in the continuous evolution of the in-house MALANET experiment.

List of abbreviations
• AV: Anti-Virus
• BKA: Bundeskriminalamt
• C&C: Command & Control
• DLL: Dynamic Link Library
• IT: Information Technology
• KI: Kriminalistisches Institut
• LKA: Landeskriminalamt
• R&D: Research & Development
• RE: Reverse-Engineering
• UTT: Université de Technologie de Troyes

Disclaimer
Due to the continuous R&D state of the analysis system, including the sample factory and the homebrew pseudo-language (both of which may turn open-source one day), neither code examples nor design decisions may be up-to-date. Due to the sensitive nature of some operational work, the help provided regarding low-sensitivity cases and casual operational work is not described thoroughly in this report.

"Man verdirbt einen Jüngling am sichersten, wenn man ihn anleitet, den Gleichdenkenden höher zu achten, als den Andersdenkenden." ("The surest way to corrupt a youth is to instruct him to hold in higher esteem those who think alike than those who think differently.") — Friedrich Nietzsche

Contents
1 Pardon my French: résumé du rapport
2 The Bundeskriminalamt and the rise of computer crimes
2.1 BKA: a federal agency
2.2 From handling communications to computer crimes
2.3 KI-42: network forensics, wiretap analysis, malwares
3 Malware forensics
3.1 Know your enemy
3.2 Know your options
3.3 Know your story
4 MALANET: an analysis environment, tools included
4.1 Assisting the human with an automated system
4.2 Components of an analysis system
4.3 The MALANET solution
5 Analysis environment: deploying virtual targets
5.1 Going virtual
5.2 Implementing Cuckoo
5.3 Virtual targets inside MALANET
6 System introspection: crafting a test sample factory
6.1 Analysing an analysis system
6.2 Crafting a sample factory
6.3 Sharing the good stuff
7 Law and technological (dis)Order
7.1 Computer crime and computer police, a brave new world
7.2 Ex machina: the assistant in the machine
7.3 Viruses, beyond good and evil
7.4 Technological evolution and societal evolution
A Extracts from the Cuckoo guide
B Cuckoo API and Python: examples
C Decoding botnet message: examples
D The Architect: examples
Bibliography

Chapter 1
Pardon my French: résumé du rapport

The Bundeskriminalamt and the rise of computer crime

The BKA, a federal agency
Germany, a federal country built around a set of Länder, has a criminal police agency at the federal level, the Bundeskriminalamt (BKA). While acting as a relay between the regional agencies (the LKA) as well as with other national or supranational agencies, the BKA's jurisdiction extends in theory to crimes committed across several regions or with a national impact. In practice, because its resources exceed those of the LKA, the BKA handles several particular types of crime: forensic analysis of exceptional cases, criminal intelligence, counter-terrorism, and the majority of computer crime. To cover the whole of German territory, the BKA has three main offices, located in Wiesbaden, in Berlin and near Bonn. For historical reasons, the headquarters are in Wiesbaden.

From handling telecommunications to responding to computer crime
The BKA began its activities regarding information technology through the telecommunications used during the Cold War by foreign agents on West German soil. The appearance and then democratisation of personal computers brought an evolution not only in the communication methods of foreign agents but also in crime in general (as much to ease communication, logistics and planning as to adapt crimes to the context of computerised systems).
While the first people concerned were enthusiasts pushing limits that were then poorly understood, the situation today includes the various elements of ordinary crime as well as unprecedented cases. To respond to this evolution, investigative units on one side integrated new tools and methods to adapt to practical changes in the field and in case material, while on the other side specialised support units were set up. Support involves bringing technical expertise to investigators, such as mastery of a particular technology (e.g. wiretaps), but also units oriented towards technical investigation and forensic analysis. While the former serve to acquire more information by integrating directly into the investigative process, the latter are oriented towards reconstructing events and searching for evidence within the acquired information.

KI-42: network investigation, wiretap analysis, malicious programs
The KI-42 unit grew out of a former computer investigation unit, as did KI-22, a unit focused on the analysis of computer hardware and systems. KI-42 received network investigation as well as the analysis of data recovered from wiretaps, the two sharing a large number of common skills. The unit based in Berlin also received malware analysis, a task it had held in the past and which now stands as an additional mission. This choice of orientation fits the recent evolution of computer crime. While the overall subject has become more obvious to everyone, the general public remains most sensitive to cases of intrusion, denial of service or direct attack.
Yet the presence of malicious programs is rising sharply, and the democratisation of automated tools together with the appearance of unofficial markets have favoured a trivialisation of low-complexity malware while bringing about made-to-measure programs of generally much higher complexity.

Malware forensics

Know your enemy
The term malicious program, recent compared with terminologies such as virus or worm, is not directly tied to technical functionality: it is the possibly malevolent use of said program that gives it its name, and thus, ultimately, the use a user makes of it. It remains easier to apply the notion of malevolence to a human being than to a computer program. It is nevertheless possible to note that malicious programs share a range of generally present components: the payload, or active element of the malware (which may contain bombs, eavesdropping probes, environment-manipulation functions, graphical interfaces, etc.); the self-replication engine, which makes copies of the program, in a parasitic fashion (viruses) or not (worms), generally built around a target search-and-filtering system; stealth measures, whether as data transformations (e.g. cryptography), deployment of embedded data (e.g. packing), or on-the-fly modification during self-replication (e.g. polymorphism); defensive routines, whether as detection of virtual machines and decompilers or as a more proactive defence that disables the local anti-virus; and a network communication engine, enabling propagation over the network, data exfiltration or the reception of orders.
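The simplest stealth measure listed above, a data transformation, can be made concrete with a minimal sketch. This is an illustration in Python, not code from the report; the function name is invented:

```python
def xor_transform(data: bytes, key: bytes) -> bytes:
    """XOR every byte of `data` with the repeating `key`.

    XOR is an involution: applying the same transform twice restores
    the original, so a single routine both obfuscates a payload on disk
    and recovers it at run time.
    """
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

plain = b"connect C&C"
hidden = xor_transform(plain, b"\x42")
assert hidden != plain                          # no longer a literal string
assert xor_transform(hidden, b"\x42") == plain  # round-trip recovers it
```

Real malware layers far stronger schemes (packing, polymorphism) on top, but the principle of hiding tell-tale byte patterns from static inspection is the same.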
A familiar vocabulary nevertheless exists, designating malicious programs according to their behaviour and impact on the local system, generally drawing on somewhat colourful imagery. These designations fall into three broad categories. Among delivery systems are viruses (parasitic, memory-oriented reproduction), worms (non-parasitic, network-oriented reproduction) and Trojan horses (no self-replication). A backdoor (bypasses the local authentication and access system), an activity-concealment tool (placing itself out of reach of the user and the local system, generally upstream) and a keystroke logger (a concept that can extend to inputs other than a keyboard) form the payloads best known to the public. As for the applications, which unequivocally give malicious programs their name, these include adware, ransomware and bots (or zombie computers, often part of a full-fledged structure, a botnet). A sharp mind will notice that the technical functions needed to produce these malicious programs also serve more legally laudable ends (remote access to a server, communication security, data sharing, propagation of an update, etc.).

Know your options
Public awareness certainly stems from the recent media coverage of computer crime, but also from the existence of software for protecting against and detecting malicious programs. Anti-virus products, along with various anti-malware tools and firewalls, in practice act as vectors for raising public awareness. However, although the tools used by anti-virus products (and the like) and by forensic investigation are technically similar, the goals and some of the methods differ greatly.
This is explained by their position in the security provided to others: protection programs serve to detect and thwart malicious programs, and therefore seek to distinguish a danger from a benign element (mainly with the help of signatures, whether static or heuristic), whereas forensic analysis seeks to understand the various elements and extract information from them (mainly through observation elements and information analysis).

Know your story
The existence of malicious programs is nothing new, although computer crime is in fact older still. It is however notable that the concepts of self-replication, viruses and worms (and their respective terminologies) date from the beginnings of computing, well before the adjective malicious was applied to code. Self-replication even precedes computers in their modern definition, lying at the heart of some of John von Neumann's work (which would lead to the formalisation of the science of cellular automata, as well as to Conway's Game of Life). Fred Cohen, building on his work first as an enthusiast and then as an academic, formalised the existence of viruses. Approaching the subject with an open mind, he considered harmful uses of this type of program, but also possible positive contributions (examples illustrating each case would appear over the following years, though ultimately more in the form of worms than of viruses). However, since the democratisation of computing and its inclusion in criminal activities, the existence of malicious programs is undeniable and cases such as viruses are tied to it.
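The static-signature approach mentioned under Know your options reduces, at its simplest, to searching for known byte patterns in a file. A minimal, hypothetical sketch (the signature database and its names are invented for illustration):

```python
# Hypothetical signature database: name -> tell-tale byte pattern.
SIGNATURES = {
    "FakeBot.A": b"\xde\xad\xbe\xef",
    "FakeDropper.B": b"evil_marker",
}

def scan(data: bytes) -> list:
    """Return the names of all signatures whose pattern occurs in `data`."""
    return [name for name, pattern in SIGNATURES.items() if pattern in data]

print(scan(b"prefix \xde\xad\xbe\xef suffix"))  # ['FakeBot.A']
```

This also shows why the two worlds diverge: a match is enough to block a file, but tells a forensic analyst almost nothing about what the program actually does.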
While the strong pressure of the anti-virus struggle acted as an evolutionary force that brought about particularly complex cases, the trivialisation of certain tools pushed much further the quantity of low-complexity cases, attributable to a large number of different actors.

MALANET: an analysis environment, tools included

Assisting the human with an automated system
The contributions of modern information technologies to society have benefited criminal elements, certainly, but also legal actors. While basic tools exist, such as those from forensic analysis, they can be surpassed by more complex systems built with those tools as foundations. Aggregating different information sources and comparing their content allows a finer understanding of the data, saves time on tasks that can be automated, and thus potentially yields more efficient exploitation. To meet the needs of malware analysis, and to provide constructed and probative information, the system can build its analysis on simulations. A controlled simulation provides unmatched access to the malware's behaviour and interactions, within the limits of the simulation's solidity.

Components of an analysis system
An analysis system is built around different components that must coexist and cooperate. The technical core consists of controlled targets for the simulation, together with observation tools (e.g. probes) and quick-analysis tools (e.g. filters), the latter interpreting the information gathered by the former inside the targets. A reporting system, retrieving and projecting the information, and a user interface guarantee interaction with the user and access to the results of the simulation-based analysis.
To cover the regular needs of different users, a system offers basic services: an archiving system, internal mechanisms, administration tools and documentation. By taking advantage of a system existing beyond the simulation platform, building external algorithms brings intelligence into the system itself, exploiting its potential more broadly. Combining this kind of capability with the existence of a memory, through the archiving system, a correlation system can be built around a profile-creation engine and a profile-identification engine. Finally, the level of quality and exhaustiveness required by the judicial process, specific to law enforcement and forensic analysis, can be reached through variety. By offering a genuine anthology of observation and analysis tools as well as a set of targets with varied conditions (real or virtual, different virtualisation systems, different software environments, etc.), variety gives access to a higher-quality analysis.

The MALANET solution
By adopting an automated, simulation-based analysis system as an answer to recent computer-crime problems in the case of malicious programs, the KI-42 unit is progressively building MALANET, an experiment developed alongside its daily duties. However, noting that the quality of a simulation lies not only in the credibility of its technical implementation but also in the scope it covers (the only possibly perfect simulation of the universe is a simulation of the complete universe, hence the universe itself), KI-42 has sought to push the limits of the simulation.
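One way to read the profile-creation / profile-identification pair described above: a profile is a set of observed features, and identification is a similarity measure over such sets. A minimal sketch, where the feature shapes and the Jaccard measure are illustrative choices, not the system's actual design:

```python
def profile(report: dict) -> frozenset:
    """Flatten an analysis report into a set of 'kind:value' features."""
    return frozenset(f"{kind}:{v}" for kind, values in report.items()
                     for v in values)

def similarity(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity between two profiles (1.0 = identical)."""
    return len(a & b) / len(a | b) if a | b else 1.0

# Two toy reports sharing a mutex and a C&C domain but not a dropped file:
r1 = {"mutex": ["bot_m"], "domain": ["evil.example"], "file": ["a.dll"]}
r2 = {"mutex": ["bot_m"], "domain": ["evil.example"], "file": ["b.dll"]}
print(similarity(profile(r1), profile(r2)))  # 0.5
```

On top of the archive, such a measure is enough to flag probable family members or duplicate samples for a human to review; it is the simplest possible stand-in for a real profiling engine.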
To cover more situations and thereby unlock more useful information, MALANET is built around a simulation of targets (real or virtual) but also a simulation of a connection to the Internet. This makes it possible to deceive the malware into starting its usual network exchanges, thereby obtaining further possible elements of the malicious system.

Analysis environment: deploying virtual targets

Going virtual
A virtual machine, and especially the so-called sandboxes, are controlled systems within a virtualised environment, isolated from the underlying system and able to return to a stable default state. Opting for virtual machines lowers the cost, in both resources and time, of setup (snapshots, clones) and of operation (use of the hypervisor). This makes it possible to provide fast and cheap simulation-based analyses, although necessarily less convincing than real targets (a virtual machine can be detected through behavioural differences from a real case, such as via an undocumented corner of a system). However, just as a simulation has limits, so does a detection routine, and low-complexity malware can easily be fooled. Virtual machines therefore provide a practical answer to the problem of democratised access to malicious programs. To choose the solutions to implement, the offerings on the market were considered against several criteria. Strong criteria, called exclusion criteria, allowed a shortlist to be drawn up, based on cost, the possibility of local deployment, and the automation of the system.
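A concrete example of the "behavioural differences" a detection routine exploits: default virtual network cards carry vendor-specific MAC prefixes. The sketch below checks this single artefact; the OUI list covers common hypervisor defaults (08:00:27 for VirtualBox, 52:54:00 for QEMU/KVM, 00:0C:29 for VMware), and everything else is illustrative:

```python
from typing import Optional

# Well-known default MAC address (OUI) prefixes of popular hypervisors.
VM_OUIS = {
    "08:00:27": "VirtualBox",
    "52:54:00": "QEMU/KVM",
    "00:0c:29": "VMware",
}

def vm_vendor_from_mac(mac: str) -> Optional[str]:
    """Return the suspected hypervisor for a MAC address, or None."""
    return VM_OUIS.get(mac.lower()[:8])

print(vm_vendor_from_mac("08:00:27:12:34:56"))  # VirtualBox
print(vm_vendor_from_mac("3c:a9:f4:12:34:56"))  # None
```

The counter-measure is equally simple (randomise the MAC), which illustrates the arms race between detection routines and simulation hardening.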
To sort the results more finely, weak criteria, called selection criteria, were established: access to sources, capacity for automation and integration into MALANET, credibility of the underlying virtualisation, state of development and stability of the product, content and formats of the reports, etc. The shortlist was therefore as follows: Buster Sandbox Analyzer (rejected because recently and definitively discontinued, though formerly recommended); Zerowine (built around Wine and without recent updates, requiring additional reinforcement and possibly reworking the virtual machine itself; nevertheless the idea of exploiting Wine is worth keeping); Minibis (a local version of Anubis produced by CERT.at, promising but long on stand-by; a new version should be available shortly, making it an interesting option for the future); Cuckoo (a free project carried by a foundation, adaptable to different virtualisation technologies, reports available in several formats, appreciated by the community and already offering modular extensions).

Implementing Cuckoo
To quickly provide a new type of target in MALANET and a basic, fast analysis tool, work focused on implementing Cuckoo. Cuckoo is a system built around a set of Python scripts relying on an observation DLL (cuckoomon) transmitted with the malware to the virtual machine, on a browser application as user interface (built on Django), and on a background database for easier archiving. The underlying system runs on a Debian base, with a recent version of Python, a Mongo database, and a micro web server for Django using nginx and uwsgi. As for virtualisation, two options were retained: VirtualBox and QEMU-KVM.
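The web tier described above (Django served through uwsgi behind nginx) is conventionally wired with fragments like the ones below; the paths, socket and WSGI module names are placeholders for illustration, not the unit's actual configuration:

```ini
; uwsgi.ini -- run the Django web UI as a uwsgi application
[uwsgi]
chdir     = /opt/cuckoo/web           ; placeholder project directory
module    = web.wsgi:application      ; placeholder WSGI entry point
socket    = /run/cuckoo-web.sock
processes = 4
vacuum    = true                      ; clean up the socket on exit
```

with nginx forwarding requests over the uwsgi protocol:

```
location / {
    include uwsgi_params;
    uwsgi_pass unix:/run/cuckoo-web.sock;
}
```

Splitting the roles this way keeps nginx serving static files and TLS while uwsgi manages the Python worker processes.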
Both have strengths and weaknesses: VirtualBox is less credible in its simulation, slightly slower in execution, and its code is a mix of free and proprietary; QEMU-KVM, on the other hand, is more complex to deploy and manipulate, less easily controlled automatically, and more unstable in its integration with plug-ins. The possibility of adopting both should not be neglected, QEMU being favoured for its greater credibility as a simulation and its use of hardware-assisted virtualisation. A virtual network links the virtual machines and the main script. Various Windows systems (mainly XP, Vista and 7, in 32 and 64 bits), with various possible software environments, serve as targets. Although the tool provided is functional, various reinforcements can be put in place. Among the existing extensions, several are worth considering: the signatures provided by the community (anecdotal, yet adding value to the automated analysis); automated interactions with Volatility, a memory-analysis tool (unstable with QEMU-KVM for reasons of memory-dump formats); Tomer Teller's extension enabling memory captures on precise events instead of a single capture at the end of execution; malwasm from malware.lu, which provides reverse-engineering capabilities; and zer0m0n, the kernel-level observation DLL. For the time being, the need for a working tool has postponed the integration of extensions because of instability and incompatibility (generally with QEMU-KVM). A reworked version of the observation DLL cuckoomon was also produced: since the DLL sits in the middle to listen to the malware's requests, interception and on-the-fly modification of the transmitted information are possible, and hence so is manipulation of the analysed program's perception of its environment;
a prototype countering low-complexity cases was developed and deployed. Using Cuckoo on programs taken from download sites led to a focus on botnet communications. Allowing a connection to the Internet, possible for cases obtained in the wild, provides an opportunity to capture and observe the protocols and communication methods used by the bots, revealing the presence of information encoding and communication security (encryption, integrity checks, etc.). Once the protocol is understood, the communications can be decoded automatically (the encryption key being extracted from other elements of the analysis or tested from default keys), which also raises the use of protocol schemes and automated decoding to identify malware types, allowing the dynamic inclusion of fake command-and-control servers in the simulated Internet connection. Although unique and specific cases exist, it is very often a set of programs that is at hand (coming from a contaminated network, a production server, etc.). This brought the need to create a set of scripts to manipulate groups of malicious programs, which opens the possibility of using other reports as a secondary information source for an ongoing analysis, allowing correlation systems to be built (avoiding useless analyses in the case of identical programs within a group, highlighting possible common points between different programs, etc.).

Virtual targets inside MALANET
Although a functional tool in itself, Cuckoo was deployed to be integrated into MALANET.
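The automatic-decoding step described above can be sketched as: try each candidate key (extracted from the analysis, or taken from known defaults), decode, and keep the result that passes an integrity check. Everything here, the XOR scheme, the CRC framing and the default keys, is an invented stand-in for whatever a given bot family actually uses:

```python
import zlib

DEFAULT_KEYS = [b"secret", b"\x62", b"botpass"]  # invented default keys

def xor_decode(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def try_decode(msg: bytes, extra_keys=()):
    """Return (key, payload) for the first key whose decoded message
    matches its embedded CRC32 (first 4 bytes, big-endian), else None."""
    for key in (*extra_keys, *DEFAULT_KEYS):
        plain = xor_decode(msg, key)
        crc, payload = plain[:4], plain[4:]
        if int.from_bytes(crc, "big") == zlib.crc32(payload):
            return key, payload
    return None

# Build a message the way our imaginary bot would, then decode it:
payload = b"cmd=update"
framed = zlib.crc32(payload).to_bytes(4, "big") + payload
wire = xor_decode(framed, b"secret")   # XOR is symmetric: encode == decode
print(try_decode(wire))                # (b'secret', b'cmd=update')
```

Because a successful integrity check identifies both the key and the framing scheme, the same routine doubles as a family classifier, which is what enables the dynamic selection of a matching fake C&C server in the simulated Internet.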
This is accomplished as much through the API, to inject programs for analysis (with the corresponding options and arguments) and to extract the reports, as through the integration of said reports as an additional information source in the global analysis. However, the existence of an environment containing Cuckoo makes it possible to draw on the potential of the system, rather than of each tool taken individually, to build functions for managing unusual options (switching from the classic DLL to the reinforced version, changing virtualisation system), the proper integration of in-house extensions (management of groups of programs, decoding of bot communications), and the use of the archives as a secondary information source and thus as memory (for a correlation system, built on a profiling engine and a comparison engine, allowing a deeper automatic analysis). However, one must not be misled by Cuckoo's ease of use and speed of execution: a user must always remain critical of results, as much for questions of quality of the observation and analysis tools as for the weaknesses inherent to virtual machines. To provide a higher-quality tool, as required for investigations and legal actions, one must be able to find the flaws in the virtual machines while trying to push the system's limits.

System introspection: crafting a test sample factory

Analysing an analysis system
To analyse an analysis system, one needs not only a controlled source of information but also an introspective approach allowing the system to be exploited from the inside. This leads to crafting test malware samples. Although similar in many respects to classic malware,
they carry no payloads and are oriented towards environment awareness (for detecting virtual machines) and defensive routines (to increase stealth). To provide a tool that the various actors can handle, while covering the various situations and adapting to the system's evolution, the goal became building a sample factory rather than mere samples.

Crafting a sample factory
The first step is the design phase: understanding the need in order to build a suitable solution. The difficulties of creating samples lie in the required familiarity with a programming language, a deeper knowledge of the targeted operating systems, and a sometimes unusual intellectual approach. To this is added the automatic integration of frequently used routines, to reduce the time invested in design. The factory is therefore built around a library of pre-written functions (the .pieces), a description-and-programming language easier to handle than a mainstream language (the .schematics), scripts assembling these different pieces of information into an executable (the Architect), and a fairly exhaustive documentation to ease the use of this new tool. The pseudo-language, like any language, requires a set of rules to make the information understandable while providing various functions and subtleties. Each refinement, each new function requires the allocation of markers and thus reduces the room for manoeuvre in using the language. It is a matter of finding a balance between functions and markers, along with tricks to reduce the number of reserved markers.
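To make the marker trade-off tangible, here is a toy reading of such a pseudo-language: one instruction per line, `"` marking strings, `#` marking comments, bare words treated as neutral values. The syntax and instruction set below are invented for illustration and are not the factory's actual .schematics grammar:

```python
def run_schematic(text: str) -> list:
    """Interpret a toy one-instruction-per-line schematic.

    Toy instructions:
      say <arg>        - record its argument
      repeat <n> <arg> - record the argument n times (a stand-in for the
                         value-list mechanism easing repeated instructions)
    Lines starting with '#' are comments; "..." marks a string literal.
    """
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # '#' is the comment marker
            continue
        op, _, rest = line.partition(" ")
        if op == "say":
            out.append(rest.strip('"'))
        elif op == "repeat":
            n, _, value = rest.partition(" ")
            out.extend([value.strip('"')] * int(n))
        else:
            raise ValueError(f"unknown instruction: {op}")
    return out

demo = '''
# toy schematic
say "hello"
repeat 2 "ping"
'''
print(run_schematic(demo))  # ['hello', 'ping', 'ping']
```

Even at this scale the tension is visible: every convenience (comments, string quoting, list-like repetition) consumes a marker that the schematic author can no longer use freely elsewhere.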
The details of the reasoning are given in the English body of the text; it led to: placing one instruction per line of code, with input and output arguments; having markers to distinguish neutral values, character strings and instructions as argument types; building a system of value lists as arguments to ease the repetition of similar instructions; the import and injection of routines written in other files; the existence of aliases replacing a set of instructions and accepting input and output arguments; a marker allowing comments in the schematics; as well as a few internal mechanisms.

The technical implementation covers several points. The pieces, in C, are manipulated by the Architect, in Python, with the help of the schematics, in the pseudo-language. The elements are spread over a folder architecture (pieces, schematics, results), completed by several utilities (available: documentation generator, XOR of hexadecimal values; under development: automatic injection of junk code, automatic replacement of variable names by random values), by the various pieces (available categories: file, registry, mutex, process, system variables, data manipulation, network communications, console interface, conditions, loops, direct; under development: random, junk, graphical interface), by the various aliases and routines (available: aliases for HTTP communications, routines for self-deletion of the malicious program; under development: routines for self-replication and for the deployment of embedded programs), and finally by various schematics (imitations of pafish and of Andromeda's communications; Anderson and Neo for basic tests and for exploring the limits of the system, respectively; theKid for exploring the limits of the factory; Smith for studying parasitic propagation).

Note that the development was mainly split into two phases. Although exploring the limits of the system allows deep improvement work on the system to begin, the more urgent need for a test element to validate the basic functions takes precedence. An emblematic schematic is built step by step, serving as a temporary goal: for basic validation, an element with no awareness of its environment, Anderson; for probing the limits and getting the upper hand on the system, an environment-aware element, Neo. This allows not only a management of needs inscribed in the rhythm of the code, but also a progressive development of the functions of the pseudo-language and of the factory.

Setting the code free

Although the request may seem strange, the Architect could be released openly on the Internet via GitHub. Weighing the gains and drawbacks mainly involves, on one side, the increased possibilities for future development and for integration into other tools, and on the other side, the danger of openly releasing code related to malicious programs. However, the Architect is a factory and not a malicious program itself, and it contains no pieces of code corresponding to malicious payloads. The skill level required to implement one's own malicious payloads is sufficient to build a low-complexity malicious program from scratch by oneself: releasing the Architect therefore causes only a very small increase in the potential for computer crime, while providing a tool to the scientific research communities, to security actors, and to enthusiasts.
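To make the pseudo-language rules described above concrete, here is a purely illustrative sketch of how the Architect could tokenise one schematic. The report does not specify the actual markers, so the ones used here (`#` for comments, double quotes for strings, `%` for instruction references, bare tokens for neutral values) are assumptions of this example, not the real .schematics syntax.

```python
# Hypothetical .schematic tokeniser: one instruction per line, with markers
# distinguishing the three argument types named in the report. All concrete
# marker choices below are illustrative assumptions.

def parse_schematic(text):
    """Parse one-instruction-per-line pseudo-code into (name, args) tuples."""
    program = []
    for raw in text.splitlines():
        line = raw.split('#', 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        name, *args = line.split()
        typed = []
        for arg in args:
            if arg.startswith('"') and arg.endswith('"'):
                typed.append(('string', arg.strip('"')))      # character string
            elif arg.startswith('%'):
                typed.append(('instruction', arg[1:]))        # nested instruction
            else:
                typed.append(('value', arg))                  # neutral value
        program.append((name, typed))
    return program

sample = '''
# create a mutex, then remove the dropper
mutex_create "demo_mutex"
self_delete %mutex_create
'''
for instruction in parse_schematic(sample):
    print(instruction)
```

A real implementation would then map each parsed instruction to a pre-written C piece before assembly; this sketch only shows why the marker budget matters, since every reserved character is unavailable for ordinary tokens.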
On more pragmatic issues, no sensitive information about internal projects related to MALANET or KI-42 is to be found in the content of the Architect, while the question of funding (through the developer's pay) raises the question of whether the public money invested should benefit only the BKA or a larger number of people. One should, however, retain the argument of the contribution to colleagues working in the security field, but also in the various related scientific domains such as cellular automata, viral propagation and environment-aware programs, as well as the light shed on the difference between a virus and a malicious program.

Law and technological (dis)Order

Computer crime and computer police, a brave new world

Technological advances benefit human society by pushing back the limits of biological evolution. Although the cost is sometimes too high, the more common problem is that of the harmful uses of a technology (e.g. the case of dual-use technologies, civil and military). When a technology spreads through society, it becomes accessible to law enforcement but also to criminal elements. Although modern information technologies saw their first malicious uses in the hands of enthusiasts seeking to explore a new territory and push limits, they are now employed first and foremost by criminals (for communication as much as for direct action). In reaction, law enforcement agencies have also adopted a more open approach towards information and communication technologies, not only to place themselves on a level similar to that of the criminals, but also to strengthen their own capacities through the potential of automated systems.
Ex machina: the assistant in the machine

Just as the mechanical automaton started replacing human beings in industrial activities, the digital automaton has begun to support human beings in various sectors. If the first analysis systems were relatively static and primitive, they quickly evolved to adapt to the existence of more dynamic elements. This is the setting of simulation-based analysis, which turns out to be an excellent answer to the expansion of computer crime through the massive use of malicious programs such as viruses and Trojan horses. The MALANET project is thus part of the current fight waged on two fronts: against the numerous low-complexity cases and against the more delicate situations. The strength of the project comes not only from the tools but also from exploiting the synergistic potential of the global system, as well as from extending the domain covered by the simulation to a simulated Internet connection. However, between the flaws and shortcomings of some observation and analysis tools and the intrinsic weaknesses of simulations, two conclusions stand out: one must remain vigilant and critical of the results, and, to proactively build a higher-quality system, one must be able to analyse the system and push its limits from the inside.

Viruses, beyond good and evil

To strengthen an analysis system against the ability of some malicious programs to detect a simulated environment, the use of non-offensive malicious programs is a logical continuation. By providing access to a set of malleable programs, a precise introspective analysis of the system can be set up. However, if the legitimacy of malicious programs can be questioned here, it stands in opposition to the illegitimate uses.
Opening up to considerations on the possible uses of this type of code makes it possible to cross the barrier of the "malicious" label, which stems more from the choices of human actors than from the intrinsic value of the code. It thus becomes possible to place work done on viruses and other ambiguous programs back into a wider scientific and engineering context. Notable are the directly related domains (cellular automata, the study of parasitism, etc.), but also more innovative advances, whether in research (regarding artificial sapience, Darwinian evolution and environment awareness) or in development (regarding adaptive code, or self-replication in the service of data propagation and synchronisation).

Evolution of technology and evolution of society

Although the original considerations of this document dealt with questions that were a priori applicative and/or scientific, they developed a social and human sensitivity through contact with the questions of the malicious aspect of a program and, on a broader level, of the integration of technologies into society. The environment of law enforcement and computer crime naturally favours this reflection, given their respective presences and impacts on society. By manipulating similar technological elements in two opposing settings, the question of the consequences of banning these technologies for safety reasons (and of applying, by default, the vocabulary of malice to sets of algorithms) was raised. The cost, conditioned by the public's perception of the subject, appears to be a loss of innovation and the shrinking of certain R&D domains, neglected because poorly regarded or simply unknown.
To answer this, the point is then not to consider the legality of certain technologies, a question that leads to the degradation of scientific and technological potential and to the spread of fear and ignorance, but rather the more fundamental issues that hinder the integration of these technologies into society: as much as these specific cases raise questions about our level of control over computer programs and fears related to automata, on a global level it is more a matter of confronting society in its human component, which is harder and more complex than harming technological innovation through taboo.

Chapter 2 The Bundeskriminalamt and the rise of computer crimes

"Another one got caught today, it's all over the papers." - The Mentor

2.1 BKA: a federal agency

As a country, Germany is a federation of Länder, born from the history of the German regions. Though each state possesses a criminal police office (the LKA, Landeskriminalamt), a federal criminal police office also exists: the BKA. The Bundeskriminalamt serves as a relay and focal point between the various LKA, but also as an active office in itself, managing cases spanning multiple states or at a national level. Due to the repartition of budget and manpower, the BKA usually handles the more complex cases and bigger threats. A close comparison would be the existence of the FBI inside the US law-and-order system. Given their capacities and missions, their areas of focus include forensics for special events, criminal intelligence, witness protection, counter-terrorism, and computer crimes. In order to act as a proper relay between the various LKA and other law enforcement agencies, while also acting on an international level (cooperating with Europol, Interpol, etc.), the BKA has three major offices in Germany. Though for historical and practical reasons dating to the Cold War its headquarters are located in Wiesbaden, the BKA is also installed in Berlin and near Bonn.
The various departments and units of the BKA are spread over the three sites, with most of the units having personnel at each site (although each site may have a different focus).

2.2 From handling communications to computer crimes

The history of handling IT-related crimes finds its roots in the Cold War, when a part of the BKA was focused on managing foreign agents, mainly from East Germany, on West German soil. The nascent telecommunication systems, toughened by the world wars, were massively employed by agents in the field and by local intelligence services. With the evolution of technology and the democratisation of computers, not only did the tools of foreign agents change but also those of criminals. If the first people concerned by computer crime were computer enthusiasts exploring a new realm, the focus rapidly shifted to regular criminality starting to integrate computers and telecommunications into its designs. This phenomenon led to the current state of the trendy cybersecurity area, where computer enthusiasts, regular criminals and state agencies find themselves with similar tools in the same playground. Nevertheless, if the world changed and followed technological evolution, both the criminal and the law-and-order elements of society adapted. To provide up-to-date investigation, police agencies, including the BKA, have to follow two parallel paths in the integration of IT into police work. The first point of focus is to adapt investigation in order to respond properly to computer crimes or to the casual integration of IT in criminal work, which is mainly about furnishing investigators with technical knowledge and tailored tools. The second point of focus is not found in the direct investigation part but in the support of those investigation units.
The complexity and the diversity of technology call for specialised units providing help and expertise to investigation units, in the same way experts are available for ballistic, chemical or psychological questions. This support splits into two different kinds. On one hand, active and direct technical support can be found, to provide more complex tools and expert use to investigators; as an example, it includes wiretapping operations. On the other hand, a more analytical and intelligence-oriented support can be provided, as a way of bringing computer/IT forensics to the investigators. Where the first one is about taking a direct part in the investigation to gather more information (to collect fresh information, to keep an eye on a criminal element, etc.), the second one is about reconstructing events and finding proof in a collection of information.

2.3 KI-42: network forensics, wiretap analysis, malware

In the case of the BKA, those various areas of action and expertise (computer crime investigation, wiretapping, computer forensics, etc.) are divided among various units. KI-42, part of the forensics institute (one of the nine departments), is one of them, mainly oriented towards network forensics and data analysis. The unit in Berlin also extended its work to malware forensics due to its history (the split of an older computer forensics unit gave birth to this unit and to the more hardware- and system-oriented forensics unit KI-22; afterwards it was completed with some wiretapping and also got the malware side back as a supplementary mission). Though malware was not its original focus, the integration of this area into computer crime forensics is a wise move. If the democratisation of computers and telecommunications changed regular crime by adding cellphones, smartphones, mails and so forth into the equation, the criminal use of technology goes further.
If there were some ambiguities in past years, nowadays more and more people are sensitive to this topic: who hasn't heard of some penetration of government databases or some corruption of company infrastructures? If regular, witty hacking has grown into a range of penetration and exploitation tools, another kind of technological approach also exists: malware. And this trend is also growing.

Chapter 3 Malware forensics

"If you know others and know yourself, you will not be imperilled in a hundred battles." - Sun Tzu

Malware, standing for malicious software, covers a variety of applications. It is used indifferently to describe various IT components, from a single program to a combination of executables or an entire system. The malicious designation is linked to the potential of those elements due to their inherent functionalities and capacities. In the end, it is the behaviour and the use of those elements that categorise them as malware. In order to clear the air, approaching malware as an object of scientific study and classification is a first step. Presenting the basic possible components and a practical taxonomy, detailing the commonly used vocabulary, will provide the necessary knowledge. Beyond that, more and more users are aware of the existence of somewhat dangerous software, mainly through the market of anti-virus and security software, or through an unfortunate mail attachment opening, which raises the question of the differences between an anti-virus analysis and a forensics-oriented analysis of malware. A more general view of the subject can merge the various details into a logical story providing a background for today's considerations about malicious software.

3.1 Know your enemy

Although malware samples are various in nature and grouped together because of their potential and behaviours, these criteria still lie on top of an ensemble of common functionalities.
Though each functionality may not be present in each malware sample, together they paint a picture of what a malware can be and what it can achieve or be used for. Those capacities are usually combined to produce specific kinds of malware, which will usually receive a denomination from the security community and/or the general public. A basic taxonomy of those common kinds of malware will complete the theoretical approach of the common functionalities.

Malware main components

The main component of a malware, the one that justifies the name, is the payload. The payload is the main unwanted functionality of the malware, its goal on the local system. A malware acts as a wrapper for the payload, which can be malicious and can be offensive, both to some extent. A payload may be a so-called bomb (logic bombs that will crash the local system, fork bombs that will fill the local memory, etc.), hooks into specific processes or drivers (e.g. a keylogger will probably hook at some point into the keyboard processing), modification of local data (to alter security measures, for example), a graphical interface (to display unwanted messages, for example), and so forth. Other components may or may not actually exist in a particular sample, but can be expected. A self-replication engine will be present in viruses and similar malware, in order to propagate copies of the malicious piece of code. The self-replication may be parasitic, by adding the code to other executables, or not, by copying a stand-alone file (which can be kickstarted in different ways: alternate data stream, debugger options, deceptive appearance, etc.). Functionalities commonly found in a self-replication engine include a crawler algorithm (whether to go through a file system or a local network) and a targeting system (which may filter potential targets according to local system variables, previous infection, etc.).
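The crawler-plus-targeting pair just described can be sketched in a deliberately non-offensive way, in the spirit of the harmless test samples this report advocates: the code below only lists candidate files, it never modifies them. The `.exe` filter and the `ALREADY_VISITED` marker are invented for illustration.

```python
# Non-offensive sketch of a self-replication engine's front half:
# a crawler (file-system walk) feeding a targeting system (filters).
# It only *reports* candidates; no file is ever written to or changed.
import os
import tempfile

MARKER = b"ALREADY_VISITED"  # hypothetical previous-infection marker

def crawl(root):
    """Crawler algorithm: walk the file system, yielding every file path."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)

def is_target(path):
    """Targeting system: keep .exe files not carrying the marker."""
    if not path.lower().endswith(".exe"):
        return False
    try:
        with open(path, "rb") as f:
            return MARKER not in f.read()
    except OSError:          # unreadable files are simply skipped
        return False

# tiny demonstration on a throwaway directory
demo = tempfile.mkdtemp()
for name, content in [("a.exe", b"data"), ("b.txt", b"data"), ("c.exe", MARKER)]:
    with open(os.path.join(demo, name), "wb") as f:
        f.write(content)
print([os.path.basename(p) for p in crawl(demo) if is_target(p)])  # ['a.exe']
```

A real engine would replace the final `print` with a parasitic or stand-alone copy step; stopping at the listing stage is exactly what keeps a test sample payload-free.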
Various stealth measures can be implemented to optimise the discretion of the malware, augmenting its chance of survival. Stealth, or enhanced discretion, can take several forms: data manipulation (like encryption), management of embedded data (like packing and unpacking), injection of code into live processes, or manipulation of its own code before self-replication (like polymorphism or metamorphism). Similarly, to increase its chance of survival in other ways, more defensive routines may be implemented. Protection can be achieved through environment awareness to detect the use of a debugger (and in consequence limit reverse-engineering of the sample) or the use of a virtual machine (and in consequence limit behavioural analysis). It can also be achieved through more proactive or offensive defence (through various timer systems to break the flow of analysis and outlast an anti-virus, through a fake payload as an answer to a positive detection, through ways of damaging or shutting down the anti-virus or the RE software). Finally, though it may be rarer for some families of malware (like stand-alone samples or melting samples), a network communication engine may be found. The ability to interact with a local network connection, with or without access to the Internet, can provide support for the payload or for the malware itself. The network can be used to propagate outside of the local station, to exfiltrate data (whether stolen from files or monitored with hooks), to provide the attacker with remote access (whether some sort of distant shell or just an algorithm answering orders), and also to download updates for the local malware or new pieces of malicious code.

Malware basic taxonomy

As stated before, the term malware covers elements of various nature, from a particular functionality to a complex system.
This designation comes from the use made of the code, and the commonly expected underlying functions linked to those behaviours provide a better understanding of what can be called a malware. Beyond this single term, a whole vocabulary, usually based on a particular characteristic of the sample (sometimes in a poetic way), has emerged over the years. In order to further clarify the situation, a small taxonomy built on this vocabulary will provide a more structured understanding of the subject. Note that, in the same way the exact definition of malware is delicate, no precise, fixed and generally accepted definition of this vocabulary exists, although a commonly agreed-upon definition can be crafted from the various understandings.

Delivery system

• A virus is a self-reproducing software. It may use different vectors of propagation and infect different kinds of elements. It will usually hook itself into another component or program to be kickstarted.
• A worm is similar to a virus, being also a self-reproducing software. The main differences are in the propagation behaviour (a worm is more focused on network spreading, hence the name, where a virus is usually more focused on memory drives) and in the parasitic aspect (while a virus is usually a parasitic program, a worm is more likely to be a stand-alone component).
• A Trojan horse will also act as a way to infect a machine, using exploits similar to those of viruses and worms, but it will not try to self-reproduce (stand-alone).

Payload

• A backdoor is a bypass of the local authentication and privilege attribution system. It is usually linked to an unwanted remote access, though it can be an offline backdoor (to bypass login or escalate privileges) or a mathematical backdoor (to solve the mathematical equations of a cryptosystem without knowing the key used during the encryption [1]).
• A rootkit is a software or set of software that tries to bypass detection by the user and the station by placing itself beyond the reach of the local system, usually by loading before the system itself. While increasing discretion, it usually also provides deeper control and monitoring of the local station.
• A keylogger is a piece of code listening to the user inputs, usually hooking at least into the keyboard [2]. Aimed at monitoring or data stealing, the keylogger part is usually the payload of another malware (though it can also be installed manually). The information collected can be locally saved and/or pushed over the network.

[1] e.g.: the plausible backdoor in Dual_EC_DRBG.
[2] Although newer versions of monitoring/data-stealing code can hook into the mouse, the webcam, and so forth.

Application

• An adware will discreetly take over the local station to inject unwanted advertisement, through control of regular ad display (e.g. in Internet browsers) or additional graphical interfaces (e.g. pop-ups).
• A ransomware will take over the local station and try to lock out the user or limit his actions, while offering a way to stop this behaviour. Usually the way out will imply paying for it, hence the ransom part of the name.
• A bot, short for robot, is a program that will answer commands by reacting with its environment. In a computer crime context, a bot will usually be a component of a botnet, or network of zombie computers. Though botnets can be voluntary (e.g. to provide computing power), more and more infected stations turn out to unwillingly provide computing power (e.g. to mine Bitcoin), network nodes (e.g. a proxy for an attack) or pawns (e.g. for your casual DDoS).

Studying malware can be tricky due to the relativity of the definition and the fast evolution of malicious code.
Nevertheless, in order to produce an analysis of quality, a real understanding is needed, at least to coordinate the various pieces of monitored information and fit them into an investigation. Indeed, the names are based on technical behaviours, which lead back to the casual components behind those behaviours, and hence to the whole usual vocabulary and the practical taxonomy. It still raises the question of an intrinsic definition of a malware, as opposed to a potential definition confirmed by the user's intentions. One should feel the complexity of the situation by noting that the underlying functions are almost all dual-use algorithms (propagation systems can be used for patching or sharing data, crawling systems are a fundamental component of search engines, hooks allow for fine tuning of driver behaviours, cryptography is used for secured communication, remote access for servers, etc.).

3.2 Know your options

For the general public, the world of malware and viruses is first seen through the eye of detection and protection, which, setting aside all the system-embedded security measures, means anti-virus. A legitimate question would be to look for the differences between the work of an anti-virus software and the work of a malware forensics agent. After all, to unaware eyes, they both fight against malware, mainly by trying to recognise it, and both to provide security to users. But because they pursue different goals, their respective works differ, mainly on three points: the position in the process of assuring security for society, which leads to different goals, and in consequence to analysing differently the information extracted from a malware sample. An anti-virus, and other anti-malware software denominations (anti-rootkit, anti-spyware, etc.), should provide security to a machine, whether an embedded device, a local station or a piece of a more complex infrastructure.
The aim is protection through proactive security (by enforcing healthy habits like software updates, by furnishing a default firewall configuration, etc.) but also through recognition and neutralisation of a threat (the "regular" part of an anti-virus). The AV goal is to distinguish between a benign software and a malicious software, if possible even when facing previously unknown software. The recognition process can use various approaches, from primitive but stable ones to more complex but not error-proof ones. The oldest way is to find and target a signature, a static element that will be common to a maximum of variations of a precise malware but fairly rare outside of it. A more complex way to avoid unwanted results while maximising the recognition rate is to build a profile as a signature, with various variables (including multiple static elements), which asks for some kind of threat level to be fixed for decision making. Going further down that road, AVs built a capacity to analyse unknown software, looking for information that could be used for profiling: static code, function calls, meta-information, communication, overall behaviour. Nowadays AVs enhance this capacity by playing on other variables, like the decentralised distribution of positive recognitions to other AVs (which means the existence of a fluid database for unknown/recent software [3]). But isn't malware forensics also about analysing the code, listening to the communication, and observing the overall behaviour? It is, indeed, and so most of the tools of AV, computer forensics and research are similar in nature. What is interesting is the purpose of the information, a point that can be understood by looking at the position of each actor in the workflow.
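The profile-as-signature idea with a fixed threat level, mentioned above, can be sketched as a simple weighted scoring rule. The indicators, weights and threshold below are invented for illustration; real AV engines use far richer feature sets and tuning.

```python
# Minimal sketch of profile-based recognition: several weighted indicators
# are combined into a score and compared against a fixed threat level.
# All indicator names, weights and the threshold are illustrative assumptions.

PROFILE = {                         # indicator -> weight
    "known_byte_signature": 5,      # classic static signature hit
    "writes_to_autorun_key": 3,     # persistence behaviour
    "contacts_known_c2_domain": 4,  # suspicious communication
    "packed_sections": 2,           # obfuscation hint
}
THRESHOLD = 6                       # the threat level fixed for decision making

def classify(observed):
    """Return (score, verdict) for a set of observed indicator names."""
    score = sum(PROFILE.get(name, 0) for name in observed)
    return score, ("malicious" if score >= THRESHOLD else "benign")

print(classify({"writes_to_autorun_key", "contacts_known_c2_domain"}))
```

The design trade-off discussed in the text is visible here: lowering `THRESHOLD` catches more variations of a sample but also produces more unwanted (false-positive) results.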
While an anti-virus tries to protect the user from incoming infections and attacks (attacks, if we extend AV to other kinds of anti-malware and security software), and plays mainly on the capacity to distinguish danger, the forensic agent is looking to understand the situation and extract information for investigation purposes. The forensic agent does not always enter after the incident (malware can be found in a case of infection, but it can also come from a source in the wild, from a development server, etc.), but his role is to go further than the malware: to use the malware to fish out, if possible, the author behind it, the criminal using it, the goals of the infection, the position of the act in a bigger plan. So the recognition is there to be sure about the nature of the sample, the static analysis and the communications are there to provide hints about the people behind the infection, and the behaviour will serve as proof in a possible legal case. Though the basic information is similar, hence the similarities in the tools, the uses of the information differ, because the roles differ.

3.3 Know your story

Nevertheless, one should realise that malware is nothing new. Though the terminology (malware) is quite recent, even compared to the existence of computers, self-replication and viruses are older terms and concepts. Self-replication actually predates the existence of computers themselves (in their modern definition) and was one of the topics of John von Neumann [4], which would lead to the science of cellular automata (and, later, to the famous Conway's Game of Life).

[3] Which could lead to an attack based on pushing a lot of false-negative reports to protect a particular piece of code from a particular AV.
[4] von Neumann also brought various mathematical theories, one of the first steps in quantum thinking, a standardisation of the game theory field, and indeed various innovations in the computing field.

Viruses, as self-replicating algorithms, emerged in the late '70s and '80s, along with the first worms and the use of the word worm itself, thanks to cyberpunk literature. The terminology virus is linked to Fred Cohen's work on parasitic programs (enthusiast work that would lead to a thesis on the subject). At that point, though most viruses and similar programs were mostly practical pranks (yet sometimes offensive), the concept of malicious software was not associated with them. Fred Cohen himself viewed viruses as neutral, with the potential to be good infectious programs. The first worms were made for computer management reasons (e.g. shutting down unused machines) or IT experiments (e.g. checking how wide the Internet was [5]), and since the '80s a handful of worms have been crafted to hunt other worms and clean up after them (going as far as applying Windows patches). But the situation did change in favour of more intrusive, then more offensive, uses of these kinds of technology. Up until today, where regular crimes are managed through IT (inside communication, planning, logistics, etc.) and some crimes are taking new forms (data stealing, data corruption, system termination, intrusion, etc.). Though a community of borderline code enthusiasts still exists, with delinquents and criminals using the same tools and similar techniques, it is harder to distinguish one from the other. Moreover, though it also served society and science, the propagation and democratisation of technical knowledge and turnkey tools initiated a continued increase in the quantity of low-complexity malware [6] and the appearance of highly-tailored samples [7]. Handling the complex and rare precisely-designed malware while managing the rise of low-complexity cases and casual uses is one of the new challenges of law and order agencies in the field of computer crime.
Not only in order to adapt to the level of technology used, but also to manage the variety and quantity of cases, a focus on IT provides the now-necessary know-how and systems needed by investigation and support units.

[5] Which did not turn out as well as expected, cf. the Morris worm.
[6] One could speculate that the rise will stabilise sooner rather than later.
[7] One could speculate that the ones made by governmental agencies do not count, as a consequence.

Chapter 4 MALANET: an analysis environment, tools included

"A process cannot be understood by stopping it. Understanding must move with the flow of the process, must join it and flow with it." - Dune

4.1 Assisting the human with an automated system

Information technology brought potential to society, including better telecommunications and smarter tools, but also dual-use technologies that ended up in criminal activities. Nevertheless, just as IT was used to update some malicious behaviours, it is also used to update the management of, and the answer to, these behaviours. Basic tools, like monitoring tools for forensics, are superseded by complex systems bringing various tools together and integrating routine manipulations. From these various sources of information, the systems produce helpful outputs, providing some automated intelligence to their users. Some typical examples are information aggregation, information parsing, information correlation and profile building. Considering this, constructing a system around the usual forensics tools to answer the rise in computer-crime cases (and especially the diffusion of low-complexity malware) seems like a strategic move. A system is both a collection of tools and an adapted environment, including all the pieces needed to provide the whole service, in this case analysis. The conception and the implementation of the system will determine its quality, which can be judged through its thoroughness, processing speed, adaptability and stability.
4.2 Components of an analysis system

An analysis can be fed from different kinds of data: one could think of post-mortem forensics and its static analysis. Malware, however, is living software and can be kickstarted in a controlled environment. Building the environment around a simulation allows information to be extracted not only from static data but also from live behaviours. This provides a better understanding of the piece of code, with a variety of insights (direct information sources going from static strings to network communications, without even considering post-treatment and the intelligence in the system itself).

The technical core

So, if an analysis system is built towards deploying forensics tools, it starts with the working environment. An analysis based on simulation asks for managed targets as a major component of the system. Once the targets allow a simulation process, the next step is the capacity to keep an eye on the simulation and extract information. Two complementary kinds of tools become the next component of the system: monitoring tools and analytical tools. The monitoring tools provide access to the information, from grabbing and parsing the code to hooking into live processes, and the analytical tools provide some intelligence in the management of the information, from selecting particular items in the pool of data to building timelines and profiles.

Interactions with the user

Once the technical layer to run, monitor and analyse samples is done, the next steps are the system itself and its interaction with the user. Though the information has been collected and parsed, a component is needed to format and distribute it to the user: the reporting system. Producing documents or structured data of any kind allows the human user to benefit from the work of the system.
Beyond that, the user needs to access these reports but also to manage and manipulate the system, hence the existence of a user interface to complete the inner mechanisms.

User services

After the technical core and the interactions with the user, the next components of the system provide services to use and manage the system in the long run. An archiving system provides a repository for information, used both by the user (to safely store a report for future retrieval) and by the system (as an auxiliary source of information, a memory so to speak). Inner mechanisms and administration tools are services provided to a special kind of user, those taking care of the system (ways to restore corrupted targets, clean the archive, safely restart the system, etc.). Finally, documentation (though not a modern technology, it is still IT: pen-and-paper IT) is needed to provide the different users with an understanding of the system and basic guidelines.

System intelligence

Nevertheless, a system designed from those components alone provides a working solution without exploiting the full potential of having automated processes, information pre- and post-treatment and some kind of memory (the archives). By crafting supplementary and complementary processes, some kind of intelligence can be put not only into the forensics tools but into the system itself. This idea is the next step in the crafting process: trying to use the potential of automated information manipulation while playing on the synergy between the different components and the environment. The cornerstone is the existence of the archives, providing older analyses as references for the current one.
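As a toy illustration of how these components fit together (targets, monitoring, analysis, reporting, archive), one could sketch the pipeline as follows; all class and method names are illustrative, not MALANET's actual implementation.

```python
# Illustrative component layout of a simulation-based analysis system.
# Each class stands in for a whole subsystem described in the text.

class Target:
    """A managed target: a real or virtual machine running the sample."""
    def run(self, sample):
        # a real target would execute the sample in a managed environment
        return {"events": [f"exec {sample}"]}

class Monitor:
    """Monitoring tools: extract raw information from the simulation."""
    def collect(self, trace):
        return trace["events"]

class Analyzer:
    """Analytical tools: turn raw events into timelines and verdicts."""
    def process(self, events):
        return {"timeline": events,
                "verdict": "suspicious" if events else "clean"}

class Reporter:
    """Reporting system: format the analysis for the human user."""
    def render(self, analysis):
        return analysis  # e.g. JSON or HTML in a real system

class Archive(dict):
    """Archiving system: the memory of the overall system."""
    def store(self, sample, report):
        self[sample] = report

def analyse(sample, target, monitor, analyzer, reporter, archive):
    """One analysis pass: run, monitor, analyse, report, archive."""
    report = reporter.render(analyzer.process(monitor.collect(target.run(sample))))
    archive.store(sample, report)
    return report
```

The point of the sketch is the flow of information between components, not the (trivial) logic inside each one.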
This allows for the integration of a correlation system, which first needs a profiling engine (profiles being extracted from the analysis reports) and then a correlation engine (which would wisely parse and select information inside the profiles/reports, following various rules of comparison, mainly equality and similarity). Indeed, other post-treatments could be implemented, but the ability to see resemblances and to find links between various elements is one of the cornerstones of the investigation process (together with the understanding of each element and the capacity to see the big picture).

Quality through various combinations

Official forensics work that may be used for law-and-order actions requires a high level of quality and thoroughness. These criteria, one of the biggest differences between an enthusiast/AV analysis of malware and malware forensics, naturally impact the design of the system. To provide the best analysis possible according to these criteria, the exploitation of various sources of information, the existence of the corresponding forensics tools and the deployment of various targets in various environments are a huge part of the solution. Though mentioned before, those combinations are mainly composed of: different approaches built on various sources of information (static code analysis, automated reverse engineering, target-system analysis, network-communication analysis, live behaviours, etc.) together with the matching monitoring and analytical tools (to manage the various sources of information and extract as much of it as possible while being able to understand and parse it); and various kinds of targets with different local systems and different hardware and software environments, in order to provide a more exhaustive collection of simulations.
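A minimal sketch of the equality and similarity rules mentioned above could look as follows; the profile fields (hash, domains, mutexes, strings) and the scoring are illustrative assumptions, not the actual MALANET engine.

```python
# Illustrative correlation sketch: profiles are assumed to be dicts of
# feature sets extracted from analysis reports stored in the archive.

def jaccard(a, b):
    """Similarity rule: |A ∩ B| / |A ∪ B| over two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def correlate(profile, archive, threshold=0.5):
    """Compare a fresh profile against archived ones: equality on strong
    identifiers (here a hash), average set similarity elsewhere."""
    links = []
    for name, old in archive.items():
        if profile.get("sha256") and profile.get("sha256") == old.get("sha256"):
            links.append((name, 1.0))  # equality rule: the very same sample
            continue
        feats = [jaccard(set(profile.get(k, ())), set(old.get(k, ())))
                 for k in ("domains", "mutexes", "strings")]
        score = sum(feats) / len(feats)
        if score >= threshold:
            links.append((name, round(score, 2)))  # similarity rule
    return sorted(links, key=lambda x: -x[1])
```

A real engine would weight the features (a shared command-and-control domain says more than a shared common string), but the two comparison rules are the same.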
4.3 The MALANET solution

In order to adapt to the evolution of computer crime, the KI-42 unit of the BKA in Berlin runs a sideline experiment to build and use an automated analysis system: MALANET. This system, in continuous R&D, provides the services described before, following similar designs (regarding the components, the question of thoroughness and quality, etc.). The implementation is done through various off-the-shelf and in-house technologies, although the inner details of the implementation are withheld here due to the sensitive nature of this information.

One interesting point that was considered during MALANET's design, and that gives it an edge in some parts of the analysis, is the limit of the simulation. The quality of a simulation is indeed linked to the intrinsic quality of each simulation component (e.g. the virtualization system for the virtual targets) but also to the extent of the simulation. If one asked what the best simulation of the universe is, the answer would actually be the universe itself. But, fortunately for law and order, there is no need to go that far for malware forensics. Still, MALANET goes further than the simulation of targets by considering a simulation of the Internet.

The main components of a MALANET-like system

The rise of malicious software using network communication in its regular behaviour (to leak data, to propagate orders inside a botnet, to provide remote access to a malicious actor, to act as a proxy or a repository for another malware, etc.) and the objectives of law-and-order forensics (oriented towards finding the other pieces of a malicious system, including other corrupted machines and the human actors behind it, rather than just estimating the casualties on the local system) place network communication at the centre of malware forensics.
To do so, the crafting of a lookalike environment with a simulated connection to the Internet, providing all kinds of communication with as much fidelity as possible, would allow the malware to communicate (thus providing a better source of information for the forensics process) while trying to trick any system-awareness and anti-virtualization/anti-RE measures in the malicious software.

Chapter 5 Analysis environment: deploying virtual targets

"Now I do not know whether it was then I dreamt I was a butterfly, or whether I am now a butterfly dreaming I am a man." - Zhuangzi

5.1 Going virtual

Sandboxes: opting for virtual targets

Sandboxes are systems running inside a virtual environment, elements that can be managed from the outside by the simulation supervisor. Being partitioned from the underlying local system, they provide an isolated running environment (the quality of this isolation being one of the main criteria in evaluating sandboxes). Thanks to this property, they excel at providing development environments, test environments and also simulation environments. The anti-virus community rapidly exploited sandboxes to provide runtime behavioural analysis, limiting the impact of various anti-AV functions in malware (like metamorphic/polymorphic engines, which automatically modify the executable when self-replicating to counteract signature-based detection). In the case of a more forensics-oriented analysis, they provide the same access to an information source (live behaviour) as regular targets do, while at the same time being more easily deployed, customized and restored, all thanks to the supervision level of the virtualization environment. This also allows for easier automation of the system, both for sending data (commands, files, etc.) and for retrieving the monitored information.
There are also the obvious advantages of virtualization, like cost limitation, resource sharing, fast deployment and cloning processes, managed snapshots, and so forth. Nevertheless, sandboxes are no panacea. As said before, the quality of a sandbox depends mainly on its isolation from the local system, but also on the likeness of the simulated system to a real one. The resemblance between a virtualised system and a real system only goes so far; a perfect simulation has yet to be achieved, for various reasons, including undocumented behaviours of processors, bugs linked to a particular driver or component, the default state of some stacks at some precise moment, and so forth. Moreover, the presence of the sandbox may be advertised by leaked data, specialized drivers, dedicated behaviours of the OS, etc. These anomalies are potential indicators of a virtual environment, hence the existence of environment-aware malware. The good news is that, if the authenticity and credibility of a virtual system have limitations, so do the detection measures and the environment-awareness[1], though hardening the virtualization system may be needed to avoid some basic anomalies. Even with these defects taken into account, the deployment and use of virtual targets inside MALANET will still provide a useful service. Low-complexity malware will not be environment-aware or will easily be fooled, and with the rise of low-complexity cases (mainly due to the democratisation of some code and to the rise of malware markets, selling samples, crafters, packers, botnet accesses, etc.), virtual targets may provide a faster yet efficient-enough way of automating analysis. Moreover, the existence of a supervisor and the possibility, to some extent, of virtualised hardware allow for easier automation, integration into a bigger system, manipulation of the environment, and so forth.
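To make the "anomalies" concrete, here is a sketch that scans a pool of guest strings for a few publicly known VirtualBox artifacts, in the spirit of enthusiast tools such as pafish; the indicator list is illustrative, not exhaustive, and not taken from MALANET.

```python
# A few well-known VirtualBox markers that environment-aware malware looks
# for; finding them in registry dumps, process lists or network config is
# exactly the kind of leak discussed above.

VBOX_INDICATORS = [
    "VBOX__",                      # ACPI table names exposed in the registry
    "VirtualBox Guest Additions",  # guest-tools registry key / service name
    "VBoxService.exe",             # guest helper process
    "08:00:27",                    # default VirtualBox MAC address prefix
]

def scan_for_vm_markers(strings, indicators=VBOX_INDICATORS):
    """Return the sorted set of indicators found in a pool of strings
    extracted from a guest system."""
    hits = set()
    for s in strings:
        for ind in indicators:
            if ind.lower() in s.lower():
                hits.add(ind)
    return sorted(hits)
```

Hardening the virtual target means making such a scan come back empty.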
Sandboxes: selecting a solution

The existing solutions can be categorized using different criteria. Some sandboxes are meant for manual study (as in regular reverse engineering) and others for automated study; in our case, the efforts are focused on finding an already-automated analysis solution. Beyond that, the market can also be divided according to two criteria: whether the solution is a web service or a local service, and whether it is free of charge or not. Regarding cost, we should favour free-of-charge solutions for obvious reasons. Regarding the type of solution, we should avoid web services for two reasons. First, the malware may leak data about the original target and, being sometimes highly tailored, it may spread out of the web service: because of the sensitive and sometimes technically complex nature of computer-crime police work, there is a trust issue regarding the confidentiality and security of online sandboxes. Second, we lack information on the workflow of web services and control over their functions: we can afford neither using a black-box solution nor losing the possibility to tune and harden it. That is why we should look for a local solution. Knowing this, the number of suitable solutions is quite limited. Though many web services exist (like Anubis or Comodo Instant Malware Analysis), only a few open-source/free solutions are available. One of the best-known solutions is Cuckoo, but there are also Buster Sandbox Analyzer, Zerowine and the Minibis solution. The capacity to run as a local service and the cost are strong, exclusionary criteria in the choice process. To analyse the solutions in the remaining group, the selection criteria are the following: access to the source, targeted systems, automation capacity, ease of being detected as a sandbox, accuracy of results, hardening possibilities, latest release and development activity, and the content and format of the results.
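The shortlisting step described above can be written down as a simple filter on the two exclusionary criteria; the candidate table just restates the solutions named in the text, and the boolean flags are a simplification for illustration.

```python
# Shortlisting sketch: apply the two exclusion criteria (local service,
# free of charge) before the finer per-solution comparison.

CANDIDATES = {
    "Anubis":                          {"local": False, "free": True},
    "Comodo Instant Malware Analysis": {"local": False, "free": True},
    "Cuckoo":                          {"local": True,  "free": True},
    # BSA itself is free, though its Sandboxie dependency is licensed,
    # which rules it out later in the per-solution review.
    "Buster Sandbox Analyzer":         {"local": True,  "free": True},
    "Zerowine":                        {"local": True,  "free": True},
    "Minibis":                         {"local": True,  "free": True},
}

def shortlist(candidates):
    """Keep only local, free-of-charge solutions, the group the finer
    criteria (source access, automation, hardening, ...) are applied to."""
    return sorted(name for name, p in candidates.items()
                  if p["local"] and p["free"])
```
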
Buster Sandbox Analyzer

BSA is a one-man project that started in 2009 as an extension to Sandboxie. Aimed at analyzing Windows malware (32- and 64-bit), it can run different file formats (exe, pdf, bat, url, etc.) and can be launched from the command line (making automation easier to craft). Reports are written in a human-friendly way, categorized according to the type of change (registry, network, etc.). Nevertheless, BSA has a few defects. Sandboxie, its main dependency, is licensed software with an annual fee. Neither Sandboxie nor BSA is open source (giving less understanding of, and control over, the process). And since the last version of Sandboxie and its acquisition by another company, BSA has been discontinued (since 2013). This, alas, clearly rules out BSA as a serious long-term solution.

[1] Yet another case of a cat-and-mouse game in security.

Zerowine

Zerowine is an automated analysis sandbox based on Wine. It is delivered as source code or as a compiled QEMU image. It shows automation possibilities (Python being used to manage it) and does produce human-friendly reports with subsections (strings, headers, signatures). The latest version offered a hardened version of Wine and Zerowine to further avoid detection and improve unpacking capacities. One point should be taken into account: Zerowine is not a real sandbox in itself, but rather a controlled emulation environment. Having to manage detection of Wine, detection of the monitoring tools and of the underlying sandbox (deployed to turn it into a proper virtual target) makes this solution a little more fragile, at least out of the box. Another drawback is the lack of updates for years (the last release dates from around 2010-2011). The main developer is active (through his site and blog about malware analysis[2]) but not on this project any more.
A variation, Zerowine Tryouts, was made by a Korean researcher to patch bugs and add some functionality, but it has not been updated in years either. Still, it may be worth testing the solution (the vanilla one or the Tryouts one) and maybe restarting development or producing an extension.

Minibis

Minibis is a solution offered by cert.at, the Austrian national CERT. It is a suite of different tools and scripts provided for malware analysis with automation capacities. It started as a local service inspired by Anubis, to provide more control over the process and the reports. It seemed like an early promising local solution for automated malware analysis, before Cuckoo and the others. The available version has not been updated since 2011, but cert.at made a Twitter announcement last year (2014-08-08) of a new version of Minibis (v3). Though using the old version of Minibis may not provide an edge compared to an updated Zerowine, one should keep an eye on the next release of Minibis (and maybe contact cert.at for more information and an early version).

Cuckoo

Cuckoo is a recent but well-known sandbox made for automated malware analysis. Started as a one-man project for the 2010 Google Summer of Code, it is now supported by a team of developers and backed by a foundation (a sign of stability and future releases). An open-source project, mainly in Python, it offers a supervisor tool and analytical scripts. Though the solution is made to manage a sandbox, it is technologically independent and can be used on top of various virtualization solutions (VMware, VirtualBox, QEMU-KVM, etc.). It targets Windows malware analysis while running on a Linux platform. The results are provided in various categories (static analysis, network communication, dumped files, etc.), and the report can be encoded in various formats (JSON reports that are archived in the backend database, HTML versions, MAEC profiles[3]).
Its popularity in the information-security and virus-enthusiast world not only provides accessible documentation but also a few extensions and reworks. As an example, not only did the Volatility team provide a plug-in to ease the interaction between Cuckoo and their memory-analysis system, but Tomer Teller went further by working on a system of dynamic memory analysis which uses triggers (API calls, heavy mathematical functions, etc.) to capture memory dumps (as opposed to an end-of-run memory dump).

[2] joxeankoret.com
[3] Malware Attribute Enumeration and Characterization, or MAEC, a structured representation language to characterize malware.

Nevertheless, though three sandboxes are seen as potential solutions, each needing a different amount of implementation work, other possibilities for virtual targets inside MALANET exist. The exclusion criteria could be reworked to consider paying solutions, though this may be out of the question for budget reasons. Beyond that, an in-house development of a Wine solution (or a similar Windows-emulation solution) could provide an alternative. In any case, since the analysis solution needed at least one of them to illustrate the use of virtual targets, the focus was temporarily put on Cuckoo (although most of the work can be adapted to other solutions).

5.2 Implementing Cuckoo

In order to integrate Cuckoo inside MALANET and provide additional virtual targets to the analysis system, the sandboxing solution must first be studied, in order to understand its various pieces and deploy a correct working environment. Once the pieces are working together in a designed environment, further steps can be taken to exploit its potential and strengthen it (including a study of the existing plug-ins, an in-house reinforced version of some components, and supplementary scripts written when confronted with wild samples).
(Part of the official Cuckoo ASCII art, omitted here.)

Cuckoo's components and environment

Cuckoo is a collection of Python scripts working together to manage a virtualization system, start and process an automated malware analysis, and collect and format the results, plus a few utilities. The internal algorithm has been constructed so that each big step (the machinery management, the processing of the analysis and the analysis data, the reporting, etc.) can be configured and completed with new modules. Configuration files mainly provide the practical information for the virtual machines and the optional behaviours and variables of Cuckoo (analysis timeout, rescheduling of failed processes, etc.). The main script works with the help of various components, including: a monitoring DLL (cuckoomon.dll) that is pushed inside the virtual target with the sample (the DLL is used to hook into Windows processes, monitoring the sample and collecting the data); a Django web application as a user interface; and a backend database to manage previous analyses and archives. With all these components, the Cuckoo system both has a few precise dependencies and asks for a particular environment. The host server for both the analysis system and the virtualization system is a Debian-based system, deployed with a recent version of Python and a MongoDB database as the backend storage (mainly for the JSON-formatted reports). The Django-based web application is used in conjunction with a solid web stack made from nginx and uWSGI. Various supervising daemons and services manage the proper functioning of every component. Regarding the virtualization system, two choices were kept: VirtualBox and QEMU-KVM.
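As an illustration of the configuration files mentioned above, a machinery file in the style of Cuckoo's `conf/virtualbox.conf` looks roughly like this; the machine name, IP address and paths are placeholders, not values from the actual deployment.

```ini
; Sketch of a Cuckoo machinery configuration (VirtualBox flavour).
[virtualbox]
mode = headless                 ; run the guests without a visible window
path = /usr/bin/VBoxManage      ; VirtualBox management binary on the host
machines = winxp_01             ; comma-separated list of guest sections

[winxp_01]
label = winxp_01                ; name of the VM as known to VirtualBox
platform = windows
ip = 192.168.56.101             ; guest address on the host-only network
snapshot = clean                ; snapshot restored before each analysis
```

A matching section exists per guest, which is how the variety of Windows targets described later is wired into the system.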
In any case, a virtual network connects the host with the guests in order to collect the monitored information, the network being run by the regular virtual-network utilities of the chosen virtualization system.

Cuckoo's components workflow

Although various virtualization systems exist, VirtualBox and QEMU-KVM are among the few that match the criteria used to shortlist the available sandboxing solutions. Though VMware is known for its stability and capacities, it quickly enters a commercial logic that should be avoided. VirtualBox and QEMU-KVM still provide the basic needs of a virtualization system for Cuckoo while having different profiles (which, in consequence, makes deploying both of them a strength). VirtualBox is a mix of open and closed components, while QEMU-KVM is fully open source, or almost. The first is a casual yet robust and portable virtualization solution, while the other exploits hardware-assisted virtualization, asking for specific processors and kernels. Studies and experiments by security enthusiasts revealed that, in recent versions, VirtualBox was easily detected due to a large amount of leaked data and glitches in the guests, while QEMU-KVM allows a close copy of a real system. VirtualBox can easily be managed to push faked hardware profiles and information, while QEMU-KVM, even if it offers similar possibilities, is more sensitive (asking for a more precise configuration). Both offer pros and cons, and a wise solution is, if possible, to implement both (while favouring one of them for default analysis and keeping the other for supplementary analysis, in order to limit the complexity of using the system for a regular user). Each deployed virtualization system will provide various targets running different systems and/or different software environments.
A variety of systems is needed, for malware targets various systems and each version of Windows reacts differently; a variety of software is needed, for different malware may target different software suites. Indeed, a basic bundle of software is installed on every target to provide a working and credible environment (an office suite, a PDF reader, a browser, Flash plug-ins, etc.). Though all the targets run some version of Windows, the main targets are Windows XP (which still has a huge market share, mainly in low-income countries and inside old automatons), Windows Vista (which is still in use and has its own deficiencies and particularities, even with a kernel similar to other Windows versions) and Windows 7 (whose registry and folder architectures differ from older Windows versions). In the list of priorities, Windows 8 is next, while server versions of Windows should also be considered. Both 32- and 64-bit systems are represented, which may be useful even for 32-bit-only malware[4].

Strengthening and extending Cuckoo

Once Cuckoo is properly implemented, with a functioning environment at its disposal, comes the question of where the system can be extended and reinforced. Though the tool is nicely done and accomplishes its purpose, it does not provide an extensive post-treatment of information, nor counter-measures against system-aware malware, and so forth. Building on the knowledge acquired through testing Cuckoo on wild samples from some parts of the Internet, a few points became priorities in order to provide a more efficient and responsive Cuckoo system for MALANET (note that, though similar thoughts would apply to other sandboxing solutions, the implementations may differ widely).
These points include: properly considering existing Cuckoo plug-ins, strengthening the counter-detection measures through an in-house version of the monitoring DLL, decoding canonical botnet communication, and managing batches of samples.

Cuckoo's plug-ins

The modular, (almost) fully configurable construction of Cuckoo allows for the easy implementation of plug-ins and complementary algorithms (whether for the monitoring part, the analysis part, the report part, or all of the above). And although Cuckoo is quite recent as a stable tool, its popularity has led to the appearance of a few interesting plug-ins. A handful of basic plug-ins are distributed by the community and can be downloaded using the community utility (most of today's plug-ins are signatures related to particular malicious behaviours: checking VBox registry keys, a particular botnet sample, etc.). Other, more developed extensions are made by actors of the security scene, especially malware researchers and enthusiasts. These include, but are not limited to: an extension to ease the interaction with Volatility (it should work with VirtualBox, but will have memory-dump format issues with QEMU-KVM, although future releases of Volatility and KVM should fix that); a trigger-based memory-dump system by Tomer Teller replacing the end-of-analysis memory dump (allowing temporary artifacts in memory to be captured and a better timeline of events to be built; it may not be stable enough with VirtualBox and is not stable enough with QEMU-KVM); malwasm, the reverse-engineering plug-in based on pintools brought by the malware.lu people; and zer0m0n, a kernel-level monitoring driver provided by CONIX Security.

In-house version of the cuckoomon DLL

The monitoring DLL, cuckoomon, allows Cuckoo to collect information from hooks into Windows functions. Though the system may be bypassed by some malware, it can be considered a reliable and resourceful source of information.
Because malware will use system information to check for the presence of a debugger or a virtual environment, the existence of hooks provides another opportunity: modifying on the fly the system information given to the malware, in order to hide the presence of the virtual environment. The monitoring DLL can thus also act as an informational man-in-the-middle, manipulating the calls caught in the monitoring process.

[4] e.g. 32-bit executables run on a 64-bit Windows will have a slightly different flow, with different processes, and this has already been used to produce privilege escalation.

Cuckoomon hooking system

To implement this idea in the local Cuckoo system, a second monitoring DLL has been crafted[5] and placed in the architecture. This version not only logs the information caught by the various hooks, but also checks the data for particular strings (replacing them on the fly with inoffensive strings, e.g. "VBox" by "Acer", or crafting a distinct answer to the sample's request, e.g. an error answer for obviously virtualization-related registry keys). This interception of the data transferred to the sample allows for a stronger simulation, hiding certain markers from environment-aware detection measures. It does not provide the same level of environmental-data manipulation as rootkit-like monitoring would (kernel level and so forth), but it will easily counteract the basic detection routines found in low-complexity malware. This counter-measure was built as a proof of concept, after observing other uses of the monitoring DLL to feed false data to the sample to avoid detection. Going further than denying the existence of some registry keys, by allowing manipulation of intercepted information (a man-in-the-middle indeed), opens a world of possibilities, while keeping in mind that it will not counter every detection possibility.
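The censoring logic of the in-house DLL can be re-sketched in Python for clarity (the real code is a C DLL hooking Windows APIs); the key names, replacements and error convention below are illustrative assumptions.

```python
# Python re-sketch of the hook-level man-in-the-middle: deny obviously
# virtualization-related registry keys, censor markers everywhere else.

BLACKLISTED_KEYS = [
    r"SOFTWARE\Oracle\VirtualBox Guest Additions",
    r"HARDWARE\ACPI\DSDT\VBOX__",
]
REPLACEMENTS = {"VirtualBox": "Acer", "VBox": "Acer"}

ERROR_KEY_NOT_FOUND = None  # stands in for the Windows "key not found" error

def filter_registry_answer(key, value):
    """Intercept a registry answer before it reaches the sample."""
    if any(key.lower().startswith(b.lower()) for b in BLACKLISTED_KEYS):
        return ERROR_KEY_NOT_FOUND          # pretend the key does not exist
    for marker, clean in REPLACEMENTS.items():
        value = value.replace(marker, clean)  # on-the-fly string censoring
    return value
```

In the real DLL the same check-and-modify step sits inside each hooked API call, on the return path to the sample.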
The intercept-check-modify behaviour could be extended to the code of the hooks themselves, allowing for a global censoring of some keywords in the interactions with the system (virtualization-linked keywords, for example), and could be fed lists of sensitive keywords (with default answers, profiles of answers matching hardware brands, etc.).

Decoding botnet communications

Samples released in the wild are less prone to raise suspicion if they start communicating with their servers. So, for some of the Cuckoo tests, parts of the sample's communication were forwarded to the Internet, in order to precisely identify the malware and study the options for botnet-communication analysis. For the same reasons regular applications use protocols and normalized formatting, botnets hijack regular protocols (like IRC or p2p file-sharing algorithms) or craft homebrew protocols. A homebrew protocol allows the protocol to be inserted inside a less malicious-looking channel, a typical example being botnet communications using HTTP GET and POST requests. Using those elements like a regular HTTP connection provides some discretion from unaware eyes, while the data formatted and transmitted can be manipulated according to the needs and goals of the bot. More and more, these communications have been strengthened using various security concepts, up to the use of cryptography (symmetrical, asymmetrical, one-time pads, etc.). A common pattern is the encoding of formatted data in hexadecimal, which is then encrypted using RC4 and a known starting key, the resulting data sometimes being checked using a small CRC. Note that every decision made in securing botnet communication may render the bot more obvious and less likely to survive on a regularly monitored/used station.

[5] The C code of the DLL being a GitHub project in itself, crafting a variation is child's play.
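The "hex plus RC4 with a known starting key" pattern described above can be undone in a few lines; the key and payload in the example are made up for illustration, not taken from a real botnet.

```python
# Minimal decoder for the common botnet pattern: payload hex-encoded,
# then RC4-encrypted with a known (or recovered) key.

def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4 (KSA + PRGA); encryption and decryption are identical."""
    S = list(range(256))
    j = 0
    for i in range(256):                       # key-scheduling algorithm
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    out, i, j = bytearray(), 0, 0
    for byte in data:                          # pseudo-random generation
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)

def decode_bot_message(blob: bytes, key: bytes) -> bytes:
    """Undo the pattern: RC4-decrypt, then hex-decode the payload."""
    return bytes.fromhex(rc4(key, blob).decode("ascii"))
```

With a list of default keys (the kind shipped with widely resold crafters), such a decoder can simply be tried against each candidate exchange in the packet capture.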
By compiling knowledge about botnet communication protocols and matching profiles with network analysis, not only can a stronger automated detection of the nature of the sample be achieved, but the communication may also be entirely decrypted. Most of these protocols have regular phases of key exchange, with a root key being used to encrypt those exchanges: the root key can usually be extracted through reverse-engineering, static and behavioural analysis, or by checking it against default known values (the democratization of ready-made tools brought variety to the security level of malware).

Decoding botnet communication: results and script

These thoughts can be applied as an extension to a malware analysis system like Cuckoo by providing scripts that parse the packet capture and local information. By searching for known link patterns, data encodings and particular sockets, the nature of the botnet can be determined. The exchanges can then be targeted and automatically decoded by trying default keys (or keys extracted from previous parts of the analysis) and known protocols. The data exchanged, if decoded, can confirm the nature of the botnet but also define more precisely the aim of the bot and give context to other network communications (e.g. confirming another network behaviour to be a reaction of the malicious sample to a control-server order).

Managing a batch of samples

A point of interest that appeared while analyzing and studying wild botnet samples is the management of a group of samples. Although some precise, unique sample may be fed to the system, monitoring wild sources of malware or data extracted in the context of a case both provide batches of samples. While wild sources will have multiple
malicious executables or indirectly infected files, a source from a case may have multiple executables infected with the same malware, or various pieces of malware deployed as part of an orchestrated attack. Although using Cuckoo through the web application is somewhat user-friendly, it may not adapt well to the scale of tens or hundreds of samples coming from the same source. Manually feeding the samples is out of the question, but an automated solution can be crafted around the API. This will, at the same time, provide a helpful script for sample batch analysis (progressively pushing the samples through the network to the Cuckoo system, monitoring the analyses and downloading the reports to the local station), initiate the implementation of the integration into MALANET (by properly handling the API and transcribing the various possible requests into Python functions) and open considerations on constructing functionalities around batches of samples instead of unique samples. For, indeed, if the local Cuckoo system is a stable, working tool, its potential has not been fully exploited yet.

Batch managing components and processes

5.3 Virtual targets inside MALANET

Although a functional stand-alone Cuckoo system is provided, it must also become a part of the overall MALANET analysis system. The same options should be offered by the various existing real and virtual targets, but the analysis already provided by Cuckoo should also be integrated into the analytical and reporting processes of MALANET. While the API and the utilities provide external interaction with the system, the analysis reports must be handled by MALANET processes and parsed into a secondary source of information. Indeed, the various options and parameters of a Cuckoo analysis run should be accessible both from the local Cuckoo system and from the global MALANET system.
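The API-driven batch handling described above could be organized as in this minimal sketch. It assumes Cuckoo's REST task endpoints (/tasks/view/&lt;id&gt; and /tasks/report/&lt;id&gt;) and a hypothetical host name; the HTTP call is injected as a callable so that the polling logic stays self-contained:

```python
import os
import time

API = "http://cuckoo-host:8090"  # hypothetical address of the Cuckoo API daemon

def collect_samples(folder):
    """Gather the files of a batch, e.g. samples extracted from one case."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )

def endpoints(task_id, base=API):
    """URLs used to follow one submitted task through its life cycle."""
    return {
        "status": f"{base}/tasks/view/{task_id}",
        "report": f"{base}/tasks/report/{task_id}",
    }

def wait_for_report(task_id, fetch_status, interval=0.0, max_tries=50):
    """Poll a task until Cuckoo marks it 'reported'. fetch_status is the
    injected HTTP call (task_id -> status string), kept abstract here."""
    for _ in range(max_tries):
        if fetch_status(task_id) == "reported":
            return True
        time.sleep(interval)
    return False
```

A real run would first POST each file from collect_samples to the task-creation endpoint, then call wait_for_report before downloading each report to the local station.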
The existence of the supplementary layer provides not only constraints but also the opportunity to craft a system around it and explore the full potential of the virtualization-based malware analysis system. The current system can indeed be extended by more thorough work on its various components and further development. The reinforced monitoring DLL providing on-the-fly data manipulation is still more a proof-of-concept than a properly engineered tool. Further, switching between monitoring DLLs, as well as between the various existing virtualization systems, should be managed through scripts to provide a more user-friendly experience. The various scripts developed while experimenting with Cuckoo should be properly integrated into both the local Cuckoo and the global MALANET options, including the management of a batch of samples and the network analysis of botnet communication protocols. The various plug-ins should stay on a check-list, the memory dump analysis system, for example, being on hold for compatibility and stability reasons (reasons that should evolve with future releases of Volatility and QEMU-KVM). Nevertheless, all these upgrades are mostly updates of supplementary parts brought to Cuckoo. The potential of the analysis system, coming both from the various elements and from the combination of the various layers, can be enhanced by constructing intelligence around the memory capacity of the system. While the archiving system for samples and analyses provides a logistical answer to various issues, it can also serve as a source of information for future analyses.
Using a profiling engine fed with older analyses and a data correlation engine for the analytical processes, a correlation system can be built to provide various new abilities:

• regarding a single sample, an automated comparison of its analyses on various systems (and, further, across virtualization systems and, in the long run, even between real and virtual targets) can reveal the host systems the sample expects and its sensitivity to and awareness of its environment;

• regarding a batch of samples, correlation between samples can clean further analyses of duplicate samples embedded in different elements;

• regarding any analysis, correlation may help investigators spot common elements between various samples by letting them manipulate older reports as a source of information for their current analysis.

Considerations about existing malware normalization formats, mainly around solutions like MAEC, can be made in order to build intermediate profiles lighter than full-analysis reports. While integrating Cuckoo, refining its complementary components and extending its potential through profiling, the simple botnet communication analysis can also grow. While it started mainly as promptly crafted scripts automating the parsing of the pcap file and the decoding of the exchanged data, it can benefit from a database of specific botnet communication patterns and known protocols. While this will provide the necessary components to try to decode possible botnet communications and to maintain the system, it may also find a more profound integration into MALANET by providing patterns and use cases to the fake Internet connection. This offers the unique opportunity, if the sample communicates according to a recognized botnet protocol, to answer back and check whether the malware has a further conditional payload and possible commands; in a nutshell, to provide a fake C&C server using the detection and decoding system (and, thanks to low-complexity default-key samples in the wild, this can produce results for regular basic cases).
Although using Cuckoo through the web interface or the API provides a somewhat user-friendly solution for basic functions, and full integration into MALANET should definitely ensure that, its use should remain nuanced. The technical user-friendliness of the solution, plus the online and in-house documentation (the latter with troubleshooting adapted to the local system deployment), may mislead an investigator. In the same manner as an automated botnet analysis may misread a communication for multiple reasons, the analysis produced by Cuckoo is limited by its possible weaknesses. Although a positive, profile-matching analysis may be right, a lack of proper answer is in no way positive proof of the absence of malware. Between options to bypass Cuckoo's monitoring and environment-aware defensive routines, a malware may hide during the analysis or even produce a fake payload to mislead it entirely. This uncertainty rests not only on metaphysical considerations about simulation-based analysis and data monitoring, but also on the cat-and-mouse game between the detection of a malware and its survival chances. Like many subjects in the security field, virus analysis, whether for anti-virus companies or for forensics, is in a constant state of evolution. In order to end up on the at least understanding, and maybe winning (or regarded as such), side of this game, one must not only accommodate these evolutions (which calls for modular tools with continued development) but also proactively look for weaknesses in one's own analysis system in order to outsmart the opponent. This raises the problem of properly and thoroughly analyzing an analysis system while hunting for its limits.
Chapter 6

System introspection: crafting a test sample factory

"They say a police is only as good as his informants." - The Wire

6.1 Analysing an analysis system

The building of a system depends on the capacity to debug and test it, while its continuous improvement depends on the capacity to probe its limits. Though the analysis system is live-tested on real targets, the feedback is limited to the little information caught in those processes. To achieve a better system, with knowledge of its flaws and limits, more manageable feedback is needed. This can be achieved through the use of test samples. Being controlled samples that imitate possible behaviours of real malicious samples, they provide a reliable and manageable source of information. Because the most offensive or destructive behaviours would not be needed to actually probe MALANET, the focus is put more on environment-aware behaviours (e.g. virtual environment detection) and defensive routines (e.g. stealth measures), while also providing basic inputs and outputs for casual behaviours, in order to test the monitoring tools and forensics algorithms. By adopting such a strategy, and relying on controlled insiders, some kind of system introspection will be provided as a source of information for future work on the analysis system (to be honest, if the test samples were part of the automated intelligence of the system, one could say the system is actually doing real introspection). While allowing for a potentially more exhaustive and contextualised analysis, the ability to precisely craft those samples will influence the quality of this introspective analysis. How to provide, in the long run, a solution for crafting those samples while adapting to the discovery of the system's weaknesses and the potential evolution of its structure may find an answer in the crafting not of samples, but of a sample factory.
It would first require a fuzzing process, the ability to note the presence of errors or missing pieces in the results, the crafting of an adapted sample and the creation of an introspection report.

6.2 Crafting a sample factory

Sample factory: conception

Because the purpose is more about exploring the full potential of MALANET while hunting for small breaches than about basic functional tests, there is a need for crafting specific samples on demand, which can be met by providing a sample factory. The value of this factory would lie in its user-friendliness and in being easier to use than manually crafting new samples. It should be easy to manipulate and configure while providing access to most of the functions of a real sample. In order to properly design the factory, the challenging parts of crafting a sample should be identified and the difficulty of those tasks lowered through the mechanisms of the factory. One of the main points is the language itself: though malware can be written in various languages (usually compiled ones, like C, C++, .Net, VisualBasic, etc.), writing a sample already asks for some familiarity with one of those languages. Beyond this, to achieve interesting functions and complex behaviours (even more so for complex malicious behaviour), a combination of good insider knowledge of the targeted operating system and a trickster/con-man mindset (the "smart attacker") is needed. Finally, regardless of the actual complexity of these functions, some overly used routine functions should be easy to add to a sample in order to gain time through automation (asking only for functional knowledge, at a higher abstraction level than technical knowledge). With these objectives, various components of the factory become clear. Indeed, the factory should have access to a library of already-coded functions, in order to reduce the user's investment in the development process.
This library would contain pieces of code, formatted for the needs of the factory algorithm. To manage those pieces, while also reducing the complexity of development, a language easier to manipulate for a less experienced user should be provided. This language would achieve user-friendliness for casual users by placing itself nearer to a high-level functional description of the sample than to a technical, implementation-oriented language. The factory itself would be able to process this pseudo-language to assemble and adapt the various pieces of code needed to craft the sample, ending if possible with the compilation of the code into an executable (otherwise, the resulting code should still be accessible for later compilation). Finally, because the factory and its language would be a kind of novelty for any user, basic but solid documentation should be provided.

Sample factory: design of the pseudo-language

The first part of the implementation is the design of the pseudo-language that will be used to describe the samples, providing a kind of configuration file for the factory. Since this language needs to be understood by the main algorithm, it needs to be parsable (though both spellings, parseable and parsable, are accepted, parsable seems the more legitimate), which means it will rely on some kind of structure and probably on flags or markers (this is a question of information theory and of the ability to distinguish pieces of information in a stream). Different languages exist to format data, one well-known example being XML. On the other hand, the pseudo-language should have the look and feel of a programming language and may be used to provide functionalities (pre- and post-treatment of the descriptive information). Adapting something like XML so far away from its regular use may lead to a more complex and initially less user-friendly result than expected.
A solution is to see this as an opportunity to create a homebrew language adapted to the needs. Though it does not ensure the same quality and stability as a regular programming language, because it is only a step in the process of the factory (a configuration file used to switch from a functional description with some elements of programming to an actual low-level programming language), the option is plausible. The pseudo-language will help in manipulating the pieces of code by presenting them as a collection of functions, providing an ensemble of the selected instructions (functions) with their respective input and output arguments (while trying to minimize exposure to inside mechanisms and temporary input and output variables). For a friendly visualization of the content, each instruction will be written on a new line with its arguments, according to the following pattern.

instruction_name[input arguments][output arguments]

Separating lines using only the new-line marker is quite weak (compared to a more explicit ";") but manageable. The instruction name, due to the huge quantity of pieces and the possibility to categorize them (according to the kind of function: data manipulation, file manipulation, network communication, checks, etc.), will be split into the category and the precise function: category.function (e.g. reg.writekey to write a key in the registry). This is reminiscent of programming patterns for structures or classes, providing a fast way to understand what kind of behaviour is asked of the sample. Once functions can be written down through instructions, the question of the arguments arises. Having access to input and output arguments, to provide hardcoded known data (for inputs, e.g. paths, dumped files, keys) and to redirect data between functions (for outputs, which may become inputs of a following function), is an obvious need.
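A minimal parser for the pattern above could look like the following sketch (illustrative, not the Architect's actual code; it only splits a line into the instruction name and the two bracketed argument blobs):

```python
import re

# One instruction per line: category.function[input arguments][output arguments]
INSTRUCTION = re.compile(r"^\s*([A-Za-z_]+\.[A-Za-z_]\w*)\[(.*)\]\[(.*)\]\s*$")

def parse_instruction(line):
    """Split one schematic line into its name and raw argument blobs.
    The greedy regex keeps nested brackets inside the input-argument blob."""
    match = INSTRUCTION.match(line)
    if match is None:
        raise ValueError(f"not a valid instruction: {line!r}")
    name, raw_in, raw_out = match.groups()
    category, function = name.split(".", 1)
    return {
        "category": category,
        "function": function,
        "inputs": raw_in,
        "outputs": raw_out,
    }
```

A further pass (not shown) would then classify each argument by its marker, as described below.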
They will be included after the function name between "[" and "]", according to the previous pattern. Arguments may be of various natures. While some are hardcoded strings of paths or registry values, some may be optional values for API calls, raw code or injected instructions (a function may be the input argument of another function). Distinguishable markers are required to separate the various types and treat them accordingly in the main script.

• Type: optional value / raw data. Marker: nothing. Example: CREATE_NO_WINDOW.

• Type: string. Marker: ", on each side. Example: "VMware Tools".

• Type: instruction. Markers: ( and ). Example: (cli.print["Owned"][]) (injected instructions indeed have their own arguments).

Though this covers the basic ground needed for a pseudo-language, with only a few markers (a line end between instructions, and the five used for data types), new functionalities can be implemented in the pseudo-language with the addition of supplementary markers or keywords. One of them is the ability to repeat an instruction while changing the content of an argument. Due to the fair amount of value verifications inside environment detection and defensive routines, some instructions may be repeated a few times at the same point in the overall sample behaviour. An interesting functionality would be the capacity to fuse those instructions into one, with a list of values for the argument (somewhat similar to factorization in mathematics). This requires new markers, for the list of values itself and to split the list into those values. Note that other local arguments would be repeated according to a modulo, allowing for various sizes of value lists as different arguments of the same instruction.

• Use: outside borders of the list of values. Markers: { and }.

• Use: inside borders of the list of values. Marker: ,.
• Example: {"sandbox","vm","malware","forensics","simulation",xoredsecretdata} (note that a list does not imply any condition on the type of the arguments inside, nor that each value be of a similar type).

Another interesting property for the pseudo-language would be the ability to import already-written instructions from another source (another configuration file). On the one hand, as stated before, some routine instructions are quite common in malicious software and answer basic needs (e.g. a melting routine using a batch file to delete the executable file, plus the batch file itself). On the other hand, although the language tries to stay at a functional level, some processes are still split into a handful of functions to allow the user to precisely craft the behaviour they want. But, for most users, the ability to manipulate the inside instructions is not required. The property of importing already-crafted lists of instructions turns out to be two different properties: importing an already-crafted list without modification (routines), and importing a group of instructions with the first input and last output arguments (aliases). Routines are instructions without arguments that will be matched to other instruction files and imported on-the-fly. They will be indicated in the usual instruction pattern using routine as the category name and the file name as the function name (so the instruction becomes routine.filename). Aliases are instruction batches manipulated as regular instructions. A list of the enabled aliases and the matching batches of instructions will be kept in a file (the aliases file; as an R&D thought, importing from secondary aliases files could become a configuration option), and they will be called using their alias name (which follows the usual instruction naming process). Depending on the priority of canonical instructions over aliases, aliases may also be a way to overwrite definitions.

• Example of a routine: routine.melting[][] (even if routines use no outside arguments, they still follow the main instruction writing pattern, hence the empty []).

• Example of an alias: netcom.HTTPGetFromHostname[srv,rsc,ua][buf,bufsize], replacing a succession of netcom.HTTPGetRequest, netcom.sockethostname, netcom.HTTPsend, netcom.HTTPrecv and netcom.closesocket.
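The fusion of repeated instructions through a value list, with the modulo rule for the other arguments, can be unrolled as in this sketch (arguments are represented as Python values, with lists standing in for {...} value lists):

```python
def expand_value_lists(name, args):
    """Unroll one instruction carrying {...} value lists into plain
    instructions. Shorter lists wrap around: the modulo rule from the text."""
    lists = [a for a in args if isinstance(a, list)]
    if not lists:
        return [(name, list(args))]          # nothing to unroll
    count = max(len(values) for values in lists)
    unrolled = []
    for i in range(count):
        unrolled.append(
            (name, [a[i % len(a)] if isinstance(a, list) else a for a in args])
        )
    return unrolled
```

So one fused check against five virtualization markers becomes five plain instructions, while a second, shorter list cycles through its values.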
Finally, as an anecdotal but useful functionality, comments should be allowed inside schematic files (the lists of instructions; a schematic, a list of instructions used to manipulate and assemble pieces, makes sense for a factory). Comments allow indications and documentation inside the code, paving the way for sane coding habits. They also provide a fast on-off switch for an instruction during a development phase. A new marker is needed to indicate that a line is a comment and not an actual instruction; ## will be used.

• Type: comment. Marker: ##. Example: ## This is not a comment.

While the pseudo-language used to describe the sample behaviour is the main concern, the pieces of code manipulated by the factory are also formatted. The precise markers will be adapted to the design of the factory algorithm, but a structure of markers starting with # will be used to parse the pieces. The main markers are #DOC for the documentation header of each piece (which will be used for documentation generation), #INVAR and #OTVAR for the inside mechanisms replacing the local arguments inside the pieces with the arguments from the schematic, and the combination of #HEADER, #MAIN and #FOOTER to split the actual code between what will and what will not be repeated in the case of a list of values as an argument. These markers should not conflict with the pseudo-language but may be an issue for the code inside the pieces (although such a conflict has a very low probability of occurring naturally).
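Splitting a .piece file along those markers could be sketched as follows (the marker set is taken from the text; the exact file layout, with each marker on its own line, is an assumption):

```python
# Marker names from the text; the one-marker-per-line layout is assumed.
PIECE_MARKERS = {"#DOC", "#INVAR", "#OTVAR", "#HEADER", "#MAIN", "#FOOTER"}

def split_piece(text):
    """Cut a .piece file into its marker-delimited sections, returning a
    mapping from marker to the raw text of that section."""
    sections, current = {}, None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in PIECE_MARKERS:
            current = stripped
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {marker: "\n".join(body) for marker, body in sections.items()}
```

Matching only whole marker lines is what keeps C preprocessor directives such as #include from being mistaken for section markers.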
One point inside the pieces that may provoke a conflict with the schematic, which is where both ensembles of information come into contact, is the temporary naming of the local arguments. In order to inject and extract arguments according to a schematic, those arguments need a temporary name inside the pieces. An arbitrary naming of IVAR and OVAR has been chosen, with each temporary name being assigned a number corresponding to its position in the argument list of the instruction inside the schematic (e.g. IVAR1 is the first input argument of the instruction). Both IVAR and OVAR, and their numbered variations, should be regarded as reserved keywords. In the future, things may evolve in complexity with the addition of default values for arguments (yet to be properly designed and implemented). The pseudo-language, although it may evolve in future development, can be considered sufficient for now. The resulting list of reserved markers, whether characters or keywords, is the following.

• Commentary in the schematic: ## le commentary

• Argument lists: [le arguments]

• String as an argument: "le string"

• Instruction as an argument: (le instruction)

• List of values as an argument: {le first value, le second value}

• Routine as a category name: routine.lefunction

• IVAR and OVAR as internal mechanisms: le IVAR1, le IVAR2, le OVAR1

• Backslashing reserved chars: "\[le string \]"

Those markers were chosen to provide an efficient description language with basic programming logic and functionalities, while keeping it small and simple.
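The replacement of the IVARn/OVARn placeholders by the schematic's arguments can be sketched with a simple regex substitution (a sketch of the idea; the Architect's actual mechanism may differ):

```python
import re

# IVARn / OVARn placeholders, treated as reserved keywords in the pieces
PLACEHOLDER = re.compile(r"\b(IVAR|OVAR)(\d+)\b")

def bind_arguments(piece_code, inputs, outputs):
    """Replace IVARn/OVARn in a piece with the matching schematic arguments
    (IVAR1 is the first input argument, OVAR1 the first output)."""
    def substitute(match):
        kind, index = match.group(1), int(match.group(2)) - 1
        pool = inputs if kind == "IVAR" else outputs
        return pool[index]
    return PLACEHOLDER.sub(substitute, piece_code)
```

The word-boundary anchors are what make IVAR1 a reserved token without touching identifiers that merely contain it.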
In the case of a special character being used in a variable name or a string, the usual backslash-escaping solution is enabled (e.g. a string containing a path would be "SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\Image File Execution Options"), avoiding too many limitations in the writing of new schematics. A trick to limit the quantity of reserved keywords is, for example, the use of an outside file for aliases while including them naturally in the usual pattern (transferring the information intelligence from the data itself to the algorithm managing the data).

Sample factory: technical implementation

Once the pseudo-language has been established, the development of the factory is more about the technical implementation and the engineering questions of translating the concepts into an efficient tool. While the pieces of code will be in C, in order to provide a compiled language for the sample (to ultimately produce an executable with a minimum of dependencies), the main scripts will be in Python (for ease of development and personal reasons). The factory is built around a main Python script, the Architect. The Architect is the core component of the factory, able to read the schematics (lists of instructions in the pseudo-language acting as a compromise between a configuration file and a code file), pick and manipulate the pieces according to the schematics, craft a .c code file and ask a compiler to produce the executable. The Architect has its own configuration file, which contains folder names (for the various components), the command and options for the compiler, pre- and post-treatments of the information (whether the schematic, the code itself, etc.), and so forth.

The Architect components and workflow

The core of the Architect is a parser, built according to the structure of the pseudo-language, following a workflow of successive inclusions to avoid conflicts.
Strings will be detected first, to look for reserved characters, which will be temporarily rewritten as explicit hexadecimal values of themselves (in extended ASCII) to avoid them being caught by the next steps of the parser. And so forth, from strings to lists, then to arguments and instructions. Some steps of the parsing are self-invoking, to manage the possibility of arguments embedded inside an instruction used as an argument, and so on ad lib. Other components of the factory include the pieces folder, the schematics folder, the results folder, the configuration file for the Architect itself and some utilities. The pieces folder is the library of available pieces, formatted according to the pre-defined structure (#DOC, #INVAR, etc.). The files are named using the category and function name system, although according to the pattern category_function (with an underscore instead of a dot as separator). The extension is .piece. The aliases file can also be found under the pieces folder. The schematics folder contains the various schematics, written in the pseudo-language, with a .schematic extension. Finally, the results folder is the home of the various C code files produced, and of the corresponding executables if compilation succeeded. The Architect also has a few utilities at its disposition (and its users' disposition). To answer the need for documentation of the new pseudo-language and the existing pieces, a documentation generator utility (docgen.py) will build a text file with the various #DOC headers of the pieces and a few coding guidelines for the pseudo-language. To ease the crafting of new samples and the use of the xOR instruction (datamanip.xORarray), a script producing the hexadecimal value of an xOR between a given key and a given value exists (xorarray.py); it allows the use of xORed hardcoded values in the sample, limiting the information available from a simple static analysis. Those utilities are examples; others may be developed to keep providing the user of the Architect with an easy sample-crafting tool full of options.
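The idea behind the xorarray.py utility can be sketched as follows (a sketch, not the actual script; the C-array output format is an assumption):

```python
from itertools import cycle

def xor_array_hex(key: bytes, value: bytes) -> str:
    """XOR a value against a repeating key and emit it as a C-style list of
    hexadecimal bytes, ready to be hardcoded in a piece."""
    xored = bytes(v ^ k for v, k in zip(value, cycle(key)))
    return ", ".join(f"0x{b:02x}" for b in xored)
```

Since XOR is its own inverse, applying the same key again at runtime recovers the original string, which never appears in clear in the binary.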
Beyond utilities, the Architect will also provide pre- and post-treatments of information. The use of a factory in Python and the manipulation of the pseudo-language allow for various automated processes that would produce better samples without asking more effort from the user. For example, junk code can be automatically added to the sample (whether directly in the C code or through junk instructions), or variable names in the schematics may be automatically randomized, all of this in order to provide an at least more efficient, faster, more subtle or more compact sample. The automated treatments would be options indicated in the schematics, checked by the Architect before parsing the instruction list. In the end, after the definition of the language and the construction of the Architect system, the pieces folder has to be populated to provide at least basic functions for test samples. Basic behaviours are related to data manipulation and simple input/output functions for common types of data, functions that can serve as basic elements for more evolved behaviours. While the test samples are not delivered with a malicious payload, placing some functions outside the range of pieces development, they should still provide other malware components like a self-replication engine, a network communication engine and various stealth and defensive routines (including environment-aware functions, crucial to test a simulation-based analysis). Although the development of pieces is a work-in-progress, basic categories have already been defined and populated.
To manipulate various types of data (read, write, expand, delete, etc.), elementary categories have been created: file, reg for registry keys (including the difference between a key and a value), mutex, process (various injection techniques yet to be properly implemented), sysvar (environment variables, Windows version, local executable full path, etc.). Particular data manipulation is covered by datamanip (xOR, concatenating strings, splitting HTTP header and content, etc.), network connection and socket functions by netcom (sockets using an IP address or hostname, opening, sending, receiving, closing, GET and POST request formatting), and basic interface manipulation by cli (show, hide, print). For more conditional behaviours, in order to build reactions around environment-awareness capacities, the category if is being developed. Though crude for the moment, with one instruction per kind of if (value of a registry key, username, presence of a debugger, mouse movement, existence of a file, etc.), getting reactions through input arguments and with plain repetitions in case of multiple checks, it already provides opportunities for multipath behaviours. To still allow every possible behaviour while furnishing properly crafted instructions, a categoryless instruction direct allows pushing raw C code through the pseudo-language. In a similar fashion, a categoryless loop allows pushing raw code or regular arguments (including other instructions) inside a for loop. Regarding work-in-progress categories, three main categories will complete missing crucial elements of malicious software. The random category will provide basic random functions (integer, boolean, uid, etc.) for fuzzing, stealth and cryptographic functions (and also the interesting strategic move of launching the real payload only once in X executions). The gui category will provide further interface instructions beyond the simple cli, building on Windows visual interface functions or directly on OpenGL.
The junk category will provide neutral instructions that do not modify the core behaviour but add useless code to disturb reverse-engineering and behavioural analysis (automatically injecting junk code may become one of the information treatments offered in the Architect configuration). RC4 will serve as the foundation of the crypto category, a category that will work in coordination with datamanip (the xOR manipulation may move from datamanip to crypto, and CRC instructions will join one of the two).

Some aliases and routines have already been written, starting the production of more complex yet common elements of malwares. Two aliases cover the two usual network operations, getting data from and posting data to a URL (netcom.HTTPGetFromHostname and netcom.HTTPPostToHostname respectively). They spare the schematic writer any socket considerations, covering the whole connection and the crafting of the HTTP request. Depending on the evolution of the if category and its checks, new aliases may cover the new structure of conditional verifications and reactions. Future aliases may also cover encoding/ciphering and decoding/deciphering operations.

Regarding routines, one common routine is already implemented: melting. Melting is the self-deletion of the malware executable (and of dropped files, in the case of advanced melting) to clean up the malicious tracks. Since an .exe is not allowed to delete its own file, a classic trick is to write a .bat file and use it to delete both, for a .bat is allowed to. The routine covers the writing of that file and its kickstart in a windowless console. Routines in development include a simple self-copy and another, which may instead turn into a combination of aliases, that unpacks an xOR-ed embedded file (and possibly kickstarts it, reproducing the full unpacking routine of a malicious payload).
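The batch-file trick behind melting can be sketched in a few lines. The deletion command and the `test.bat` name follow the melting.schematic listed in Appendix D; the function name and layout are illustrative only.

```python
def build_melting_bat(exe_path, bat_name="test.bat"):
    """Return the contents of a self-deleting batch file.

    A running .exe cannot delete its own file, but a .bat can delete both
    the .exe and itself; this mirrors the melting routine described above.
    """
    lines = [
        'del /F /Q "%s"' % exe_path,  # remove the malware executable
        "del %s" % bat_name,          # the batch file removes itself last
    ]
    return "\r\n".join(lines) + "\r\n"
```

The generated file would then be launched through a windowless console (CREATE_NO_WINDOW in the process.create piece), so that the clean-up leaves no visible trace on the desktop.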
Various schematics are being produced, evolving (in quality and quantity) along with the factory prototype and the available pieces. A few schematics imitate an existing code or extract of code, illustrating the capacities of the factory while providing emblematic samples. They include an imitation of pafish, the paranoid fish, an environment-awareness test executable aimed at detecting sandboxes, and a copy of a botnet's first communication from a recent version of Andromeda. Following the Matrix-themed naming of this project (hence the Architect), and along the guidelines of the two-step development idea (see next subsection), Anderson, the unaware sample, and Neo, the aware sample, are being produced (the first providing basic I/O tests while the second focuses on playing the system). TheKid is an intermediate sample, used not to test the analysis system but to explore the possibilities of the sample factory (finding himself between the unaware state of Anderson and the fully aware state of Neo, hence the Kid). Smith is a work-in-progress sample focused on propagation with an aggressive parasitic approach (hence Smith), providing test samples for infection analysis systems (a different paradigm of malware research from AV or forensics).

Sample factory: the two-step development

The introspection process into MALANET can be kickstarted once a few samples have been crafted using the factory. Nevertheless, taking the needs and the actual deadline into account leads to developing simple samples before achieving the first stable, fully functional prototype of the factory. Hence, two steps have been determined, using the capacities of a sample as the current objective. The first step is to provide a non-malicious sample that has no notion of system-awareness. With basic capacities for data manipulation and regular behaviours, it will serve for debugging and fine-tuning purposes. Being unaware of the system around him, he is Anderson.
Once basic introspection has been brought to the system this way, the focus can shift to upgrading the factory and the pieces library to answer the next step. The second archetypal sample is aware of the system around him and, building on the basic functionalities, will try to hide from the system and exploit its weaknesses. Being aware of the system around him, he is Neo.

These two steps will help build the factory, providing a reasonably fluid development, with increasing complexity and a switch from simple instructions to more complex routines over time. The two different goals should also raise all the questions the pseudo-language has to answer (creation of routines, conditional behaviour, global variables shared between instructions, etc.). Beyond R&D support, they will also provide two solid starting points for users to craft new samples according to their needs (whether functional tests or weakness hunting).

Further development starts with a new archetypal sample, Smith, focused on local and network propagation and file infection. Though less relevant to analysis-system introspection, it may be useful for other simulations and will provide new challenges to polish the Architect. Beyond this phase of development, once the prototype of the Architect is efficient enough to build interesting versions of Anderson and Neo, the focus can be put on the Architect itself. The tool can be extended to visualize schematics, linked to the knowledge of the various steps and the transmission of variables, allowing the user to easily see the workflow of a sample. A GUI can be constructed to provide user-friendly configuration, including the choice of the current schematic. It may be extended to also provide a sort of IDE for writing schematics, aliases and pieces.
The construction of samples can also be extended to produce an automated report, dropped by the sample once a run is done (no more instructions, crash, timeout, etc.), that could be analyzed automatically to point out weaknesses (10).

An interesting approach to continue the R&D of the factory system without burdening the local agency, while distributing the benefits to more people (other agencies, AV vendors, security researchers, enthusiasts, etc.), is to turn the Architect into an open-source project.

6.3 Sharing the good stuff

Though this may be bold, an unusual request should be considered regarding the Architect: the factory system could be open-sourced and published on GitHub. A few points should be taken into consideration beforehand. Pragmatically speaking, open-sourcing it is as much about the promise of future development as it is about the consequences of releasing a tool into the wild. Regarding the development question, open source is obviously a strategic choice: it offers a small probability of further in-house development (while keeping easy access to updated versions) and a bigger probability of forks and of new tools being developed or integrated into other systems. Regarding the release of a tool into the wild, the Architect in itself is a factory, not a malicious program. In the same way everything can have multiple uses, and one does not incriminate the tool but the user, the Architect is neither an incentive to produce malicious code nor a complete and finished virus generator. It also has the potential to help other actors needing non-malicious test samples, or to provide a proof of concept to programming enthusiasts. As an open-source factory it would indeed hand a tool to script kiddies (yet without providing them a fully operational and malicious program), but its real power lies in expanding it or building on top of it.
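As an illustration of the self-report idea, here is a minimal sketch of what a sample could drop at the end of a run. The field names and JSON layout are assumptions for illustration, not part of the actual design.

```python
import json
import time

def build_run_report(sample_name, executed, skipped, end_reason):
    """Assemble the self-report a test sample could drop at the end of a run.

    Illustrative fields: lists of executed and skipped instructions plus an
    end reason (no more instructions, crash, timeout) would let an analyst,
    or a follow-up script, spot which behaviours the analysis system broke.
    """
    report = {
        "sample": sample_name,
        "timestamp": time.time(),
        "executed": executed,
        "skipped": skipped,
        "end_reason": end_reason,
    }
    return json.dumps(report, indent=4)
```

Comparing such a report against the schematic that produced the sample would directly expose the instructions the simulated environment failed to support.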
Someone with the skill and motivation to do that work would have an above-average level compared to script kiddies, and would probably not need the factory to craft their own malicious program. So instead of focusing on the script-kiddie issue, one could focus on what the tool brings to other researchers (other agencies, private companies, enthusiasts). It provides more than a factory: a humble proof of concept of a higher-level language for development, an argument for user-friendly programming, and an expansion of the potential of autonomous, system-aware agents (which, from cellular automata to artificial sapiens, is always an interesting domain to dig into as far as science goes).

Regarding security and sensitive issues: first, the release would not provide confidential insights into the MALANET system or other in-house works. Though the factory is used for testing purposes, withholding the precisely designed and targeted schematics avoids giving away sensitive information about the local system (some schematics can still be furnished, as basic anti-virtualization routines can easily be found on the Internet and mimicking already existing malwares remains an option). This leads to the second point: while the core of the factory is released, a precise choice can be made about which components should be released or not (hence, some pieces may be left out of the open-sourcing process). This decreases the danger carried by the tool while still providing an interesting proof of concept and a starting point for any enthusiast.

An interested party could raise the question of intellectual property and money, for the crafting of this tool was indirectly paid for through its developer.

(10) With a supplementary step, the factory crafting a new schematic and a matching sample according to the spotted weaknesses, the introspection could be not only automated but self-focusing and self-improving.
It should be noted that the intelligence lies not only in the tool but in the schematics and in the work on the introspection process of MALANET. Beyond this, there is the question of weighing the pros and cons of potential further development against intellectual property; and there is the question of enthusiasm for the scientific field. Indeed, though it can be useful to others, and it is up to the agency to decide whether its governmental funds should serve only itself or also provide to others, this is also a way to finance decentralized scientific research (in its smallest and humblest form) and to pay respect to the technical field (as viruses are more than malicious programs). Finally, as a more personal argument, the factory is a point of interest that will surely be worked on again. Beyond the artisan challenge of crafting and caring for such a system, its links to questions of information (how to properly craft a language, how to assemble pieces into a whole) and to autonomous agents (here the virus is seen, studied and used as a system-aware agent rather than as a malicious agent) make it a worthy experiment. Though it remains a humble proof of concept, it could still provide the foundation, or the basic idea, for a more complex system.

Chapter 7

Law and technological (dis)Order

The caterpillar said, "One side will make you grow bigger and the other side will make you grow smaller."
"One side of what? The other side of what?" thought Alice to herself.
"Of the mushroom," said the caterpillar.
Alice looked at the mushroom, trying to make out which were the two sides of it, as it was perfectly round.
Alice's Adventures in Wonderland

7.1 Computer crime and computer police, a brave new world

Technology is one of the strengths of humankind, allowing us to go beyond the limits of our natural evolution by crafting theories to understand our environment and tools to manipulate it.
If its cost may sometimes render a technology unwanted, a bigger problem lies in versatile technologies with potentially nefarious impacts (like dual-use technologies). Although IT was never out of consideration, given the links between communications and the transmission of ideas, it became central in the era of modern IT and computer-supported crimes. In the same way applied physics brought both stone bullets and atomic bombs, IT brought data-stealing spyware and malicious parasitic viruses.

Having to continuously serve and protect society, law-and-order agencies had to cope with fast technological evolution and the world of possibilities opened by long-distance interconnected networks and automated data-processing machines. If the first cases of computer crime, when the laws themselves were still virgin of these problematics, were linked to enthusiasts pushing the limits further, no one can deny the presence of criminal elements any more. As early adopters of practical innovations can easily be found among criminals, they were among the first concerned by technological democratisation. Nevertheless, if modern IT changed the tooling and behaviours of criminals, offering new malicious possibilities, it also provided law enforcement agencies with new tools to approach those new behaviours. The same piece of code becomes part of both sides' adaptation to a new environment. With the potential to go beyond merely matching the other side's components, building on top of IT technologies allows a more insightful and intelligent comprehension and investigation of the technology-related elements of crime.

7.2 Ex machina: the assistant in the machine

Since the computer, the broad comparison between modern IT and the mechanical revolution of the late industrial phase has gained weight: where mechanical automata once started to replace humans, digital automata can now be found.
Providing an unexpected extension of the tools available to society, automated digital systems proved useful for data monitoring and manipulation. Evolving from managing static data with calculations to observing live data with forensics, automated analysis started to integrate simulation-based solutions: by providing a controlled environment, the behaviour of executables can be examined more thoroughly. When delinquents and criminals started to use live data for their own purposes, leading to the concept of malwares, the help provided by automated systems and the quality provided by simulation-based analysis revealed themselves to be crucial for law enforcement agencies. With the recent rise in low-complexity cases, due to underground markets and the democratisation of dual-use tools, IT-assisted police work is one of the answers with minimal cost and dedicated workforce. It is in this tendency that the KI-42 unit of the BKA came up with the MALANET experiment. This evolution, through script-based automation of regular forensics and investigation tasks, offers additional potential through synergy between the various components and correlation of the various information sources. By seeing the whole system even when manipulating the humblest component, it is possible to tap into this potential and bring intelligence-in-design to the system.

Nevertheless, if the integration into the frame of modern law-and-order agencies seems perfect, these systems need to be handled with care when delivering results. Every automated interpretation may miss an obscure point, demanding a critical stance in front of forensics tools, and a simulation-based analysis is bound to imperfection. Simulations will never achieve perfection, due to their limited extent but also to a combination of undocumented behaviours and errors, leaving breaches and glitches to be found.
As their advantages remain compelling, the automated system needs to be analyzed and tested by fire to reinforce it. In a mindset similar to pentesting or fuzzing, an introspective analysis based on controlled test samples may provide the necessary information.

7.3 Viruses, beyond good and evil

Since local automated analysis systems are oriented towards malwares and must confront potentially environment-aware software, which endangers simulation-based analysis through its flaws, an inside analysis can be built on top of controlled test malwares. By developing non-destructive malwares that can be pinpointed towards specific points of interest, an adapted analysis can reveal the practical limitations and breaches of a system. Though the same code was once called malicious, this case is a concrete example of the relativity of that definition. Employed for a different goal, even with similar technical behaviours, malwares focused on stealth routines and environment-awareness become a tool for law enforcement's continuous updating of IT forensics solutions. Admittedly, this specific need for malwares is linked to their use for criminal activities in the first place. But by embracing the possibilities of legitimate uses, further applications for scientific research and engineering reveal themselves. Whether for their propagation abilities with or without parasitic capacities, for the evolutionary quest of change and adaptation to optimize survival chances, or for the out-of-the-box mindset providing ways to manipulate a system in an unexpected direction, so-called malwares provide topics worth thinking about. Various sciences of the large IT field can be connected to them, from cellular automata (reactions to environmental variables for reproduction purposes, probability of future patterns) to artificial sapiens (awareness and evolution processes).
Engineered applications may arise from self-replication research, regarding data sharing and synchronisation, or from metamorphic and polymorphic algorithms, regarding practical adaptive code.

7.4 Technological evolution and societal evolution

Although this paper started as a purely scientific and technical one, both the subject (malicious software) and the work environment (a law enforcement agency) led to considerations about the links between a behavioural definition and the casual approach to the subject and, as a consequence, about the wasted potential regarding research and development. If automated systems, simulation-based analyses and the introspective angle gave this computer science paper a humanities coloration, detaching it from the purely practical view of IT, the path led to a consideration of how a technology integrates into society. And what better background than the criminal and law enforcement one to speak about society and the misuse of technology?

Though the label "malicious software" has been widely adopted, debating it led to pondering whether we put the blame on technology instead of on human actors, creating intellectual inertia. Once an accepted technology begins to be integrated by society, it will indeed be accessible to enthusiasts, researchers, law enforcement and criminal elements, for all of them are parts of society. Nevertheless, the initial technology cannot be considered as intrinsically good or bad. The actual problematic may seem secondary at best, perhaps ill-conceived, but with the advances of global IT in modern times and its further integration into society's casual behaviours, human actions and interactions, it may provide insightful answers. And questions about technology are, indeed, nothing new.
But some issues should be confronted, issues that may provide more answers than the question of control and censorship: the level of control we deploy and the impact of criminalisation and taboo on innovation; the apparent easiness of addressing technology issues through control instead of addressing human issues through communication; digital organisms whose evolution was once controlled but whose future behaviour is free; the fear of losing control of inner processes in the digital world. All of the above sits at an intersection between sciences and humanities, engineering and ethics. But the final issue may be the question of developing a mature stance towards technology inside society, not only with regard to innovation but also with regard to fearmongering and the exclusivity of knowledge (both critical debates nowadays, with the cybersecurity trend, the terrorism topic and the intellectual property question).

On a lighter and more contextual note, changing the usual considerations and extending approaches to less orthodox thinking may lead to a proper integration of the enthusiasts' community into mainstream scientific R&D (nota bene: integration differs from assimilation). It would provide a leap forward in knowledge and innovation, by putting in contact various researches made with different mindsets and under different conditions, while also opening the sometimes rigid academic scientific system to new approaches to innovative experimental work and to new definitions of boundaries and criticism. Technology is intrinsically neither good nor bad, and can always be used for destructive purposes. When our actual stance leads to controlling scientific research and banning engineering topics, we damage our own innovation and problem-solving capacities.
Hence, the question is not whether we should fear technology, living in the shadow of past nuclear wars and future artificial gods, for we would just abandon it (1); but how to build a working society that can live with it without constant pressure and control. And the answer definitely lies more in society itself than in code.

(1) And it would be a bad strategic move for a species that switched from biological evolution to technological evolution.

Appendix A

Extracts from the Cuckoo guide

Use: webpanel details

Using the webpanel, a sample can be pushed via the submit menu, and the reports can be accessed via the "recent" menu (or, for older reports, at /analysis/id, id being the numerical id of the analysis). The reports are organised according to the various analyses: static, behavioural, network (plus all the small things like the dropped files, the archive inside mongod, etc.). The first page of a report is a summary of sorts, with the hosts contacted, the registry keys and files accessed/manipulated, and the mutexes created. In a report, the "Quick Overview" provides Cuckoo's log of the analysis (which can be unwrapped using the "Show log" button), a summary of the hosts and domains contacted, and a summary of the files, registry keys and mutexes asked for. Note that those variables may have been looked for, created and/or used (so a file listed under "Files" may merely have been looked for, via an OpenFile call that failed because the file was not there in the first place). [...]

Maintenance: updating

Updating some or all parts of the system will probably break it. First, because updating the virtualisation system will probably break the current snapshot and require a new one (please refer to the troubleshooting section or the cheatsheet appendix for more information).
Second, because the Cuckoo system is a delicate harmony between a virtualization system, Python scripts, databases, web services (Django, the webserver stack, the API) and a proper virtual-network configuration. The same goes for possible plug-ins, extensions, integrations into a bigger system, etc. With this number and depth of interactions, overall compatibility is a critical piece of the puzzle. Updating is still a good idea, but the time to reconfigure/reconstruct the system should be taken into account. [...]

Troubleshooting: cannot load the machine and/or the snapshot

In case Cuckoo cannot load the virtual machine or the snapshot, the source of the error may be a problem inside the virtualisation system or a problem with the current snapshot. If a virtualisation system is updated, modified or reinstalled, chances are high that, though the machine still works, the snapshot does not any more. To check the machine state, one can access the system using [...] through the network (this requires ssh-askpass on the local station, connecting to the distant IP of the Cuckoo server with the regular user account). If the machine boots and runs without issues, the problem is probably the snapshot. [...]

Troubleshooting: empty analysis, no error in the log

Another source of error is the lack of a proper network connection between the guest and the host. There are two ways to detect this error: either Cuckoo will not start the analysis because it cannot plug into the network interface (for tcpdump purposes and the like), or Cuckoo will run the analysis but capture no information, leaving the report blank. The second case, the most common one, happens because the machine is available (so it will be launched) but the IP/interface is wrong and Cuckoo cannot communicate with agent.py.
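As a small diagnostic aid for this situation, host-to-guest connectivity can be checked by probing the agent's TCP port from the server. A minimal sketch follows; port 8000 is the usual default of Cuckoo's agent, and the function name is ours.

```python
import socket

def agent_reachable(guest_ip, port=8000, timeout=3):
    """Check whether the in-guest agent.py answers on its TCP port.

    A refused or timed-out connection from the host usually means the IP
    in the machinery configuration no longer matches the guest's actual
    address, or that the virtual network is down.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((guest_ip, port))
        return True
    except (socket.timeout, socket.error):
        return False
    finally:
        s.close()
```

Running this against the IP written in the machinery configuration file, while the snapshot is restored, quickly distinguishes a dead guest from a mere configuration mismatch.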
This error can be confirmed by looking at the pcap of the analysis or by watching the local (server-side) packet capture during the analysis. If something similar happens, one should access the machine (using [...]) and check the local IP. If no IP is attributed, an ipconfig /renew may do the trick (in case the DHCP attribution broke at some point), and a new snapshot may have to be taken. If an IP is present (one may have to check the IP inside the snapshot and not the regular running state, in case the DHCP lease changed, but the probability is small), one should compare it to the IP written down in the configuration file for the machinery (e.g. [...]). There is also the possibility of an IP problem on the host side: the local IP of the virtual network ([...]) is given in the configuration files cuckoo.conf and [...]. One should check that they match the actual local IP and network-interface configuration in case the communication is broken. Last but not least, it may be the virtual network or its interfaces that are not working any more: Cuckoo needs a working network between the host and the guest. [...]

Appendix B

Cuckoo API and Python: examples

Pushing a sample

    answer = requests.post(str(cuckoo_api) + "/tasks/create/file",
                           data={'machine': m},
                           files={'file': open(os.path.join(str(samples_folder), str(sample)), 'rb')})

Checking an analysis status

    answer = requests.get(str(cuckoo_api) + "/tasks/view/" + str(sample_id))
    if answer.status_code == requests.codes.ok:
        tdata = str(answer.json[u'task'][u'status'])

Downloading a report

    answer = requests.get(str(cuckoo_api) + "/tasks/report/" + str(sample_id))
    if answer.status_code == requests.codes.ok:
        with open(str(reports_folder) + "/" + str(sample_id) + ".report", "w") as tfile:
            json.dump(answer.json, tfile, indent=4)

Appendix C

Decoding botnet messages: examples

Decoding an RC4-encrypted message

    def keystream_gen(key, keystream_size):
        KS = []
        S = range(256)
        j = 0
        for i in range(256):
            j = (j + S[i] + ord(key[i % len(key)])) % 256
            S[i], S[j] = S[j], S[i]
        i = 0
        j = 0
        for x in range(keystream_size):
            i = (i + 1) % 256
            j = (j + S[i]) % 256
            S[i], S[j] = S[j], S[i]
            KS += [S[(S[i] + S[j]) % 256]]
        return KS

    def decode_message(key, data):
        odata = []
        for i in data:
            odata += [ord(i)]
        keystream = keystream_gen(key, len(odata))
        decdata = []
        for i in range(len(odata)):
            decdata += [chr(odata[i] ^ keystream[i])]
        return ''.join(decdata)

Decoding a ROT+xOR message

    def decode(instring, key, rot):
        t2 = []
        t5 = []
        for c in instring:
            t1 = ord(c) - rot
            t2 += [hex(t1)[2:]]
        t3 = binascii.unhexlify("".join(t2))
        for i, c in enumerate(t3):
            t4 = key[i % len(key)]
            t5 += [chr(ord(c) ^ ord(t4))]
        return "".join(t5)

Parsing Andromeda server orders

    def translate_srvmessage(hexdata):
        recvid = hexdata[:8]
        trecvid = ""
        pos = 8
        moar_cmd = True
        for i in range(len(recvid) / 2):
            trecvid = recvid[i*2:(i+1)*2] + trecvid
        print "Tag_RecvID: " + str(trecvid)
        print "Cmd: " + str(hexdata[pos:pos+2])
        while moar_cmd:
            if len(hexdata) > pos:
                tid = hexdata[pos+2:pos+10]
                ttid = ""
                for i in range(len(tid) / 2):
                    ttid = tid[i*2:(i+1)*2] + ttid
                print "tid: " + str(ttid)
                pos = pos + 10
                if len(hexdata) > pos:
                    t = ""
                    while hexdata[pos:pos+2] != "00":
                        t += hexdata[pos:pos+2].decode("hex")
                        pos += 2
                    print t
                    pos += 2
                else:
                    moar_cmd = False
            else:
                moar_cmd = False

Appendix D

The Architect: examples

mutex_create.piece

    #DOC Create a mutex (no error management)
         In vars: mutex name (IVAR1)
         Out vars: mutex handler (OVAR1)
    #INVAR IVAR1
    #OTVAR OVAR1
    #INCLUDES <windows.h>
    #VARS HANDLE OVAR1; LPCSTR MName;
    #HEADER
    #MAIN MName = IVAR1;
          OVAR1 = CreateMutex(NULL, TRUE, MName);
    #FOOTER

if_regstr.piece

    #DOC Checking registry key value data
         In vars: registry key full path (IVAR1), value to query name (IVAR2),
         value to compare (IVAR3), match reaction (IVAR4), other reaction (IVAR5)
    #INVAR IVAR1, IVAR2, IVAR3, IVAR4, IVAR5
    #OTVAR
    #INCLUDES <windows.h>
    #VARS HKEY rK, DWORD dwType, DWORD dwDataSize, char *rgdata
    #HEADER rgdata = NULL; dwType = 0; dwDataSize = 0;
            RegOpenKeyEx(HKEY_LOCAL_MACHINE, TEXT(IVAR1), 0, KEY_QUERY_VALUE, &rK);
            RegQueryValueEx(rK, TEXT(IVAR2), NULL, &dwType, NULL, &dwDataSize);
            rgdata = (char *)malloc(dwDataSize + 1);
            RegQueryValueEx(rK, TEXT(IVAR2), NULL, &dwType, rgdata, &dwDataSize);
            rgdata[dwDataSize / sizeof(TCHAR)] = TEXT('\0');
    #MAIN if (strcmp(rgdata, IVAR3) == 0) { IVAR4; } else { IVAR5; }
    #FOOTER free(rgdata); RegCloseKey(rK);

melting.schematic

    #SAMPLE name[melting] version[beta]
    #PIECES
    sysvar.exepath [] [lpath]
    datamanip.catstr ["del /F /Q ", lpath] [bstr]
    datamanip.catstr [bstr, "\ndel test.bat"] [batched]
    file.writetxt ["test.bat", batched] []
    process.create ["test.bat", NULL, CREATE_NO_WINDOW] [phandle]

theKid.schematic

    #SAMPLE name[TheKid] version[alpha]
    #PIECES
    cli.print ["\"THE KID\"\n"] []
    file.read ["C:\\foobar"] [filecontent, bfsize]
    cli.print [filecontent] []
    cli.print [{"\nTHE \[KID\]", "\nAM \{I ALONE\} ?", "\nNO, YOU ARE NOT \"ALONE\""}] []
    if.username [{"SANDBOX", "VM", "MALWARE"}, exit(0), sleep(1)] []
    netcom.HTTPGetFromHostname ["www.intothematrix.com", "/rl_img/anime_gallery_kids_1L.jpg", "theOracle"] [theBuffer, buffsize]
    datamanip.splitHTTPheader [theBuffer, buffsize] [httphead, httpdata, headsize, datasize]
    sysvar.getenv ["USERPROFILE"] [usrvar]
    datamanip.catstr [usrvar, "\\damnkid.jpg"] [kidfilepath]
    file.writeptr [kidfilepath, httpdata, datasize] []
    reg.writevalue [HKEY_CURRENT_USER, "Control Panel\\Desktop", "Wallpaper", kidfilepath] []
    routine.melting [] []

Bibliography

[1] Michael Sikorski and Andrew Honig, Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software. No Starch Press, 2012, ISBN 9781593272906.
[2] Niels Ferguson, Bruce Schneier and Tadayoshi Kohno, Cryptography Engineering: Design Principles and Practical Applications. John Wiley & Sons, 2010, ISBN 9780470474242.
[3] Steven Levy, Hackers: Heroes of the Computer Revolution. Penguin Books, 2001, ISBN 9780141000510.
[4] John Aycock, Computer Viruses and Malware. Springer, 2006, ISBN 9780387302362.
[5] VX Heaven, vxheaven.org, "Library" section.
[6] AlienVault, www.alienvault.com, "Hardening Cuckoo Sandbox against VM aware malware" post.
[7] prowling - NSM foo, blog.prowling.nu, "Hardening Cuckoo" topic.
[8] Fred Cohen, Computer Viruses: Theory and Experiments, http://all.net/books/virus/, 1984.