Behavior abstraction in Malware analysis
Jean-Yves Marion, LORIA, INRIA
Wednesday 8 June 2011

Joint work with Philippe Beaucamps, Isabelle Gnaedig and Daniel Reynaud (Berkeley U.)

What is malware?
• Malware is a program with malicious intent
• Malware can be a virus, a worm, a botnet ...
• Giving a mathematical definition is difficult
✴ So how do we detect malware?
✴ How do we protect a system from malware?

Why trace? (1/3)
Definition: binary analysis is program analysis where the program is unknown
⇒ we only have a binary blob
Reasons:
• Indirect jumps ⇒ control flow is undecidable
• Indirect reads/writes ⇒ data flow is undecidable
• Self-modifying code ⇒ syntax is undecidable

Code protection
Detection is hard because malware is protected by:
1. Obfuscation
2. Cryptography
3. Self-modification
4. Anti-analysis tricks

Protections: Self-Modification
• A lot of malware families use home-made obfuscations, like packers, to protect their binaries, following a standard model (diagram labels: unpacking code, original binary)
• The unpacking code is automatically modified for each new distributed binary
➡ It is difficult to perform a static analysis
➡ Dynamic analysis by code monitoring (emulation, instrumentation, ...)
Tripoux: Reverse-engineering of malware packers for dummies - DeepSec 2010

Example (4/5): a program generating code
• hostname packed with Themida

Protections: Obfuscation
✓ Several possible implementations of a high-level action
Two ways of writing into a file:
h=fopen(C:\windows\sys.dll);fwrite(«test»,h)
h=createFile(C:\windows\sys.dll);writeFile(h,«test»)

Malware detection methods in a tiny nutshell

Malware detection by string scanning
• A signature is a regular expression denoting a sequence of bytes
• Signature: «Your * is now under our control»
Worm.Y: Your mac is now under our control !
Worm.Y: Your PC is now under our control !
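The string-scanning idea above can be illustrated with a minimal sketch. It assumes the slide's wildcard `*` means "any run of bytes" and spells it `.*?` in Python regex syntax. The sample byte strings and the `scan` helper are hypothetical, invented only for illustration.

```python
import re

# Hypothetical signature for the slide's "Worm.Y" example.
# The slide's "*" wildcard is written ".*?" in Python regex syntax (assumption).
WORM_Y_SIGNATURE = re.compile(rb"Your .*? is now under our control", re.DOTALL)


def scan(data: bytes) -> bool:
    """Return True if the byte string contains the signature."""
    return WORM_Y_SIGNATURE.search(data) is not None


# Invented sample buffers: two variants, one benign file, one mutated variant.
samples = {
    "variant_mac": b"\x90\x90Your mac is now under our control !\x00",
    "variant_pc": b"\x90\x90Your PC is now under our control !\x00",
    "benign": b"Hello, world",
    "mutated": b"Y0ur PC is n0w under 0ur c0ntr0l !",
}

for name, blob in samples.items():
    print(name, scan(blob))
```

The last sample suggests why such signatures are fragile: a trivial character substitution already escapes the pattern.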
Malware detection by string scanning
Pros:
• Accuracy: low rate of false positives
➡ programs which are not malware are not detected
• Efficiency: fast string matching algorithms
➡ Karp & Rabin, Knuth, Morris & Pratt, Boyer & Moore
Cons:
• Signatures are quasi-manually constructed
• Signatures are not robust to malware protections
➡ Mutations
➡ Code obfuscations

Detection by integrity check
• Identify a file using a hash function
Diagram: files → hash function → hash numbers (numerical fingerprints)
• Cons:
• File systems are updated, so numerical fingerprints change
• Difficult to maintain in practice
• Files may change while keeping the same numerical fingerprint (hash collisions)

Detection based on higher level of abstraction
A problem is the absence of high-level abstractions to structure and understand obfuscated code.
Related works
- Dalla Preda, Christodorescu et al. 2007: A semantics-based approach to malware detection
- Christodorescu, Song et al. 2007: Semantics-aware malware detection
- Bonfante, Kaczmarek and Marion 2009: Morphological analysis

Behavioral analysis and detection

Behavioral detection
• Identification of a sequence of actions:
• System calls or library calls
• File system interactions
• Network interactions
• Sequence of instructions

Allaple excerpt

    void scan_dir(const char* dir) {
      HANDLE hFind;
      char szFilename[2048];
      WIN32_FIND_DATA findData;
      sprintf(szFilename, "%s\\%s", dir, "*.*");
      hFind = FindFirstFile(szFilename, &findData);
      if (hFind == INVALID_HANDLE_VALUE) return;
      do {
        sprintf(szFilename, "%s\\%s", dir, findData.cFileName);
        if (findData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
          scan_dir(szFilename);
        else { ... }
      } while (FindNextFile(hFind, &findData));
      FindClose(hFind);
    }

    void main(int argc, char** argv) {
      /* Behavior pattern: ping of a remote host */
      HANDLE hIcmp;
      const char* icmpData = "Babcdef...";
      char reply[128];
      hIcmp = IcmpCreateFile();
      for (int i = 0; i < 2; ++i)
        IcmpSendEcho(hIcmp, ipaddr, icmpData, 10, NULL, reply, 128, 1000);
      IcmpCloseHandle(hIcmp);

      /* Behavior pattern: Netbios connection */
      SOCKET s = socket(AF_INET, SOCK_STREAM, 0);
      struct sockaddr_in sin = { AF_INET, ipaddr, htons(139) /* Netbios */ };
      if (connect(s, (SOCKADDR*)&sin, sizeof(sin)) != SOCKET_ERROR) {
        ...
      }

      /* Behavior pattern: scanning of local drives */
      char buffer[1024];
      GetLogicalDriveStrings(sizeof(buffer), buffer);
      const char* szDrive = buffer;
      while (*szDrive) {
        if (GetDriveType(szDrive) == DRIVE_FIXED)
          scan_dir(szDrive);
        szDrive += strlen(szDrive) + 1;
      }
    }

Example execution trace of library calls:
...GetLogicalDriveStrings.GetDriveType.FindFirstFile.FindFirstFile.FindNextFile...

Program Traces
An execution is a sequence of configurations: (µ0, ν0) → ... → (µn, νn) → ...
By projection, a trace: c0 → ... → cn → ...
• A trace is a sequence of actions
• Approximation by a regular language
• A trace automaton is a finite approximation of a trace language

In order to construct the trace automaton of a machine, one may either use a collection of captured traces or reconstruct the program machine code from its binary representation. In the first case, the automaton is built in such a way that the captured traces correspond to a path in the trace automaton. In the second case, the machine code of the program can be reconstructed using common techniques that combine static and dynamic analysis: it is then projected on the trace alphabet and the trace automaton is inferred from the code structure.
Example 4. Figure 1 shows a trace automaton for the Allaple.A excerpt representing the ping of a remote host and the scanning of local drives.
[Figure 1: trace automaton for the Allaple.A excerpt; transition labels: IcmpSendEcho, GetLogicalDriveStrings, GetDriveType, FindFirstFile, FindNextFile]

Computing program traces
1. Dynamic analysis
• Collect an execution trace
• Monitor program interactions (sys calls, network calls, ...)
• What is the detection coverage?
2. Static analysis
• A good approximation of a set of execution traces
• Good detection coverage
• But static analysis is difficult to perform

Behavior abstraction
Goal: Provide a behavior analysis technique by expressing traces in terms of high-level, implementation-independent functionalities
h=fopen(C:\windows\sys.dll);fwrite(«test»,h)
h=createFile(C:\windows\sys.dll);writeFile(h,«test»)
Both abstract to: Write_System_File
Our works
- Expressing sets of traces by regular languages from static or dynamic analysis
- Abstracting behavior patterns by string rewriting systems
- Efficient analysis (quasi-linear time)
- Detection of several abstract behaviors from a set of traces
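Expressing a set of captured traces as a regular language (the dynamic-analysis case) can be sketched with a prefix-tree automaton, in which every captured trace is a path from the initial state. This is only an illustrative construction under that assumption, not the construction used in the talk. It accepts exactly the captured traces and does not generalise to loops. The function names and the two sample traces are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Trace = List[str]  # a trace is a sequence of library-call names
Automaton = Tuple[Dict[Tuple[int, str], int], Set[int]]


def build_trace_automaton(traces: Iterable[Trace]) -> Automaton:
    """Build a prefix-tree automaton; state 0 is the initial state.

    Returns (transitions, accepting) with transitions[(state, call)] = state.
    """
    transitions: Dict[Tuple[int, str], int] = {}
    accepting: Set[int] = set()
    next_state = 1
    for trace in traces:
        state = 0
        for call in trace:
            key = (state, call)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting.add(state)  # the end of each captured trace is accepting
    return transitions, accepting


def accepts(automaton: Automaton, trace: Trace) -> bool:
    """Follow the trace from state 0 and check that it ends in an accepting state."""
    transitions, accepting = automaton
    state = 0
    for call in trace:
        if (state, call) not in transitions:
            return False
        state = transitions[(state, call)]
    return state in accepting


# Invented captured traces, loosely modelled on the Allaple excerpt.
captured = [
    ["IcmpSendEcho", "IcmpSendEcho", "GetLogicalDriveStrings",
     "GetDriveType", "FindFirstFile", "FindNextFile"],
    ["IcmpSendEcho", "GetLogicalDriveStrings",
     "GetDriveType", "FindFirstFile", "FindFirstFile", "FindNextFile"],
]
automaton = build_trace_automaton(captured)
print(accepts(automaton, captured[0]))
```

A practical tool would then merge states to introduce loops, which is where the approximation by a regular language comes in.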
Related works
- Martignoni et al. 2008: multi-layered abstraction on a single trace
- Jacob et al. 2009: low-level functionalities, exponential-time detection

Behavior patterns
A behavior pattern is a monadic String Rewriting System (SRS)
GetLogicalDriveStrings.GetDriveType.(FindFirstFile+FindFirstFileEx) → Scandrive
• A behavior pattern B is an SRS: B1 → λ1, ..., Bn → λn

Monadic string rewriting systems
Rules of a monadic SRS are of the form α → β where |α| > |β| and |β| ≤ 1
Theorem: The set of descendants of a regular language under a monadic SRS is regular (Book & Otto)

Abstract trace language
Abstract a trace language L by reducing it w.r.t. a behavior pattern B:
L →B ... →B L↓
Theorem: Let B be a regular behavior pattern and L be a trace language. If L is regular then so is L↓. There is a quadratic-time procedure to compute L↓.
Example:
GetLogicalDriveStrings.GetDriveType.FindFirstFile.FindNextFile... abstracts to Scandrive.FindNextFile...
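The reduction of the example trace above can be reproduced with a small sketch that applies monadic rules to a single trace until no rule matches. The theorem concerns whole regular languages, handled through automata; this sketch only rewrites one finite trace. The rule table spells out the alternation `FindFirstFile+FindFirstFileEx` as two rules, and the names `RULES` and `reduce_trace` are hypothetical.

```python
from typing import Dict, List, Tuple

# Monadic rules: each left-hand side (a sequence of calls) rewrites to a
# single abstract symbol, so |lhs| > |rhs| and |rhs| <= 1.
RULES: Dict[Tuple[str, ...], str] = {
    ("GetLogicalDriveStrings", "GetDriveType", "FindFirstFile"): "Scandrive",
    ("GetLogicalDriveStrings", "GetDriveType", "FindFirstFileEx"): "Scandrive",
}


def reduce_trace(trace: List[str],
                 rules: Dict[Tuple[str, ...], str] = RULES) -> List[str]:
    """Rewrite the leftmost matching factor until the trace is in normal form.

    Termination: every rewrite strictly shortens the trace.
    """
    current = list(trace)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            for lhs, rhs in rules.items():
                if tuple(current[i:i + len(lhs)]) == lhs:
                    current[i:i + len(lhs)] = [rhs]
                    changed = True
                    break
            if changed:
                break
    return current


trace = ["GetLogicalDriveStrings", "GetDriveType", "FindFirstFile", "FindNextFile"]
print(".".join(reduce_trace(trace)))
```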
Theorem 3. Let $A$ be a trace automaton and $C = \{B_i\}_{1 \le i \le n}$ a set of behavior patterns recognized by regular SRSs $\{R_{B_i}\}_{1 \le i \le n}$. Let $|C| = \sum_{1 \le i \le n} |R_{B_i}|$. Then an automaton of size $O(|A|)$ recognizing $L(A){\downarrow_C}|_\Gamma$ can be constructed in time $O(|A|^3 \cdot |C|^2)$ and space $O(|A|^2 \cdot |C|^2)$.
The final abstraction of the Allaple.A excerpt, for Γ = {PING, SCAN_DRIVES}, is depicted in Figure 3.
[Fig. 3. Γ-abstract automaton for the Allaple.A excerpt; transition labels: PING, SCAN_DRIVES]

Malicious behavior detection
• A malicious behavior (signature) is a regular behavior pattern B
Allaple.A signature: PING.*.SCAN_DRIVES
• Detection: comparing the abstract trace language L↓ with B
Theorem: Let B be a regular behavior pattern and L↓ an abstract trace language. There is a linear-time procedure deciding whether L↓ is infected by B.

6 Application to Malware Detection
Using the abstraction framework defined in Section 4, malware detection now consists in computing the abstract trace language of some machine and comparing it to a database of malicious behaviors defined on Γ. These malicious behaviors either describe generic behaviors, e.g. sending spam or logging keystrokes, or behaviors of specific malware. According to our abstraction formalism, malicious behaviors are sets of particular combinations of behavior pattern abstractions.
Definition 7. A malicious behavior on Γ, or signature, is a language on Γ.
More specifically, a malicious behavior describes combinations of patterns, pos-
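Matching the Allaple.A signature against abstract traces can be sketched as a single left-to-right pass. Two assumptions are made here that the slides do not state: the signature `PING.*.SCAN_DRIVES` is read as "a PING, later followed by a SCAN_DRIVES", and a set of traces counts as infected when at least one trace matches. The helper names and the `NETBIOS_CONNECT` symbol are hypothetical. The procedure in the talk works on automata, not on individual traces.

```python
from typing import Iterable, List


def matches_signature(abstract_trace: List[str]) -> bool:
    """True if a PING occurs and a SCAN_DRIVES occurs at some later position."""
    seen_ping = False
    for symbol in abstract_trace:
        if symbol == "PING":
            seen_ping = True
        elif symbol == "SCAN_DRIVES" and seen_ping:
            return True
    return False


def is_infected(abstract_traces: Iterable[List[str]]) -> bool:
    """Assumption: infected means at least one abstract trace matches."""
    return any(matches_signature(t) for t in abstract_traces)


# Invented abstract traces over an alphabet containing PING and SCAN_DRIVES.
observed = [
    ["SCAN_DRIVES", "PING"],
    ["PING", "NETBIOS_CONNECT", "SCAN_DRIVES"],
]
print(is_infected(observed))
```

Each trace is read once, so the cost of this sketch is linear in the total length of the traces.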
According to our abstraction formalism, malicious behaviors are sets of particular combinations of behavior patterns abstractions. Definition 7. A malicious behavior on Γ , or signature, is a language on Γ . mercredi 8 juin 2011 More specifically, a malicious behavior describes combinations of patterns, pos- On going researches and conclusions mercredi 8 juin 2011 sitions.of FOLTL an extension of tion endto (x, β, y)),iswhere the first paramused socket, the second parameter is the sent data and th e Appendix A2) such that: Example 1. Let us consider the functionality of se eated socket and the second parameter ormula der the functionality of sending a ping. One way of realizing it consists in calling the third one is the target, the unique parameter of closesock a, thenfirst ϕ isparameter an FOLTL ; is , the offormula sendto the s itbeen gormula consists in calling the socket function with the parameter IPPROTO_ICMP descri and Y ⊆ X is a set of d parameter is the sent data and the is thearguments freed socket and constants α and β in Fd identify th Tracking ... er IPPROTO_ICMP describing the network protocol and, then, calling the sendto funct and unique ∀Y.ϕ are FOLTL offormulas, the parameter closesocket above parameters IPPROTO_ICMP and ICMP_ECHOREQ. ,uences calling the sendto function with the parameter ICMP_ECHOREQ describing the data to ≡constants ¬∃Y.¬ϕ.α and β in Fd identify the REQ describing the data to be sent. Between these two calls,the the socket should not be freed A ping may also be realized using function IcmpSend • A behavior pattern is a First-order LTL (Linear temporal logic) formula OTO_ICMP and ICMP_ECHOREQ. for ϕ ∧ X (� U ϕ ). 2 .e socket . .) 1∈should not be freed . parameter This by the FOLTL formula: ϕ1 = ∃x, y. socket Echo, whose represents the ping target. This corr ealized the function formula using is closed when it IcmpSendhas no is described way as rmula: ∃x, y. 
Tracking ...

• A behavior pattern is a first-order LTL (linear temporal logic) formula
• Bad traces satisfy an FO-LTL formula
• Behavior abstraction is performed by term rewriting systems
• Linear detection algorithms based on tree automata

FOLTL is an extension of LTL (see Appendix A2) such that:
• if ϕ is an LTL formula, then ϕ is an FOLTL formula;
• if ϕ is an FOLTL formula and Y ⊆ X is a set of variables of sort Data, then ∃Y.ϕ and ∀Y.ϕ are FOLTL formulas, where ∀Y.ϕ ≡ ¬∃Y.¬ϕ.

We write ϕ1 ⊙ ϕ2 for ϕ1 ∧ X (⊤ U ϕ2). A formula is closed when it has no free variable, that is, when every variable is bound by a quantifier. A formula is validated on infinite sequences of sets of atomic predicates, denoted by ξ = (ξ0, ξ1, . . .). Satisfaction ξ |= ϕ is defined in the same way as for LTL, with an additional rule: ξ |= ∃Y.ϕ iff there exists a substitution σ over Y such that ξ |= ϕσ. A formula is validated over traces by identifying a trace t = a0 · · · an with a sequence ξt of singleton sets of atomic predicates.

Example 1. Let us consider the functionality of sending a ping. One way of realizing it consists in calling the socket function with the parameter IPPROTO_ICMP describing the network protocol and, then, calling the sendto function with the parameter ICMP_ECHOREQ describing the data to be sent. Between these two calls, the socket should not be freed. This is described by the FOLTL formula:

    ϕ1 = ∃x, y. socket(x, α) ∧ (¬closesocket(x) U sendto(x, β, y)),

where the first parameter of socket is the created socket and the second parameter is the network protocol, the first parameter of sendto is the used socket, the second parameter is the sent data and the third one is the target, the unique parameter of closesocket is the freed socket, and the constants α and β in Fd identify the above parameters IPPROTO_ICMP and ICMP_ECHOREQ.

A ping may also be realized using the function IcmpSendEcho, whose parameter represents the ping target. This corresponds to the FOLTL formula:

    ϕ2 = ∃x. IcmpSendEcho(x).

Hence, the ping functionality may be described by the FOLTL formula:

    ϕping = ϕ1 ∨ ϕ2.

We then define a behavior pattern as the set of traces carrying out its functionality, i.e., as the set of traces validating the formula describing the functionality.

Definition 1. A behavior pattern is a set of traces B ⊆ T_Trace(F_Σ) validating a closed FOLTL formula ϕ on AP = T_Action(F_Σ, X):

    B = {t ∈ T_Trace(F_Σ) | t |= ϕ}.

IV. Detection Problem

As said before, our goal is to be able to detect, in a given set of traces, some predefined behavior composed of combinations of high-level functionalities.
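The ping pattern ϕping = ϕ1 ∨ ϕ2 can be checked on a finite trace in a few lines of Python. This is only a minimal sketch under stated assumptions, not the detection algorithm of the talk (which is described as automata-based): the `Action` encoding, the helper names and the option of scanning every start position of a trace are hypothetical, and the constants α and β are represented by the strings IPPROTO_ICMP and ICMP_ECHOREQ.

```python
from typing import NamedTuple, Sequence, Tuple


class Action(NamedTuple):
    """One library call of a trace (hypothetical encoding)."""
    name: str
    args: Tuple[str, ...]


ALPHA = "IPPROTO_ICMP"   # stands for the constant α of the formula
BETA = "ICMP_ECHOREQ"    # stands for the constant β of the formula


def holds_phi1(trace: Sequence[Action], i: int) -> bool:
    """ϕ1 at position i: socket(x, α) ∧ (¬closesocket(x) U sendto(x, β, y))."""
    if i >= len(trace):
        return False
    first = trace[i]
    if first.name != "socket" or len(first.args) != 2 or first.args[1] != ALPHA:
        return False
    x = first.args[0]                      # witness for ∃x
    for act in trace[i:]:
        if (act.name == "sendto" and len(act.args) == 3
                and act.args[0] == x and act.args[1] == BETA):
            return True                    # sendto(x, β, y) reached: U is satisfied
        if act.name == "closesocket" and act.args == (x,):
            return False                   # socket freed before the send
    return False                           # the finite trace ends without a send


def holds_phi2(trace: Sequence[Action], i: int) -> bool:
    """ϕ2 at position i: ∃x. IcmpSendEcho(x)."""
    return i < len(trace) and trace[i].name == "IcmpSendEcho" and len(trace[i].args) == 1


def holds_ping(trace: Sequence[Action], i: int = 0) -> bool:
    """ϕping = ϕ1 ∨ ϕ2, evaluated at position i."""
    return holds_phi1(trace, i) or holds_phi2(trace, i)


def contains_ping(trace: Sequence[Action]) -> bool:
    """Assumption: report an occurrence starting at any position of the trace."""
    return any(holds_ping(trace, i) for i in range(len(trace)))


ok = [Action("socket", ("s1", ALPHA)), Action("sendto", ("s1", BETA, "host"))]
freed = [Action("socket", ("s1", ALPHA)), Action("closesocket", ("s1",)),
         Action("sendto", ("s1", BETA, "host"))]
print(holds_ping(ok), holds_ping(freed))
```

The second trace is rejected because the socket is freed before the send, which is exactly what the ¬closesocket(x) U sendto(x, β, y) part of ϕ1 forbids.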
For this, we associate to each behavior pattern an abstract symbol λ.

A keylogger or SMS message leaking app

The information leak abstract behavior is then defined by:

    M := ∃x. λsteal(x) ∧ ¬λinval(x) U λleak(x).

By looking at several malware samples, like keyloggers, SMS message leaking applications or personal information stealing mobile applications, we consider the following definitions of the three behavior patterns involved:

• λsteal(x) (keystroke or IMEI capture) describes a keystroke capture functionality and, on Android mobile phones, the retrieving of the IMEI number:

    λsteal(x) := GetAsyncKeyState(x) ∨
                 (RegisterDev(KBD, SINK) ⊙ GetInputData(x, INPUT)) ∨
                 (∃y. SetWindowsHookEx(y, WH_KEYBOARD_LL) ∧
                      ¬UnhookWindowsHookEx(y) U HookCalled(y, x)) ∨
                 ∃y. TelephonyManager_getDeviceId(x, y).

• λleak(x) (network send functionality) describes a network send functionality under Windows or Android:

    λleak(x) := ∃y, z. sendto(z, x, y) ∨
                ∃y, z. (connect(z, y) ∧ ¬close(z) U send(z, x)) ∨
                ∃c, s. HttpURLConnection_getOutputStream(s, c) ∧
                       ¬OutputStream_close(s) U OutputStream_write(s, x).

• λinval(x) (invalidated) describes the overwriting or freeing of x:

    λinval(x) := free(x) ∨ ∃y. sprintf(x, y) ∨ GetInputData(x, INPUT) ∨ . . .

The properties are checked with CADP, in a language that is a fragment of the modal mu-calculus extended with data variables, of which the FOLTL logic used in this paper is a subset.

We first represent the program set of traces as a CADP process. For this, we use a program control flow graph obtained by static analysis (see [21] and [23], [24]). Regularity of the set of traces is enforced by limiting recursion and inlining function calls, an approximation that can be deemed safe with respect to the abstract behaviors to detect. Note that there are two shortcomings to regular approximation. First, approximation of conditional branches by nondeterministic branches may result in false positives, especially when the program code is obfuscated. Second, failure to identify data correlations during dataflow analysis can result in false negatives. However, this does not significantly impact our detection results.

Now, as expressed in Theorem 3, detection of the information leak abstract behavior can be broken down into two steps: abstracting the set of traces by computing R≤2(L), and then verifying whether an abstracted trace matches the abstract behavior formula.
We can simulate the abstraction step in CADP and delegate the verification step to the evaluator4 module. For this, we represent the set of traces L of a given program by a system of communicating processes expressed in LOTOS, with a particular gate on which communications correspond to library calls. Then, computation of R≤2(L) is performed by synchronization with the transducer realizing the abstraction.

Finally, the captured data is usually not leaked in its raw form, so we take into account transformations of this data via the behavior pattern λdepends(x, y), which denotes a dependency of x on y. For instance, x may be a string representation of y, or x may be an encryption or an encoding of y.

Tool chains

[Diagram labels: C source, Promela, Model Checker, Behavior FO-LTL]

Related work

Kinder et al., Detecting malicious code by model checking

Thus, in this section, we take the example of a program that captures the characters typed on the keyboard (in order to extract, in particular, passwords, banking information and other sensitive data).
Analyzing its compiled form with IDA makes it possible to determine the structure of the stack and the types of the variables it contains. For simplicity, we work on its source code:

A C Keylogger from EADS

    #include <windows.h>
    #include <stdio.h>
    #include <stdlib.h>

    LRESULT WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
    {
        RAWINPUTDEVICE rid;
        RAWINPUT *buffer;
        UINT dwSize;
        USHORT uKey;

        switch (msg) {
        case WM_CREATE:
            /* Creation of the main window */
            /* Initialization of the keyboard capture */
            rid.usUsagePage = 0x01;
            rid.usUsage = 0x06;
            rid.dwFlags = RIDEV_INPUTSINK;
            rid.hwndTarget = hwnd;
            RegisterRawInputDevices(&rid, 1, sizeof(RAWINPUTDEVICE));
            break;

        case WM_INPUT:
            /* Keyboard, mouse, etc. event */
            /* What size for buffer? */
            GetRawInputData((HRAWINPUT) lParam, RID_INPUT, NULL, &dwSize,
                            sizeof(RAWINPUTHEADER));
            buffer = (RAWINPUT*) malloc(dwSize);
            /* Retrieve the captured data into buffer */
            if (!GetRawInputData((HRAWINPUT) lParam, RID_INPUT, buffer, &dwSize,
                                 sizeof(RAWINPUTHEADER)))
                break;
            if (buffer->header.dwType == RIM_TYPEKEYBOARD &&
                buffer->data.keyboard.Message == WM_KEYDOWN) {
                printf("%c\n", buffer->data.keyboard.VKey);
            }
            free(buffer);
            break;
        }
        /* ... */
    }

This code contains seven variables: hwnd, msg, wParam, lParam, rid, buffer and dwSize. The types HWND and HRAWINPUT represent integers. The types WPARAM and LPARAM are interpreted differently depending on the context: here, only [...]

• For a call malloc(size), malloc takes 1 argument describing the following data: the return value.
• For a call printf(format, arg1), printf1 takes 2 arguments describing the following data: format, arg1.

Finally, the abstract representation is indexed by symbols of Fd: hwnd, main_hwnd, msg, rid_usUsagePage, rid_usUsage, . . .
As in the SMS leak case of the previous section, a Promela model of the program can be built:

Promela translation

    mtype = { CALL_COPY, CALL_RegisterRawInputDevices, CALL_GetRawInputData,
              CALL_malloc, CALL_printf1, CALL_free }

    /* Rendezvous channel carrying one library call: its name and its arguments */
    chan lib_call = [0] of {mtype, int, int, int, int, int, int};

    /* Consumes every library call emitted by the program model */
    proctype lib_loop() {
        xr lib_call;
        do
        :: lib_call ? _,_,_,_,_,_,_;
        od;
    }

    init {
        run lib_loop();
        run main();
    }

    #define COPY(v1, v2) \
        lib_call ! CALL_COPY, v1, v2;
    #define RegisterRawInputDevices(pRID_usUsagePage, pRID_usUsage, pRID_dwFlags,\
                                    pRID_hwndTarget) \
        lib_call ! CALL_RegisterRawInputDevices, pRID_usUsagePage, pRID_usUsage,\
                   pRID_dwFlags, pRID_hwndTarget;
    /* ... */
    mtype = {hwnd, msg, wParam, lParam, rid_usUsagePage, rid_usUsage, rid_dwFlags,
             rid_hwndTarget, buffer, malloc10_header_dwType, malloc10_header_hDevice,
             malloc10_header_wParam, malloc10_keyboard_VKey, malloc10_keyboard_Message,
             dwSize, main_hwnd, ...

Behavior graph

[Figure: trace automaton of main, with states S0, S3, S5, S6, S9, S13, S15, S18, S19, S21, S24 and an end state. Transition labels: copy(hwnd, main_hwnd); copy(rid_hwndTarget, hwnd); RegisterRawInputDevices(_1, _6, RIDEV_INPUTSINK, rid_hwndTarget); GetRawInputData(_0, _0, _0, _0, _0); malloc(buffer); GetRawInputData(_0, malloc10_hdr_dwType, malloc10_hdr_hDevice, malloc10_kbd_VKey, malloc10_kbd_Message); printf1(FMT, malloc10_kbd_VKey); free(buffer).]

Figure 2.7: Trace automaton for the keylogger of Section 2.4.4.
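To connect the keylogger trace with the capture disjunct RegisterDev(KBD, SINK) ⊙ GetInputData(x, INPUT), the following Python sketch checks one trace built from the transition labels of the behavior graph. Several points are assumptions: the order of the calls is one plausible linearization (the edges of the figure are not recoverable here), ⊙ is read as "the first call now and the second one at some strictly later position", and the arguments `_1`, `_6` and RIDEV_INPUTSINK are taken to identify a keyboard device registered as an input sink. The helper names are hypothetical.

```python
from typing import Callable, List, Tuple

Call = Tuple[str, Tuple[str, ...]]     # (function name, arguments), hypothetical encoding


def seq_holds(trace: List[Call], i: int,
              p1: Callable[[Call], bool], p2: Callable[[Call], bool]) -> bool:
    """ϕ1 ⊙ ϕ2 at position i: p1 holds at i and p2 holds at some later position."""
    return i < len(trace) and p1(trace[i]) and any(p2(c) for c in trace[i + 1:])


def is_register_kbd_sink(call: Call) -> bool:
    """Assumed counterpart of RegisterDev(KBD, SINK)."""
    name, args = call
    return name == "RegisterRawInputDevices" and args[:3] == ("_1", "_6", "RIDEV_INPUTSINK")


def is_get_input(call: Call) -> bool:
    """Assumed counterpart of GetInputData(x, INPUT)."""
    return call[0] == "GetRawInputData"


# One assumed run through the automaton: window creation, then one input event.
trace: List[Call] = [
    ("copy", ("hwnd", "main_hwnd")),
    ("copy", ("rid_hwndTarget", "hwnd")),
    ("RegisterRawInputDevices", ("_1", "_6", "RIDEV_INPUTSINK", "rid_hwndTarget")),
    ("GetRawInputData", ("_0", "_0", "_0", "_0", "_0")),
    ("malloc", ("buffer",)),
    ("GetRawInputData", ("_0", "malloc10_hdr_dwType", "malloc10_hdr_hDevice",
                         "malloc10_kbd_VKey", "malloc10_kbd_Message")),
    ("printf1", ("FMT", "malloc10_kbd_VKey")),
    ("free", ("buffer",)),
]

captures = any(seq_holds(trace, i, is_register_kbd_sink, is_get_input)
               for i in range(len(trace)))
print(captures)
```

Note that this trace only exhibits the capture part: the captured key is printed and the buffer is freed, and no network send appears among the transition labels.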
Malicious behavior detection

• Detection of malicious behaviors
• Abstraction provides a high-level notion of signature which is robust with respect to some functional obfuscation methods
• Behavior analysis from a set of traces which comes from static or dynamic analysis
• Detection algorithms are efficient and based on (word/tree) automata techniques
• Tests with Allaple, Virus, Agent, Rbot, Afcore and Mimail

High Security Lab @ Nancy

lhs.loria.fr
• Telescope & honeypots
• In vitro experiment clusters

Thanks!