Behavior abstraction in Malware analysis

Transcription

Behavior abstraction in Malware analysis
Behavior abstraction in Malware analysis
Jean-Yves Marion
LORIA
INRIA
mercredi 8 juin 2011
Joint works with
Philippe Beaucamps
Isabelle Gnaedig
Daniel Reynaud (Berkeley U.)
mercredi 8 juin 2011
What is a malware ?
mercredi 8 juin 2011
What is a malware ?
• A malware is a program which has malicious intentions
mercredi 8 juin 2011
What is a malware ?
• A malware is a program which has malicious intentions
• A malware is a virus, a worm, a botnet ...
mercredi 8 juin 2011
What is a malware ?
• A malware is a program which has malicious intentions
• A malware is a virus, a worm, a botnet ...
• Giving a mathematical definition is difficult
mercredi 8 juin 2011
What is a malware ?
• A malware is a program which has malicious intentions
• A malware is a virus, a worm, a botnet ...
• Giving a mathematical definition is difficult
✴ So how to detect a malware ?
mercredi 8 juin 2011
What is a malware ?
• A malware is a program which has malicious intentions
• A malware is a virus, a worm, a botnet ...
• Giving a mathematical definition is difficult
✴ So how to detect a malware ?
✴ How to protect a system from a malware ?
mercredi 8 juin 2011
What
is a malware
urquoi
tracer
? (1/3)?
• A malware is a program which has malicious intentions
nition : l’analyse binaire, c’est
e l’analyse
deisprogramme
• A malware
a virus, a worm, a botnet ...
ù le programme est inconnu
• Giving a mathematical definition is difficult
⇒ on a juste un blob binaire
sons✴: So how to detect a malware ?
auts indirects
✴ How to protect a system from a malware ?
=⇒ flot de contrôle indécidable
ectures/écritures indirectes
=⇒ flot de données indécidable
ode auto-modifiant
=⇒ syntaxe indécidable
mercredi 8 juin 2011
Code protection
Detection is hard because malware are protected
1.Obfuscation
2.Cryptography
3.Self-modification
4.Anti-analysis tricks
mercredi 8 juin 2011
!"#$%&'(#)($*+,
Protections: Self-Modification
‡ - .&( &/ 01.213# /104.4#5 65# "&0#7018#
• A lot of malware families use home-made obfuscations, like packers to
91:;#35
93&(#:(
<4'134#5=
/&..&24'> 1
protect their (&
binaries,
following ("#43
a standard
model
5(1'8138 0&8#.? EF
D'91:;4'>$
:&8#
CEF
C34>4'1.
<4'13@
➡ It is difficult to perform a static analysis
➡ Dynamic
analysis by
code monitoring
(emulation, instrumentation,
‡ !"#
6'91:;4'>
:&8#
45 16(&01(4:1..@
0&84/4#8...)/&3
#1:" '#2 845(34<6(#8 <4'13@A
!349&6)?$G#H#35#7#'>4'##34'>$&/$01.213#$91:;#35$/&3$86004#5$7 I##9J#: BK+K$
mercredi 8 juin 2011
B
Exemple
(4/5)
A program generating codes
• hostname packé avec Themida
mercredi 8 juin 2011
Protections: Obfuscation
✓ Several possible implementations of a high
level action
mercredi 8 juin 2011
Protections: Obfuscation
✓ Several possible implementations of a high
level action
Two ways of writing into a file
h=fopen(C:\windows\sys.dll);fwrite(«test»,h)
h=createFile(C:\windows\sys.dll);writeFile(h,«test»)
mercredi 8 juin 2011
Malware detection methods in a tiny nutshell
mercredi 8 juin 2011
Malware detection by string scanning
• Signature is a regular expression denoting a sequence of bytes
Worm.Y
Your mac is now under our control !
• Signature : «Your * is now under our control
Worm.Y
Your PC is now under our control !
mercredi 8 juin 2011
Malware detection by string scanning
Pros :
• Accuracy: low rate of false positive
➡ programs which are not malware are not detected
• Efficient : Fast string matching algorithm
➡ Karp & Rabin, Knuth, Morris & Pratt, Boyer & Moore
Cons :
• Signature are quasi-manually constructed
• Signatures are not robust to malware protections
➡ Mutations
➡ Code obfuscations
mercredi 8 juin 2011
Detection by integrity check
• Identify a file using a hash function
Files
Hash function
Hash numbers
a
numerical
fingerprints
b
• Cons :
• File systems are updated, so numerical fingerprints change
• Difficult to maintain in practice
• Files may change with the same numerical fingerprint (due to hash fct)
mercredi 8 juin 2011
mercredi 8 juin 2011
Detection based on higher level of abstraction
A problem is the absence of high level abstraction
to structure and understand obfuscated codes.
Related works
-Preda, Christodorescu & al 2007: A semantics based approach to malware
detection.
-Chrisdorescu, Song & al 2007 : Semantics-Aware Malware detection
-Bonfante, Kaczmarek and M. 2009 : Morphological analysis
mercredi 8 juin 2011
Behavioral analysis and detection
mercredi 8 juin 2011
Behavioral detection
• Identification of a sequence of actions :
• System calls or library calls
• File systems interactions
• Network interactions
• Sequence of instructions
mercredi 8 juin 2011
Allaple excerpt
Allaple
excerpt
/ ∗ Behavior p a t t e r n : p i n g o f a remote h o s t ∗ /
hIcmp = I c m p C r e a t e F i l e ( ) ;
f o r ( i n t i = 0 ; i < 2 ; ++ i )
IcmpSendEcho ( hIcmp , i p a d d r , icmpData , 10 ,
NULL, r e p l y , 128 , 1 0 0 0 ) ;
IcmpCloseHandle ( hIcmp ) ;
void s c a n _ d i r ( const char∗ d i r ) {
HANDLE hFind ;
char szFilename [ 2 0 4 8 ] ;
WIN32_FIND_DATA f i n d D a t a ;
s p r i n t f ( szFilename , "%s \\% s " , d i r , " ∗.∗ " ) ;
hFind = F i n d F i r s t F i l e ( szFilename , &f i n d D a t a ) ;
i f ( hFind == INVALID_HANDLE_VALUE) r e t u r n ;
do {
s p r i n t f ( szFilename , "%s \\% s " , d i r ,
f i n d D a t a . cFileName ) ;
i f ( findData . d w Fi l e At t r i but es
& FILE_ATTRIBUTE_DIRECTORY )
s c a n _ d i r ( szFilename ) ;
else { . . . }
} while ( F i n d N e x t F i l e ( hFind , &f i n d D a t a ) ) ;
FindClose ( hFind ) ;
/ ∗ Behavior p a t t e r n : N e t b i o s c o n n e c t i o n ∗ /
SOCKET s = s o c k e t ( AF_INET , SOCK_STREAM, 0 ) ;
s t r u c t s o c k a d d r _ in s i n =
{ AF_INET , i p a d d r , htons ( 1 3 9 ) / ∗ N e t b i o s ∗ / } ;
i f ( connect ( s , (SOCKADDR∗)& s i n , s i z e o f ( s i n ) )
! = SOCKET_ERROR) {
...
}
/ ∗ Behavior p a t t e r n : scanning o f l o c a l d r i v e s ∗ /
char b u f f e r [ 1 0 2 4 ] ;
GetLogicalDriveStrings ( sizeof ( buffer ) , buffer ) ;
c o n s t char∗ s z D r i v e = b u f f e r ;
w h i l e (∗ s z D r i v e ) {
i f ( GetDriveType ( s z D r i v e ) == DRIVE_FIXED )
scan_dir ( szDrive ) ;
s z D r i v e += s t r l e n ( s z D r i v e ) + 1 ;
}
}
void main ( i n t argc , char∗∗ argv ) {
HANDLE hIcmp ;
const char∗ icmpData = " Babcdef . . . " ;
char r e p l y [ 1 2 8 ] ;
}
Example execution trace of library calls:
mercredi 8 juin 2011
...GetLogicalDriveStrings.GetDriveType.FindFirstFile.FindFirstFile.
FindNextFile...
Allaple excerpt
Allaple
excerpt
/ ∗ Behavior p a t t e r n : p i n g o f a remote h o s t ∗ /
hIcmp = I c m p C r e a t e F i l e ( ) ;
f o r ( i n t i = 0 ; i < 2 ; ++ i )
IcmpSendEcho ( hIcmp , i p a d d r , icmpData , 10 ,
NULL, r e p l y , 128 , 1 0 0 0 ) ;
IcmpCloseHandle ( hIcmp ) ;
void s c a n _ d i r ( const char∗ d i r ) {
HANDLE hFind ;
char szFilename [ 2 0 4 8 ] ;
WIN32_FIND_DATA f i n d D a t a ;
s p r i n t f ( szFilename , "%s \\% s " , d i r , " ∗.∗ " ) ;
hFind = F i n d F i r s t F i l e ( szFilename , &f i n d D a t a ) ;
i f ( hFind == INVALID_HANDLE_VALUE) r e t u r n ;
do {
s p r i n t f ( szFilename , "%s \\% s " , d i r ,
f i n d D a t a . cFileName ) ;
i f ( findData . d w Fi l e At t r i but es
& FILE_ATTRIBUTE_DIRECTORY )
s c a n _ d i r ( szFilename ) ;
else { . . . }
} while ( F i n d N e x t F i l e ( hFind , &f i n d D a t a ) ) ;
FindClose ( hFind ) ;
/ ∗ Behavior p a t t e r n : N e t b i o s c o n n e c t i o n ∗ /
SOCKET s = s o c k e t ( AF_INET , SOCK_STREAM, 0 ) ;
s t r u c t s o c k a d d r _ in s i n =
{ AF_INET , i p a d d r , htons ( 1 3 9 ) / ∗ N e t b i o s ∗ / } ;
i f ( connect ( s , (SOCKADDR∗)& s i n , s i z e o f ( s i n ) )
! = SOCKET_ERROR) {
...
}
/ ∗ Behavior p a t t e r n : scanning o f l o c a l d r i v e s ∗ /
char b u f f e r [ 1 0 2 4 ] ;
GetLogicalDriveStrings ( sizeof ( buffer ) , buffer ) ;
c o n s t char∗ s z D r i v e = b u f f e r ;
w h i l e (∗ s z D r i v e ) {
i f ( GetDriveType ( s z D r i v e ) == DRIVE_FIXED )
scan_dir ( szDrive ) ;
s z D r i v e += s t r l e n ( s z D r i v e ) + 1 ;
}
}
void main ( i n t argc , char∗∗ argv ) {
HANDLE hIcmp ;
const char∗ icmpData = " Babcdef . . . " ;
char r e p l y [ 1 2 8 ] ;
}
Example execution trace of library calls:
mercredi 8 juin 2011
...GetLogicalDriveStrings.GetDriveType.FindFirstFile.FindFirstFile.
FindNextFile...
Allaple excerpt
Allaple
excerpt
/ ∗ Behavior p a t t e r n : p i n g o f a remote h o s t ∗ /
hIcmp = I c m p C r e a t e F i l e ( ) ;
f o r ( i n t i = 0 ; i < 2 ; ++ i )
IcmpSendEcho ( hIcmp , i p a d d r , icmpData , 10 ,
NULL, r e p l y , 128 , 1 0 0 0 ) ;
IcmpCloseHandle ( hIcmp ) ;
void s c a n _ d i r ( const char∗ d i r ) {
HANDLE hFind ;
char szFilename [ 2 0 4 8 ] ;
WIN32_FIND_DATA f i n d D a t a ;
s p r i n t f ( szFilename , "%s \\% s " , d i r , " ∗.∗ " ) ;
hFind = F i n d F i r s t F i l e ( szFilename , &f i n d D a t a ) ;
i f ( hFind == INVALID_HANDLE_VALUE) r e t u r n ;
do {
s p r i n t f ( szFilename , "%s \\% s " , d i r ,
f i n d D a t a . cFileName ) ;
i f ( findData . d w Fi l e At t r i but es
& FILE_ATTRIBUTE_DIRECTORY )
s c a n _ d i r ( szFilename ) ;
else { . . . }
} while ( F i n d N e x t F i l e ( hFind , &f i n d D a t a ) ) ;
FindClose ( hFind ) ;
/ ∗ Behavior p a t t e r n : N e t b i o s c o n n e c t i o n ∗ /
SOCKET s = s o c k e t ( AF_INET , SOCK_STREAM, 0 ) ;
s t r u c t s o c k a d d r _ in s i n =
{ AF_INET , i p a d d r , htons ( 1 3 9 ) / ∗ N e t b i o s ∗ / } ;
i f ( connect ( s , (SOCKADDR∗)& s i n , s i z e o f ( s i n ) )
! = SOCKET_ERROR) {
...
}
/ ∗ Behavior p a t t e r n : scanning o f l o c a l d r i v e s ∗ /
char b u f f e r [ 1 0 2 4 ] ;
GetLogicalDriveStrings ( sizeof ( buffer ) , buffer ) ;
c o n s t char∗ s z D r i v e = b u f f e r ;
w h i l e (∗ s z D r i v e ) {
i f ( GetDriveType ( s z D r i v e ) == DRIVE_FIXED )
scan_dir ( szDrive ) ;
s z D r i v e += s t r l e n ( s z D r i v e ) + 1 ;
}
}
void main ( i n t argc , char∗∗ argv ) {
HANDLE hIcmp ;
const char∗ icmpData = " Babcdef . . . " ;
char r e p l y [ 1 2 8 ] ;
}
Example execution trace of library calls:
mercredi 8 juin 2011
...GetLogicalDriveStrings.GetDriveType.FindFirstFile.FindFirstFile.
FindNextFile...
Allaple excerpt
Allaple
excerpt
/ ∗ Behavior p a t t e r n : p i n g o f a remote h o s t ∗ /
hIcmp = I c m p C r e a t e F i l e ( ) ;
f o r ( i n t i = 0 ; i < 2 ; ++ i )
IcmpSendEcho ( hIcmp , i p a d d r , icmpData , 10 ,
NULL, r e p l y , 128 , 1 0 0 0 ) ;
IcmpCloseHandle ( hIcmp ) ;
void s c a n _ d i r ( const char∗ d i r ) {
HANDLE hFind ;
char szFilename [ 2 0 4 8 ] ;
WIN32_FIND_DATA f i n d D a t a ;
s p r i n t f ( szFilename , "%s \\% s " , d i r , " ∗.∗ " ) ;
hFind = F i n d F i r s t F i l e ( szFilename , &f i n d D a t a ) ;
i f ( hFind == INVALID_HANDLE_VALUE) r e t u r n ;
do {
s p r i n t f ( szFilename , "%s \\% s " , d i r ,
f i n d D a t a . cFileName ) ;
i f ( findData . d w Fi l e At t r i but es
& FILE_ATTRIBUTE_DIRECTORY )
s c a n _ d i r ( szFilename ) ;
else { . . . }
} while ( F i n d N e x t F i l e ( hFind , &f i n d D a t a ) ) ;
FindClose ( hFind ) ;
/ ∗ Behavior p a t t e r n : N e t b i o s c o n n e c t i o n ∗ /
SOCKET s = s o c k e t ( AF_INET , SOCK_STREAM, 0 ) ;
s t r u c t s o c k a d d r _ in s i n =
{ AF_INET , i p a d d r , htons ( 1 3 9 ) / ∗ N e t b i o s ∗ / } ;
i f ( connect ( s , (SOCKADDR∗)& s i n , s i z e o f ( s i n ) )
! = SOCKET_ERROR) {
...
}
/ ∗ Behavior p a t t e r n : scanning o f l o c a l d r i v e s ∗ /
char b u f f e r [ 1 0 2 4 ] ;
GetLogicalDriveStrings ( sizeof ( buffer ) , buffer ) ;
c o n s t char∗ s z D r i v e = b u f f e r ;
w h i l e (∗ s z D r i v e ) {
i f ( GetDriveType ( s z D r i v e ) == DRIVE_FIXED )
scan_dir ( szDrive ) ;
s z D r i v e += s t r l e n ( s z D r i v e ) + 1 ;
}
}
void main ( i n t argc , char∗∗ argv ) {
HANDLE hIcmp ;
const char∗ icmpData = " Babcdef . . . " ;
char r e p l y [ 1 2 8 ] ;
}
Example execution trace of library calls:
mercredi 8 juin 2011
...GetLogicalDriveStrings.GetDriveType.FindFirstFile.FindFirstFile.
FindNextFile...
Allaple excerpt
Allaple
excerpt
/ ∗ Behavior p a t t e r n : p i n g o f a remote h o s t ∗ /
hIcmp = I c m p C r e a t e F i l e ( ) ;
f o r ( i n t i = 0 ; i < 2 ; ++ i )
IcmpSendEcho ( hIcmp , i p a d d r , icmpData , 10 ,
NULL, r e p l y , 128 , 1 0 0 0 ) ;
IcmpCloseHandle ( hIcmp ) ;
void s c a n _ d i r ( const char∗ d i r ) {
HANDLE hFind ;
char szFilename [ 2 0 4 8 ] ;
WIN32_FIND_DATA f i n d D a t a ;
s p r i n t f ( szFilename , "%s \\% s " , d i r , " ∗.∗ " ) ;
hFind = F i n d F i r s t F i l e ( szFilename , &f i n d D a t a ) ;
i f ( hFind == INVALID_HANDLE_VALUE) r e t u r n ;
do {
s p r i n t f ( szFilename , "%s \\% s " , d i r ,
f i n d D a t a . cFileName ) ;
i f ( findData . d w Fi l e At t r i but es
& FILE_ATTRIBUTE_DIRECTORY )
s c a n _ d i r ( szFilename ) ;
else { . . . }
} while ( F i n d N e x t F i l e ( hFind , &f i n d D a t a ) ) ;
FindClose ( hFind ) ;
/ ∗ Behavior p a t t e r n : N e t b i o s c o n n e c t i o n ∗ /
SOCKET s = s o c k e t ( AF_INET , SOCK_STREAM, 0 ) ;
s t r u c t s o c k a d d r _ in s i n =
{ AF_INET , i p a d d r , htons ( 1 3 9 ) / ∗ N e t b i o s ∗ / } ;
i f ( connect ( s , (SOCKADDR∗)& s i n , s i z e o f ( s i n ) )
! = SOCKET_ERROR) {
...
}
/ ∗ Behavior p a t t e r n : scanning o f l o c a l d r i v e s ∗ /
char b u f f e r [ 1 0 2 4 ] ;
GetLogicalDriveStrings ( sizeof ( buffer ) , buffer ) ;
c o n s t char∗ s z D r i v e = b u f f e r ;
w h i l e (∗ s z D r i v e ) {
i f ( GetDriveType ( s z D r i v e ) == DRIVE_FIXED )
scan_dir ( szDrive ) ;
s z D r i v e += s t r l e n ( s z D r i v e ) + 1 ;
}
}
void main ( i n t argc , char∗∗ argv ) {
HANDLE hIcmp ;
const char∗ icmpData = " Babcdef . . . " ;
char r e p l y [ 1 2 8 ] ;
}
Example execution trace of library calls:
mercredi 8 juin 2011
...GetLogicalDriveStrings.GetDriveType.FindFirstFile.FindFirstFile.
FindNextFile...
Allaple excerpt
Allaple
excerpt
/ ∗ Behavior p a t t e r n : p i n g o f a remote h o s t ∗ /
hIcmp = I c m p C r e a t e F i l e ( ) ;
f o r ( i n t i = 0 ; i < 2 ; ++ i )
IcmpSendEcho ( hIcmp , i p a d d r , icmpData , 10 ,
NULL, r e p l y , 128 , 1 0 0 0 ) ;
IcmpCloseHandle ( hIcmp ) ;
void s c a n _ d i r ( const char∗ d i r ) {
HANDLE hFind ;
Allaple[ 2excerpt
char szFilename
048];
WIN32_FIND_DATA f i n d D a t a ;
s p r i n t f ( szFilename , "%s \\% s " , d i r , " ∗.∗ " ) ;
hFind = F i n dvoid
F i r ss ctaFni_ldei r (( const
szFilename
char∗ d i,r ) &f
{ indData ) ;
hFind ;
i f ( hFind == HANDLE
INVALID_HANDLE_VALUE)
return ;
char szFilename [ 2 0 4 8 ] ;
do {
WIN32_FIND_DATA f i n d D a t a ;
s p r i n t f ( szFilename , "%s \\% s " , d i r ,
, "%s
f i n dsDp rai nt at f .( szFilename
cFileName
) ; \\% s " , d i r , " ∗.∗ " ) ;
hFind = F i n d F i r s t F i l e ( szFilename , &f i n d D a t a ) ;
i f ( f i n d D a t ai f. d( hFind
w F i l e==AINVALID_HANDLE_VALUE)
ttributes
return ;
& FILE_ATTRIBUTE_DIRECTORY
)
do {
s p r i n t f ( szFilename
, "%s \\% s " , d i r ,
s c a n _ d i r ( szFilename
);
f i n d D a t a . cFileName ) ;
else { . . . } i f ( f i n d D a t a . d w F i l e A t t r i b u t e s
} while ( F i n d N e x t F
l e ( hFind , &f i n d D a t a) ) ) ;
& iFILE_ATTRIBUTE_DIRECTORY
FindClose ( hFind ) s; c a n _ d i r ( szFilename ) ;
}
/ ∗ Behavior p a t t e r n : p i n g o f a remote h o s t ∗ /
/hIcmp
∗ Behavior
= I c m p C r e apt eaFti ltee(r) n
; : N e t b i o s c o n n e c t i o n ∗/
f o r ( i n t i s= =
0 ; si o<c k
2 ;e t++
i)
SOCKET
( AF_INET
, SOCK_STREAM, 0 ) ;
IcmpSendEcho ( hIcmp , i p a d d r , icmpData , 10 ,
s t r u c t s o c k a d d r _ in s i n =
NULL, r e p l y , 128 , 1 0 0 0 ) ;
{ AF_INET
, i p) ;a d d r , htons ( 1 3 9 ) / ∗ N e t b i o s ∗ / } ;
IcmpCloseHandle
( hIcmp
i f ( connect ( s , (SOCKADDR∗)& s i n , s i z e o f ( s i n ) )
/ ∗ Behavior p a t t e r n : N e t b i o s c o n n e c t i o n ∗ /
! = SOCKET_ERROR) {
SOCKET s = s o c k e t ( AF_INET , SOCK_STREAM, 0 ) ;
s t r u. c. t. s o c k a d d r _ in s i n =
{ AF_INET , i p a d d r , htons ( 1 3 9 ) / ∗ N e t b i o s ∗ / } ;
}
i f ( connect ( s , (SOCKADDR∗)& s i n , s i z e o f ( s i n ) )
! = SOCKET_ERROR) {
/ ∗. . Behavior
p a t t e r n : scanning o f
.
}
char
buffer [1024];
l o c a l d r i v e s ∗/
GetLogicalDriveStrings ( sizeof ( buffer ) , buffer ) ;
else { . . . }
} while ( F i n d N e x t F i l e ( hFind , &f i n d D a t a ) ) ;
FindClose ( hFind ) ;
void main ( i n t }argc , char∗∗ argv ) {
HANDLE hIcmpvoid
;
main ( i n t argc , char∗∗ argv ) {
HANDLE hIcmp
const char∗ icmpData
= ; " Babcdef . . . " ;
char r e p l y [ 1 2const
8 ] ; char∗ icmpData = " Babcdef . . . " ;
char r e p l y [ 1 2 8 ] ;
}}
/ ∗ Behavior p a t t e r n : scanning o f l o c a l d r i v e s ∗ /
cchar
o n sbtu fchar
∗2 4s] ;z D r i v e = b u f f e r ;
fer [10
G eht iLloeg i c(a∗l Dsrzi vD
eS
w
r itvr ien g)s ( {s i z e o f ( b u f f e r ) , b u f f e r ) ;
c o n s t char∗ s z D r i v e = b u f f e r ;
i f ( GetDriveType ( s z D r i v e ) == DRIVE_FIXED )
w h i l e (∗ s z D r i v e ) {
c a n _ d i r ( s( zs zDDrrii vv ee) ) ==
; DRIVE_FIXED )
i f (sGetDriveType
z D r is
v et r) ;l e n ( s z D r i v e ) + 1 ;
s zs cDa rni_vdei r ( s+=
s z D r i v e += s t r l e n ( s z D r i v e ) + 1 ;
}}
Example execution trace of library calls:
Example execution
trace of library calls:
...GetLogicalDriveStrings.GetDriveType.FindFirstFile.FindFirstFile.
mercredi 8 juin 2011
FindNextFile...
...GetLogicalDriveStrings.GetDriveType.FindFirstFile.FindFirstFile.
8 / 22
FindNextFile...
Program Traces
An execution is a sequence of configurations
(µ0 , ν0 ) → . . . → (µn , νn ) → . . .
A trace
mercredi 8 juin 2011
c0 → . . . → cn → . . .
projection
Program Traces
An execution is a sequence of configurations
(µ0 , ν0 ) → . . . → (µn , νn ) → . . .
A trace
c0 → . . . → cn → . . .
• A trace is a sequence of actions
• Approximation by a regular language
• A trace automaton is a finite approximation of a trace language
mercredi 8 juin 2011
projection
Program Traces
An execution is a sequence of configurations
(µ0 , ν0the
)→
. . automaton
. → (µn ,ofνanmachine,
) → . . one
. may either use a
In order to construct
trace
collection of captured traces or reconstruct the program machine code
from its
projection
binary representation. In the first case, the automaton is built in such a way that
the captured traces
correspond
to acpath
in .the
trace automaton. In the second
c
→
.
.
.
→
→
.
.
A trace
0
n
case, the machine code of the program can be reconstructed using common
techniques
that combine
static and dynamic analysis: it is then projected on the
• A trace
is a sequence
of actions
trace alphabet and the trace automaton is inferred from the code structure.
• Approximation
by a regular
Example 4. Figure
1 shows language
a trace automaton for the Allaple.A excerpt representing
the ping ofisa aremote
host and the scanning
of local
drives.
• A trace
automaton
finite approximation
of a trace
language
IcmpSendEcho
GetDriveType
GetLogicalDriveStrings
Allaple.A
mercredi 8 juin 2011
FindFirstFile
FindNextFile
GetDriveType
FindFirstFile
FindFirstFile
FindNextFile
FindNextFile
Computing program traces
1.Dynamic analysis
• Collect an execution trace
• Monitor program interactions (sys calls, network calls, ...)
• What is the detection coverage ?
2.Static analysis
• A good approximation of a set of execution traces
• Good detection coverage
• But static analysis is difficult to perform
mercredi 8 juin 2011
Behavior abstraction
Goal : Provide a behavior analysis technique by expressing
traces in term of high level, implementation-independant
functionalities
mercredi 8 juin 2011
Behavior abstraction
Goal : Provide a behavior analysis technique by expressing
traces in term of high level, implementation-independant
functionalities
h=fopen(C:\windows\sys.dll);fwrite(«test»,h)
Write_System_File
h=createFile(C:\windows\sys.dll);writeFile(h,«test»)
mercredi 8 juin 2011
Behavior abstraction
Goal : Provide a behavior analysis technique by expressing traces in term of
high level, implementation-independant functionalities
Our works
- Expressing set of traces by regular languages from static or dynamic analysis
- Abstracting behavior patterns by string rewriting systems
- Efficient analysis (quasi-linear time)
-Detection of several abstract behaviors from a set of traces
mercredi 8 juin 2011
Behavior abstraction
Goal : Provide a behavior analysis technique by expressing traces in term of
high level, implementation-independant functionalities
Our works
- Expressing set of traces by regular languages from static or dynamic analysis
- Abstracting behavior patterns by string rewriting systems
- Efficient analysis (quasi-linear time)
-Detection of several abstract behaviors from a set of traces
Related works
- Martignoni et al. 2008: multi-layered abstraction on a single trace
-Jacob et al., 2009: low-level functionalities, exponential-time detection
mercredi 8 juin 2011
Behavior patterns
mercredi 8 juin 2011
Behavior patterns
A behavior pattern is a monadic String Rewriting System
GetLogicalDriveString.GetDriveType.(FindFirstFile+FindFirstFileEx)
→ Scandrive
mercredi 8 juin 2011
Behavior patterns
A behavior pattern is a monadic String Rewriting System
GetLogicalDriveString.GetDriveType.(FindFirstFile+FindFirstFileEx)
→ Scandrive
• A behavior pattern B is a SRS :
B1 → λ1
Bn → λn
mercredi 8 juin 2011
Monadic String rewriting systems
Rules of a monadic SRS are of the form
α→β
where
|α| > |β|
|β| ≤ 1
Theorem:
The set of descendants of a regular language,
by using monadic SRS is regular
(Book & Otto)
mercredi 8 juin 2011
Abstract trace language
Abstract a trace language L by reducing it w.r.t. a behavior pattern B
L →B . . . →B L
mercredi 8 juin 2011
↓
Abstract trace language
Abstract a trace language L by reducing it w.r.t. a behavior pattern B
L →B . . . →B L
↓
Theorem : Let B be a regular behavior pattern and L be
a trace language.
If L is regular then so is L↓.
There is a quadradic-time procedure to compute L↓.
mercredi 8 juin 2011
Abstract trace language
Abstract a trace language L by reducing it w.r.t. a behavior pattern B
L →B . . . →B L
↓
Theorem : Let B be a regular behavior pattern and L be
a trace language.
If L is regular then so is L↓.
There is a quadradic-time procedure to compute L↓.
GetLogicalDriveString.GetDriveType.FindFirstFile.FindNextFile....
Scandrive.FindNextFile...
mercredi 8 juin 2011
Theorem 3. Let A be a trace automaton and C = {Bi }1≤i≤n
!a set of behavior
patterns recognized by regular SRSs {RBi }1≤i≤n . Let |C| =
|RBi |.
Malicious behavior detection
1≤i≤n
Then an
" automaton
# of size O (|A|)
" recognizing
# L (A) ↓C |Γ can be constructed
3
2
2
2
in time O |A| · |C| and space O |A| · |C| .
•A
The final abstraction of the Allaple.A excerpt, for Γ = {PING, SCAN_DRIVES},
is depicted
in Figure(signature)
3.
malicious
behavior
is a regular behavior pattern B
Allaple.A signature:
PING
SCAN_DRIVES
PING.*.SCAN_DRIVE
Fig. 3. Γ -abstract automaton for the Allaple.A excerpt
6
Application to Malware Detection
Using the abstraction framework defined in Section 4, malware detection now
consists in computing the abstract trace language of some machine and comparing it to a database of malicious behaviors defined on Γ . These malicious behaviors either describe generic behaviors, e.g. sending spam or logging keystrokes, or
behaviors of specific malware. According to our abstraction formalism, malicious
behaviors are sets of particular combinations of behavior patterns abstractions.
Definition 7. A malicious behavior on Γ , or signature, is a language on Γ .
mercredi 8 juin 2011
More specifically, a malicious behavior describes combinations of patterns, pos-
Theorem 3. Let A be a trace automaton and C = {Bi }1≤i≤n
!a set of behavior
patterns recognized by regular SRSs {RBi }1≤i≤n . Let |C| =
|RBi |.
Malicious behavior detection
1≤i≤n
Then an
" automaton
# of size O (|A|)
" recognizing
# L (A) ↓C |Γ can be constructed
3
2
2
2
in time O |A| · |C| and space O |A| · |C| .
•A
The final abstraction of the Allaple.A excerpt, for Γ = {PING, SCAN_DRIVES},
is depicted
in Figure(signature)
3.
malicious
behavior
is a regular behavior pattern B
Allaple.A signature:
PING
SCAN_DRIVES
PING.*.SCAN_DRIVE
Fig. 3. Γ -abstract automaton for the Allaple.A excerpt
the abstract trace language L↓ with B
• Detection : comparing
6
Application to Malware Detection
Using the abstraction framework defined in Section 4, malware detection now
consists in computing the abstract trace language of some machine and comparing it to a database of malicious behaviors defined on Γ . These malicious behaviors either describe generic behaviors, e.g. sending spam or logging keystrokes, or
behaviors of specific malware. According to our abstraction formalism, malicious
behaviors are sets of particular combinations of behavior patterns abstractions.
Definition 7. A malicious behavior on Γ , or signature, is a language on Γ .
mercredi 8 juin 2011
More specifically, a malicious behavior describes combinations of patterns, pos-
Theorem 3. Let A be a trace automaton and C = {Bi }1≤i≤n
!a set of behavior
patterns recognized by regular SRSs {RBi }1≤i≤n . Let |C| =
|RBi |.
Malicious behavior detection
1≤i≤n
Then an
" automaton
# of size O (|A|)
" recognizing
# L (A) ↓C |Γ can be constructed
3
2
2
2
in time O |A| · |C| and space O |A| · |C| .
•A
The final abstraction of the Allaple.A excerpt, for Γ = {PING, SCAN_DRIVES},
is depicted
in Figure(signature)
3.
malicious
behavior
is a regular behavior pattern B
Allaple.A signature:
PING
SCAN_DRIVES
PING.*.SCAN_DRIVE
Fig. 3. Γ -abstract automaton for the Allaple.A excerpt
the abstract trace language L↓ with B
• Detection : comparing
6
Application to Malware Detection
Theorem : Let B a regular behavior pattern and L↓ an abstract trace
Using the abstraction framework defined in Section 4, malware detection now
language.
consists in computing the abstract trace language of some machine and comparThere ing
is ait linear-time
procedure
deciding
L↓ isoninfected
by B. behavto a database of
malicious behaviors
defined
Γ . These malicious
iors either describe generic behaviors, e.g. sending spam or logging keystrokes, or
behaviors of specific malware. According to our abstraction formalism, malicious
behaviors are sets of particular combinations of behavior patterns abstractions.
Definition 7. A malicious behavior on Γ , or signature, is a language on Γ .
mercredi 8 juin 2011
More specifically, a malicious behavior describes combinations of patterns, pos-
On going researches and conclusions
mercredi 8 juin 2011
sitions.of
FOLTL
an extension
of
tion
endto
(x,
β, y)),iswhere
the first paramused
socket,
the
second
parameter
is
the
sent
data
and
th
e
Appendix
A2)
such
that:
Example
1.
Let
us
consider
the
functionality
of
se
eated
socket
and
the
second
parameter
ormula
der the functionality of sending a ping. One way of realizing it consists in calling the
third
one
is
the
target,
the
unique
parameter
of
closesock
a,
thenfirst
ϕ isparameter
an FOLTL
; is
, the
offormula
sendto
the
s itbeen
gormula
consists
in
calling
the
socket
function with the parameter IPPROTO_ICMP descri
and
Y
⊆
X
is
a
set
of
d parameter
is
the
sent
data
and
the
is thearguments
freed
socket
and constants α and β in Fd identify th
Tracking
...
er
IPPROTO_ICMP
describing
the
network protocol and, then, calling the sendto funct
and unique
∀Y.ϕ are
FOLTL offormulas,
the
parameter
closesocket
above
parameters
IPPROTO_ICMP
and
ICMP_ECHOREQ.
,uences
calling
the
sendto
function
with
the
parameter
ICMP_ECHOREQ
describing the data to
≡constants
¬∃Y.¬ϕ.α and β in Fd identify the
REQ
describing
the
data
to
be
sent.
Between
these two
calls,the
the
socket
should
not be freed
A
ping
may
also
be
realized
using
function
IcmpSend
•
A
behavior
pattern
is
a
First-order
LTL
(Linear
temporal
logic)
formula
OTO_ICMP
and
ICMP_ECHOREQ.
for
ϕ
∧
X
(�
U
ϕ
).
2
.e socket
. .) 1∈should not be
freed
. parameter
This
by the FOLTL
formula:
ϕ1 = ∃x,
y. socket
Echo,
whose
represents
the
ping
target.
This
corr
ealized
the function
formula using
is closed
when
it IcmpSendhas
no is described
way
as
rmula:
∃x,
y.
socket
(x, α)
∧ (¬closesocket (x) U sendto (x, β, y)), where the firs
1 =
iable
is ϕbound
by
a
quantifier.
represents
the
ping
target.
This
corresponds
to
the
FOLTL
formula:
ϕ
=
∃x.
IcmpSendEcho
(x
2
eter
of
socket
is
the
created
socket
and
the
second
pa
to
(x, β,ϕy)),
where
the
first
paramfrmula:
there
variables
of
sort
Data
and
σ
∈
2 = ∃x. IcmpSendEcho (x).
Hence,
the
ping
functionality
may
be
described
by
th
is
the
network
protocol,
the
first
parameter
of
send
d
socket
and
the
second
parameter
tution over may
Y . The
nctionality
be application
described of
by the
used
socket,
the
second
parameter is the sent data
FOLTL
formula:
ϕ
=
ϕ
∨
ϕ
.
es=naturally
first
parameter
of
sendto
is
the
ping
1
2
defined
by
the
formula
∨ ϕ2 .
cesϕ1of
arameter
is the issent
e x in ϕ which
in Ydata
hasand
beenthe third one is the target, the unique parameter of clos
isathebehavior
freed socketpattern
and constants
α and
in Ftrac
ehavior
patternWe
as of
the
setdefine
of traces
ets
of parameter
d ide
then
as the
setβ of
unique
closesocket
IPPROTO_ICMP and ICMP_ECHORE
ality,
i.e.,
ascarrying
the
set
validating
ntified
is validated
onβinfinite
sequences
nstants
α and
in of
Fdtraces
identify
the above parameters
out
its
functionality,
i.e., as
the set of traces validatin
A ping may also be realized using the function Icm
the
functionality.
s,
denoted
by
ξ
=
(ξ
,
ξ
,
.
.
.)
∈
O_ICMP
ICMP_ECHOREQ.
0 1 describing the functionality.
s ξt =andthe
formula
Echo, whose parameter represents the ping target. Th
s
ϕ)
is
defined
in
the
same
way
as
zed
using
theisfunction
vior
pattern
a set ofIcmpSendtraces B sponds
⊆
ted
by
to the FOLTL formula: ϕ2 = ∃x. IcmpSendE
additional
rule:
ξ
|=
∃Y.ϕ
iff
there
presents
ping target.
correa closedthe
FOLTL
formulaThis
ϕ1.on
= Hence, the
Definition
AAPbehavior
pattern
is a set of
traces
B
ping
functionality
may
be
described
ubst
that
ξ) |=
ϕσ.
la:∈YϕT2such
=
∃x.
IcmpSendEcho
(x). FOLTL
{t
(F
|
t
|=
ϕ}). validating
Trace T Σ
(F
a
closed
FOLTL
formula
ϕ
on
AP
formula:
ϕ
=
ϕ
∨
ϕ
.
ping
1
2
Trace
Σ
mula
is may
validated
over tracesbyofthe
tomata
onality
be described
T
(F
,
X):
B
=
{t
∈
T
(F
)
|
t
|=
ϕ}
.
sequences
of
singleton
sets
of
Action
Σ
Trace
Σ
We
then
define
a
behavior
pattern
as the set o
ϕETECTION
∨
ϕ
.
dix
A3
P
ROBLEM
1
2
trace t = a0 · · · an is identified carrying out its functionality, i.e., as the set of traces v
tomata
oal
is to
be atomic
ableastothe
detect,
inofaξtgiven
vior
pattern
set
traces
of sets
of
predicates
= set
the formula describing the functionality.
mercredi 8 juin 2011
sitions.of
FOLTL
an extension
of
tion
endto
(x,
β, y)),iswhere
the first paramabove
parameters
IPPROTO_ICMP
and
ICMP_ECHOREQ.
used
socket,
the
second
parameter
is
the
sent
data
and
th
e
Appendix
A2)
such
that:
Example
1.
Let
us
consider
the
functionality
of
se
nces
eated
socket
and
the
second
parameter
ormula
der the functionality
of also
sending
a ping. Oneusing
A
ping
may
be
realized
the
function
IcmpSendway
of
realizing
it
consists
in calling the
third
one
is
the
target,
the
unique
parameter
of
closesock
a,
then
ϕ
is
an
FOLTL
formula
;
,) the
first
parameter
of
sendto
is
the
∈
s itbeen
gormula
consists
in
calling
the
socket
Echo,
whose
parameter
represents
target.
This
correfunction
with the
the ping
parameter
IPPROTO_ICMP
descri
and
Y
⊆
X
is
a
set
of
dy parameter
is
the
sent
data
and
the
is thearguments
freed
socket
and constants α and β in Fd identify th
asIPPROTO_ICMP
Tracking
...
er
describing
the
network ϕ
protocol
and,
then, calling the sendto
funct
and unique
∀Y.ϕ
are
FOLTL
formulas,
sponds
to
the
FOLTL
formula:
=
∃x.
IcmpSendEcho
(x).
the
parameter
of
closesocket
2
here
above
parameters
IPPROTO_ICMP
and
ICMP_ECHOREQ.
,uences
calling
the
sendto
function
with
the
parameter
ICMP_ECHOREQ
describing
≡constants
¬∃Y.¬ϕ.α
and β in
identify
the
Hence,
theFdping
functionality
may be described
by the
the data to
REQ
describing
the
data
to
be
sent.
Between
these two
calls,the
the
socket
should
not be freed
A
ping
may
also
be
realized
using
function
IcmpSend
•
A
behavior
pattern
is
a
First-order
LTL
(Linear
temporal
logic)
formula
OTO_ICMP
and
ICMP_ECHOREQ.
for
ϕ
∧
X
(�
U
ϕ
).
2
.e socket
.of
.) 1∈should
FOLTLnotformula:
ϕ
= ϕ1 ∨ ϕ
ping
2 .the FOLTL formula: ϕ = ∃x, y. socket
be
freed
.
This
is
described
by
1
Echo,
whose
represents
the
ping
target.
This
corr
ealized
the function
formula using
is closed
when
it IcmpSendhasparameter
no
way
as
(¬closesocket
(x) Uassendto
(x,
β,of
y)),traces
where the firs
rmula:
=
∃x,
y.
socket
(x, α)
∧
1 We
siable
of is ϕbound
then
define
a
behavior
pattern
the
set
by
a
quantifier.
represents
the
ping
target.
This
corresponds
to
the
FOLTL
formula:
ϕ
=
∃x.
IcmpSendEcho
(x
2
eter
of
socket
is
the
created
socket
and
the
second
pa
to
(x,
β,
y)),
where
the
first
paramfrmula:
there
variables
of
sort
Data
and
σ
∈
(x).
fied ϕcarrying
2 = ∃x. IcmpSendEcho
out
its
functionality,
i.e.,
as
the
set
of
traces
validating
Hence,
the
ping
functionality
may
be
described
by
th
is
the
network
protocol,
the
first
parameter
of
send
d
socket
and
the
second
parameter
tution
over
Yformula
. The
of
nctionality
may
be application
described
by the
the
describing
the
functionality.
t =
used
socket,
the
second
parameter is the sent data
FOLTL
formula:
ϕ
=
ϕ
∨
ϕ
.
es=naturally
first
parameter
of
sendto
is
the
ping
1
2
defined
by
the
formula
ϕ1of
∨ ϕ2 .
ces
darameter
by
third one is the target, the unique parameter of clos
is the issent
data
and
the
e x in ϕDefinition
which
in Y1.
has
been
A
behavior
pattern
is
a
set
of
traces
B
⊆
isathebehavior
freed socketpattern
and constants
α and
in Ftrac
ehavior
pattern
assatisfy
the
set
of traces
ets
d ide
We
then
define
as the
setβ of
•ofBad
traces
a FO-LTL
formula
unique
parameter
of
closesocket
T
(F
)
validating
a
closed
FOLTL
formula
ϕ
on
AP
=
Trace
Σ
above
parameters
IPPROTO_ICMP
and
ICMP_ECHORE
ality,
i.e.,
as
the
set
of
traces
validating
mata
ntified
is validated
onβinfinite
sequences
nstants
α and
in Fdout
identify
the
carrying
its
functionality,
i.e.,
as
the
set
of
traces
validatin
T
(F
,
X):
B
=
{t
T
)
t
|=
ϕ}
.
A∈ping
may (F
also
be| realized
using
the function Icm
Action
Σ
Trace
Σ
the
functionality.
s,
denoted
by
ξ
=
(ξ
,
ξ
,
.
.
.)
∈
O_ICMP
ICMP_ECHOREQ.
0 1 describing the functionality.
s A3
ξt =andthe
formula
Echo, whose parameter represents the ping target. Th
s
ϕ)
is
defined
in
the
same
way
as
zed
using
theisfunction
mata
vior
pattern
a set ofIcmpSendtraces B sponds
⊆
ted
by
to the FOLTL formula: ϕ2 = ∃x. IcmpSendE
additional
rule:
ξ
|=
∃Y.ϕ
iff
there
Dbehavior
ETECTION
P ROBLEM
presents
ping target.
correa closedthe
FOLTL
formulaThis
ϕ1.IV.
on
= Hence, the
Definition
AAP
pattern
is a set of
traces
B
ducping
functionality
may
be
described
ubst
that
ξ)abstraction
|=
ϕσ.
•such
Behavior
is made
by term rewriting systems
Y
la:
ϕ
=
∃x.
IcmpSendEcho
(x).
{t
∈
T
(F
|
t
|=
ϕ}
.
2
Trace
Σ
T
(F
)
validating
a
closed
FOLTL
formula
ϕ
on
AP
FOLTL
formula:
ϕ
=
ϕ
∨
ϕ
.
tree
As
said
before,
our
goal
is
to
be
able
to
detect,
in
a
given
set
ping
1
2
Trace
Σ
mula
is may
validated
over tracesbyofthe
tomata
onality
be described
d sequences
by of traces,
some
predefined
behavior
composed
of| combinations
T
(F
,
X):
B
=
{t
∈
T
(F
)
t
|=
ϕ}
.
of
singleton
sets
of
Action
Σ
Trace
Σ
We
then
define
a
behavior
pattern
as the set o
ϕETECTION
∨
ϕ
.
dix
A3
•
Linear
detection
algorithms
based
on
tree-automata
P
ROBLEM
1
2
trace t of
= high-level
a0 · · · an is identified
functionalities.
this,
we associate
to set
each
carryingFor
out its
functionality,
i.e., as the
of traces v
tomata
oal
is to
be
ableastothe
detect,
inofaan
set
vior
pattern
set
traces
of sets
of
atomic
predicates
ξtgiven
=abstract
the formula
describing
the functionality.
behavior
pattern
symbol
λ taken
in the alphabet
mercredi 8 juin 2011
information leak abstract behavior
is then defined by:
language, a fragment o
information leak abstract behavior is then defined by:
data variables, whose
M := ∃x. λsteal (x) ∧ ¬λinval M
(x):=U∃x.
λleak
(x)
.
λsteal (x) ∧ ¬λinvalsubset.
(x) U λleak (x) .
Welike
first
representsm
th
Byalooking
at several
malware samples,
like
smssamples,
By looking
at keyloggers,
several malware
keyloggers,
A keylogger or
sms
message
leaking
app
cess. information
For this, westealing
use a
message leaking applications ormessage
personal
information
stealing
leaking
applications
or personal
analysis
(see [21]
information mobile
leak applications,
abstract behavior
is following
then defined
by:
langua
mobile
applications,
we consider
following
definitions
o
we consider
the
definitions
of thestatic
7
the three behavior patterns involved:traces is enforced by l
the three behavior patterns involved:
data
v
1
calls,
an
approximatio
1
• λsteal
(x)
describes
a
keystroke
capture
functionality
•
λ
(x)
describes
a
keystroke
capture
functionality
M := ∃x.steal
λsteal
(x) ∧language,
¬λinval
(x) U
λmodal
(x) . to the
leak mu-calculus
information leak abstract behavior is then defined
by:
aand,
fragment
of
the
with
behavi
subset
on Android
mobile
phones, extended
the abstract
retrieving
of the
and, on Android mobile phones, the
retrieving
of the
data variables,
whose
FOLTL logic used in
this paper is to
a regul
IMEI
number:
shortcomings
M := ∃x. λsteal (x) ∧ ¬λinval (x)IMEI
U λleak
(x)
.
number:
We
subset.samples, like keyloggers,
By looking at several
malware
sms
of
conditional
branch
λsteal
(x) := GetAsyncKeyState(x)∨
We first
represent the program set of traces as a CADP proλsteallike
(x)keyloggers,
:= GetAsyncKeyState(x)∨
By looking at several malware samples,
sms
result in falseINPUT))∨
positive
cess.
F
(RegisterDev(KBD,
SINK)⊙
GetInputData(x,
message
leaking
applications
or
personal
information
stealing
cess.
For
this,
we
use
a
program
control
flow
graph
obtained
by
message leaking applications or personal
information stealing SINK)⊙ GetInputData(x, INPUT))∨
(RegisterDev(KBD,
is obfuscated.
And sec
(∃y. SetW
indowsHookEx(y,
WH_KEYBOARD_LL)∧
static
analysis
(see
[21]
and
[23],
[24]).
Regularity
of
the
set
of
Network
send
functionality
mobile
applications,
we
consider
the
following
definitions
of
static
(∃y. SetW indowsHookEx(y,
WH_KEYBOARD_LL)∧
Keystroke
or IMEI
capture
mobile
applications,
we
consider
the
following
definitions
of
¬U
nhookW
indowsHookEx(y)U
HookCalled(y,
x))∨
during
dataflow
traces is enforced by limiting recursion and inlining function analys
the three behavior patterns involved: ¬U nhookW indowsHookEx(y)U
HookCalled(y, x))∨
not significan
traces
∃y.T elephonyM
anager_getDeviceId(x,
y).
calls,
an
approximation
that can be
deemedthis
safe does
with respect
1 involved:
three
behavior
• λsteal (x) the
describes
a keystroke
capture patterns
functionality
invalidated
∃y.T elephonyM
anager_getDeviceId(x,
as two
expressed
behaviors to y).
detect. Note thatNow,
there are
and, on Android mobile phones, the retrieving of the to the abstract
• λleak (x) describes a network send functionality
unde
calls,
1 leak
shortcomings
to
regular
approximation.
First,
approximation
formation
abstra
describes a keystroke
capture
functionality
IMEI number: • λsteal (x)
or Android:
• λleak (x) describes a network Windows
send
functionality
under
of conditional branches by nondeterministic
branches
may
into
two
steps:
abstrac
to
the

�
�
λsteal (x) := GetAsyncKeyState(x)∨
Windows
or
Android:
λ
(x)
:=
∃y,
z.
sendto(z,
x,
y)∨
and, on Android mobileresultphones,
the
retrieving
of
the
in false positives,
code
≤2
leak especially when the program
�
Rλ
R (L) an
(RegisterDev(KBD, SINK)⊙ GetInputData(x, INPUT))∨ is obfuscated.∃y,
z.
(connect
(z,
y)
∧
¬close(z)
U
send(z,
x))∨
shortc
And
second,
failure
to
identify
data
correlations
λleak (x) := ∃y, z. sendto(z, x, y)∨
trace
matches
the abst
IMEI
number:
(∃y. SetW indowsHookEx(y, WH_KEYBOARD_LL)∧
during
dataflow
analysis
can
result
in
false
negatives.
However,
∃c, s. HttpU
RLConnection_getOutputStream(s,
c)∧
∃y, z. (connect
∧ ¬close(z)
U send(z,
x))∨
So,
we
can
simula
¬U nhookW indowsHookEx(y)U HookCalled(y,
x))∨(z, y)
of
co
this does¬OutputStream_close(s)U
not significantly impact our detection
results.
OutputStream_write(s, x).
s. HttpU y).
RLConnection_getOutputStream(s,
c)∧
delegate
verificatio
λstealanager_getDeviceId(x,
(x) :=∃c,GetAsyncKeyState(x)∨
∃y.T elephonyM
Now, as expressed in Theorem 3, detection
of the
the inresult
¬OutputStream_close(s)U
OutputStream_write(s,
x).
• leak
λinvalabstract
(x) describes
theM
overwriting
or
of x:the
this,
wefreeing
represent
formation
behavior
can
be
broken
down
• λleak (x) describes
a network send functionalitySINK)⊙
under
(RegisterDev(KBD,
GetInputData(x,
INPUT))∨
aLsystem
ofy)∨
communi
into
two
steps:
abstracting
the
set
of
traces
by computing
is obf
 � ≤2λinval
� (x)
:=
f
ree(x)
∨
∃y.
sprintf
(x,
Windows or Android:
• λinval (x) describes the overwriting
or
freeing
of
x:
0
�
Rλ
R (L) and
then verifying whether
abstracted
withan
a .particular
gate o
(∃y.
SetW
indowsHookEx(y,
WH_KEYBOARD_LL)∧
GetInputData(x,
INPUT)
∨
..
λleak (x) := ∃y, z. sendto(z, x, y)∨
during
trace ∨
matches
the abstract
behavior formula.library calls. Then, co
λinval (x) := f ree(x)
∃y. sprintf
0 (x, y)∨
∃y, z. (connect¬U
(z, y)nhookW
∧ ¬close(z) UindowsHookEx(y)U
send(z, x))∨
HookCalled(y,
x))∨
captured
is usuallystep
not
leaked
in itsand
raw form
So, Finally,
weINPUT)
canthe
simulate
abstraction
in CADP
GetInputData(x,
∨
. . . thedata
synchronization
withda
this
so the
we take
into account
of this data
∃c, s. HttpU RLConnection_getOutputStream(s, c)∧
delegate
verification
step totransformations
the evaluator4 module.
For via the
the
transducer
realizing
 a dependency
Finally,
the capturedx).
data
is
usually
not leaked
in
its
raw(x,
form,
elephonyM
anager_getDeviceId(x,
¬OutputStream_close(s)U∃y.T
OutputStream_write(s,
behavior
pattern
λdepends
denotes
this,
we
represent
the set
of traces
Ly)y).
ofwhich
a given
program
by
Now
�
R
is
rational
λ in representation
of xofoncommunicating
y. For
instance,
x
may
be a string
oa
a system
processes
expressed
LOTOS,
so
we
take
into
account
transformations
of
this
data
via
the
• λinval (x) describes the overwriting or freeing of x:
with
ay,
particular
gate
communications
correspond
to in CA
or x may
beonanwhich
encryption
or an synchronization
encoding
of y: forma
behavior
pattern
λ
(x,
y)
which
denotes
a
dependency
depends
• λ
(x) describes a network send functionality under
inval
inval
inval
mercredi 8 juin
λ2011
leak
(x) := f ree(x)
∨ ∃y. sprintf (x, y)∨
each bymalware
library calls. Then, computation of R≤2 (L) isFor
performed
Tool chains
C source
Promela
Model Checker
Behavior
FO-LTL
mercredi 8 juin 2011
Related work
Kinder & al, Detecting malicious code by model
checking
CHAPITRE 2. ANALYSE COMPORTEMENTALE STATIQUE
29
Ainsi, dans cette section, nous prenons l’exemple d’un programme capturant
les caractères entrés au clavier (afin d’en extraire notamment mots de passe,
informations bancaires et autres données sensibles). L’analyse sous IDA de sa
forme compilée permet de déterminer la structure de la pile et le type des
variables la composant. Par simplicité, nous travaillons sur son code source :
A C Keylogger from EADS
1
2
3
4
5
LRESULT WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam) {
RAWINPUTDEVICE rid;
RAWINPUT *buffer;
UINT dwSize;
USHORT uKey;
6
switch(msg) {
case WM_CREATE: /* Creation de la fenetre principale */
/* Initialisation de la capture du clavier */
rid.usUsagePage = 0x01;
rid.usUsage = 0x06;
rid.dwFlags = RIDEV_INPUTSINK;
rid.hwndTarget = hwnd;
RegisterRawInputDevices(&rid, 1, sizeof(RAWINPUTDEVICE));
break;
7
8
9
10
11
12
13
14
15
16
case WM_INPUT: /* Evenement clavier, souris, etc. */
/* Quelle taille pour buffer ? */
GetRawInputData( (HRAWINPUT) lParam, RID_INPUT, NULL,
&dwSize, sizeof(RAWINPUTHEADER) );
buffer = (RAWINPUT*) malloc(dwSize);
/* Recuperer dans buffer les donnees capturees */
if (!GetRawInputData( (HRAWINPUT) lParam, RID_INPUT, buffer,
&dwSize, sizeof(RAWINPUTHEADER) ))
break;
if (buffer->header.dwType == RIM_TYPEKEYBOARD &&
buffer->data.keyboard.Message == WM_KEYDOWN) {
printf("%c\n", buffer->data.keyboard.VKey);
}
free(buffer);
break;
}
/* ... */
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
mercredi 8 juin 2011
}
Ce code contient sept variables : hwnd, msg, wParam, lParam, rid, buffer
et dwSize. Les types HWND et HRAWINPUT représentent des entiers. Les types
WPARAM et LPARAM sont interprétés différement selon le contexte : ici, seule
– Pour un appel malloc(size), malloc prend 1 argument décrivant les
données suivantes : la valeur de retour.
– Pour un appel printf(format, arg1), printf1 prend 2 arguments décrivant les données suivantes : format, arg1.
Enfin, on indexe la représentation abstraite par des symboles de Fd : hwnd,
main_hwnd, msg, rid_usU sageP age, rid_usU sage, . . .
Comme dans le cas de la fuite de SMS, dans la section précédente, on peut
construire un modèle Promela du programme :
Promela translation
mtype = { CALL_COPY, CALL_RegisterRawInputDevices, CALL_GetRawInputData,
CALL_malloc, CALL_printf1, CALL_free }
chan lib_call = [0] of {mtype, int, int, int, int, int, int};
proctype lib_loop() {
xr lib_call;
do
:: lib_call ? _,_,_,_,_,_,_;
od;
}
init {
run lib_loop();
run main();
}
#define COPY(v1, v2) \
lib_call ! CALL_COPY, v1, v2;
#define RegisterRawInputDevices(pRID_usUsagePage, pRID_usUsage, pRID_dwFlags,\
pRID_hwndTarget) \
lib_call ! CALL_RegisterRawInputDevices, pRID_usUsagePage, pRID_usUsage,\
pRID_dwFlags, pRID_hwndTarget;
/* ... */
mtype = {hwnd, msg, wParam, lParam,
rid_usUsagePage, rid_usUsage, rid_dwFlags, rid_hwndTarget,
buffer, malloc10_header_dwType, malloc10_header_hDevice,
malloc10_header_wParam, malloc10_keyboard_VKey, malloc10_keyboard_Message,
dwSize,
main_hwnd,
mercredi 8 juin 2011
34
CHAPITRE 2. ANALYSE COMPORTEMENTALE STATIQUE
Behavior graph
main
S21
copy(hwnd,main_hwnd)
S19
copy(rid_hwndTarget,hwnd)
S24
GetRawInputData(_0,_0,_0,_0,_0)
-end-
S5
S3
S0
malloc(buffer)
S6
GetRawInputData(_0,malloc10_hdr_dwType,malloc10_hdr_hDevice,malloc10_kbd_VKey,malloc10_kbd_Message)
S9
RegisterRawInputDevices(_1,_6,RIDEV_INPUTSINK,rid_hwndTarget)
S13
printf1(FMT,malloc10_kbd_VKey)
S15
free(buffer)
S18
Figure 2.7: Automate de traces pour le keylogger en Section 2.4.4.
mercredi 8 juin 2011
Malicious behavior detection
• Detection of malicious behaviors
• Abstraction provides a high level notion of signature which is robust with
respect to some functional obfuscation methods
• Behavior analysis from a set of traces which comes from static or dynamic
analysis
• Detections algorithms are efficient based on (word/tree) automata
techniques
• Tests with Allaple, Virus, Agent, Rbot, Afcore and Mimail
mercredi 8 juin 2011
High Security Lab @ Nancy
lhs.loria.fr
Telescope & honeypots
In vitro experiment clusters
mercredi 8 juin 2011
Thanks !
mercredi 8 juin 2011

Documents pareils