Botnet detection using netflow information
Finding new botnets based on client connections

Herwin Weststrate
[email protected]

ABSTRACT
Botnets are one of the biggest threats to the internet today. Probably a few million computers are infected with botnet clients and used for malicious activities such as sending spam or fraudulent e-mails and assisting in DDoS attacks. Current detection methods are often based on network activity after the fact, but it would be better to detect the clients when they connect to the botnet and prevent those activities from happening. This paper proposes and tests a method, based on a number of heuristics, to find new botnet controllers purely from the behaviour of clients connecting to an IRC botnet controller. The method uses only netflow information, which means there are few to no privacy issues and little processing power is required. Unfortunately, our system is not perfect and further research is needed, especially on the use of sampled netflow data and on the detection of botnets that use communication mechanisms other than IRC.

Keywords
botnet, detection, intrusion detection, malware, netflow, spam, zombie

1. INTRODUCTION
Botnets are one of the biggest threats to the internet today. Estimates say that at least 270,000 machines are infected; the real number may be much higher. [4] Botnets are built up from infected hosts (normally referred to as zombies) that run a botnet client. Often the administrator of such a host does not even know that it is infected. A Command and Control (C&C) center or Botnet Master can send commands to the infected hosts. The main purpose of today's botnets is probably sending spam e-mails to harvested addresses. The people running those botnets often sell this as a service to anyone who wants to send an e-mail to a huge number of recipients.
Because of the distributed nature of these attacks it is hard to detect them in real time with the current detection mechanisms: none of the zombies generates enough suspicious traffic on its own to be detected.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission.
10th Twente Student Conference on IT, Enschede, 23rd January, 2009
Copyright 2008, University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science

The purpose of this paper is to find a reliable method to detect new botnet servers based on characteristics of the client connections. This leads us to the main research question:
• Is it possible to detect new botnets using netflow information?
To answer it, we break this main research question up into the following subquestions:
• Can we distinguish the legal IRC connections from illegal ones?
• What characteristics does botnet traffic have?

To do this we start with a list of known botnet controllers and try to find the characteristics of the hosts connected to these controllers. Once these characteristics have been determined we can search for new connections that match them. If everything works out as planned, the connections found are botnet connections that allow us to identify the botnet controllers.

2. CURRENT DETECTION
2.1 Communication mechanisms
The communication mechanisms used between the Command and Control centers and the zombie hosts can be split into two groups: push and pull. Both groups often use existing internet protocols to ease implementation and to masquerade their malicious traffic as normal internet traffic.
The pull group contains methods that periodically check for commands, for example on a website or in a mailbox. The consequence of this mechanism is that attacks cannot be organised in real time, unless all zombies check at exactly the same moment or a timestamp is given in the command. Neither option is attractive to the C&C masters: the first makes the zombies very predictable and thus detectable, and the second means large delays between the command and the actual attack.

The push group has two implementations that are often seen in the wild: IRC and peer-to-peer networks. The peer-to-peer version is relatively new, and very little information is available about the exact protocols used. [2] The focus of this paper is on the botnets using IRC.

Internet Relay Chat (IRC) was introduced as a memo defining a new communication protocol back in 1989. The basic idea was to create a network of servers to which people can connect. On the network, individual users may create channels on any subject, allowing users to chat about it. All communication between the client and the server is done in a plain-text format. [5]

Many zombie clients use the IRC protocol as an easy implementation of a one-to-many communication channel. The botnet master can write a command on the IRC channel and the protocol takes care of the delivery to all joined zombies. To minimize the network traffic, all messages that are not required (like channel joins/parts) are often filtered; only the real commands are considered important. This decreases the chance of being detected.

There are a lot of public IRC networks on the internet; some of the largest are EFnet, IRCnet, QuakeNet and Undernet. On these networks channels exist, normally with a specific subject, which allow people with shared interests to gather. Every user has the possibility to create a new channel. This means it is possible to create a channel dedicated to a certain botnet.
We will refer to these channels as illegal channels on legal servers.

2.2 Current detection mechanisms
The simplest detection that makes use of netflow is based on matching host connections against a list of known botnet controllers. This method requires that the botnet controllers are known in some way, and that they are dedicated controllers. As soon as a public IRC network is used this method will fail or generate false positives.

Another possible detection mechanism would be to analyze all TCP traffic and search for known botnet commands. This might be possible for very small networks, but for an average business-size network it would cost too much processing power. Besides, we need information about the botnet before we can use this detection method.

The most common way of detecting botnets is by their malicious network behaviour after a command has been sent. This kind of detection has a big downside: detection is only possible when the harm has already been done, which means you are always one step behind.

A common countermeasure to prevent spam from being transmitted is an outgoing firewall that blocks all SMTP traffic except to some known hosts. Normal mail clients can use a trusted host as SMTP server or smarthost; malware will not know this host and will try to connect to a receiving mailserver directly. The firewall will block this traffic, so the malware is unable to deliver the spam. It is possible that the malware will search for a reachable relay mailserver when all outgoing SMTP traffic is blocked and send the spam again, but this is research on the application level, not the network level.

Another detection mechanism is to analyze the requests made to the DNS server to see whether there were requests for hostnames that are known to host a botnet controller. This is especially interesting for botnets that use HTTP as a communication mechanism, because this protocol includes the hostname in the request and thus carries more information than the IP address alone.
[3] This still implies that we need information about the botnets; we are unable to find new ones this way.

3. FINDING SUSPICIOUS SERVERS
3.1 Known botnet controllers
A list of known botnet controllers was obtained from Quarantainenet. [1] These addresses have been collected by reverse engineering and network-level behaviour analysis of various pieces of malware. This list has not been filtered in any way, which means it may include inactive hosts or legal IRC servers. Only controllers for zombies that use IRC as their communication mechanism are listed. The information was limited to
• IP address
• TCP destination port
• Application layer protocol (always IRC)
• IRC channel name
The name of the IRC channel cannot be obtained from the netflow data; the rest of the data can be used to search for connections.

3.2 First filter on the list
3.2.1 Reachability
The list does not contain any information on the last activity of the botnet servers, so it may contain a number of entries that are no longer active. We might still be able to find clients trying to connect to these servers, but there will be an ICMP Host Unreachable or ICMP Port Unreachable reply, depending on what happened to the server.

To test which servers are still online and which are no longer reachable, a small program was written that makes a connection to every server on the list. If the connection did not succeed within 5 seconds we consider the server inactive. This means we cannot detect servers with closed ports that are opened in special ways (e.g. port knocking or pre-authentication on a website), but we do not expect them to exist (yet). There is a chance that the IP address has changed owner and that something else was running on the port that had been used for a botnet controller, but this chance was deemed small enough to be ignored.

The results were devastating: of the 161 hosts on the original list only 12 appeared to be online.
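The reachability test described above could look roughly like the following minimal sketch. It is not the program used for these measurements: the function names are hypothetical, and only the 5 second timeout is taken from the text. A connection attempt that is refused, unreachable or times out is treated as an inactive server.

```python
import socket
from typing import Callable, Iterable, List, Tuple

Controller = Tuple[str, int]  # (IP address, TCP destination port)


def is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    The 5 second default follows the text; refused connections, unreachable
    hosts and timeouts all raise OSError subclasses and count as inactive.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def split_by_reachability(
    controllers: Iterable[Controller],
    probe: Callable[[str, int], bool] = is_reachable,
) -> Tuple[List[Controller], List[Controller]]:
    """Split a controller list into (online, offline) using the given probe.

    The probe parameter is a hypothetical hook so the grouping logic can be
    tried without touching the network.
    """
    online: List[Controller] = []
    offline: List[Controller] = []
    for host, port in controllers:
        (online if probe(host, port) else offline).append((host, port))
    return online, offline
```

Calling split_by_reachability on the (IP address, port) pairs of the list yields the online and offline groups. Like the original test, this sketch cannot see servers whose ports are only opened after port knocking or a similar step.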
The offline hosts can still be used to detect client infections, but not for server detection. Because there is a relatively big chance that these servers have been shut down, we will not take them into account in our further measurements.

3.2.2 TCP ports
Illegal IRC servers tend to masquerade the fact that they actually are IRC servers. The list contained a total of 161 unique hosts, of which only 66 were using the official port range of IRC servers (6665-6669 and 7000; IRC normally uses 6667 as default). The other ports looked pretty much scattered over the available ports. A notable point is that only 7 hosts were running a server on a port below 1024, the range of ports that require root privileges on UNIX-based operating systems. The most common port was 8080 (reserved for HTTP caching proxies, often used as an alternate port for webservers), but we cannot really draw conclusions, as it did not appear more than 7 times. Using this port is probably the easiest way to masquerade IRC traffic so that it appears as normal HTTP traffic, which is not firewalled and will not raise any suspicion, without the need for root access on the IRC server.

Comparing the online servers from the previous section with the ports gives completely different results: 9 of the 12 hosts are using a port in the official range (with 6667 listed 7 times). A possible cause might be that all or most legal IRC servers are included in this group, while most illegal servers have been shut down and are thus excluded.

3.2.3 DNS records
The internet is a network of connected hosts. Each public host is uniquely identified by a numeric address (32 bits for IPv4, 128 bits for IPv6). These numbers are quite difficult to remember for most humans, which is why the Domain Name System was introduced: it can map a normal name to a numeric IP address and vice versa.
The reverse DNS is the one that maps a numeric address back to a hostname. Many mailservers consider the absence of a reverse DNS record, or a reverse DNS record that does not match the forward record, as an indication that the sending mail relay is probably a spammer. The basic idea behind this is the assumption that all enterprise hosts on the internet have a reverse DNS record, and normal end-user hosts do not. The Exim mailserver has a commented access control entry to block all mail from hosts that do not have a reverse DNS record, including a description that this might block both spam and legal mail.

Of the 12 remaining hosts, 5 did not have a reverse DNS record, and 4 others had a reverse DNS record without a corresponding forward DNS record. This measurement is not a good way of determining which hosts might be illegal IRC servers. As an example, of the 8 IP addresses that irc.quakenet.org points to, 2 failed in a reverse DNS lookup, and two others had a reverse DNS record not ending in quakenet.org. The only useful result is a reverse DNS record (with a matching forward record) that points to a known legal IRC network. The best way to find out which networks the remaining IP addresses belong to is simply to log in with a normal IRC client: the server info will show the real name of the server, which still needs to be verified to be sure that the server belongs to the network it claims to belong to. After a few tries the results were clear: the DNS checks should be dropped, as they have a false-positive ratio of over 50% and are thus useless.

3.3 Confidence for remaining hosts
Using the methods described above we can filter some of the hosts. We can remove the hosts that are found in a known network, and raise the level of suspicion for the hosts that are using non-standard TCP ports. The other assumptions all carried too high a risk of false positives.
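The filtering described in this section can be sketched as follows. The whitelist contents, the function name and the example addresses (taken from the documentation range 192.0.2.0/24) are assumptions for illustration; the port set is the official range named in section 3.2.2.

```python
from typing import Iterable, List, Set, Tuple

Candidate = Tuple[str, int]  # (IP address, TCP destination port)

# Official IRC port range as named in section 3.2.2.
STANDARD_IRC_PORTS: Set[int] = set(range(6665, 6670)) | {7000}


def triage(candidates: Iterable[Candidate], whitelist: Set[str]) -> List[Candidate]:
    """Drop hosts of known legal networks and rank the rest by suspicion.

    Hosts on non-standard ports are listed first (more suspicious); hosts on
    the official IRC ports follow. The original order is kept within each
    group because sorted() is stable.
    """
    remaining = [(ip, port) for ip, port in candidates if ip not in whitelist]
    return sorted(remaining, key=lambda c: c[1] in STANDARD_IRC_PORTS)


if __name__ == "__main__":
    hosts = [
        ("192.0.2.1", 6667),
        ("192.0.2.2", 8080),
        ("192.0.2.3", 7000),  # assumed to be a known legal server
        ("192.0.2.4", 443),
    ]
    for ip, port in triage(hosts, whitelist={"192.0.2.3"}):
        print(ip, port)
```

The ranking only orders the hosts for manual inspection; it does not decide by itself whether a server is illegal.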
Still, we have absolutely no certainty that, once a new botnet controller is given to us, we can automatically determine whether it is an illegal IRC server or not. This step will still require human interaction. The burden can be lightened by using a whitelist of known legal IRC servers.

4. MEASURING
4.1 Structure of measurement setup
The main focus of this paper lies on real-time detection using netflow data. To increase the number of measurements, a series of other network flow formats has been used. To create a format similar to netflow the following data had to be extracted:
• Timestamp
• Source IP address
• Destination IP address
• Internet layer protocol (only interested in IP and ICMP)
• Transport layer protocol (only interested in TCP in case of IP)
• ICMP type (only for ICMP traffic)
• ICMP code (only for ICMP traffic)
• Source port (only for TCP traffic)
• Destination port (only for TCP traffic)
• Packet size (only for TCP traffic)
• Control flags were removed. Even though netflow supports them, they are often excluded from the actual data (only for TCP traffic)
A structure with an abstract Packet class and two implementations (one for netflow, one for PCAP) has been developed. Other implementations are easy to produce when required, but were not useful in this testing environment.

4.2 Old data
A number of PCAP data files have been obtained from SimpleWeb. [6] This data is a complete dump of the first part of each packet, starting at the ethernet level. To preserve privacy, the source and destination IP addresses are scrambled using a 1-to-1 mapping. The downside for us is that it is useless to compare the IP addresses with our list of known botnet controllers without knowing the anonymization algorithm (which would defeat the anonymization). The data has been collected during May and June 2007. Older data (2004 and before) was available too, but the number of zombies was relatively small in those days, and the chance of less advanced network obfuscation algorithms was relatively big, so these were not examined.
This data was stripped of most information; only the parts described in the above section were left. Because no valid list of known controllers exists for these addresses, the data has been inspected for low-volume traffic to reserved IRC ports instead, hoping that less network-level obfuscation was used in those days and that most botnet traffic went to destination port 6667. As soon as a controller runs on a different port we will not be able to find it, which means the simplest form of traffic obfuscation is enough to bypass the detection.

To distinguish normal IRC traffic from botnet traffic we assume that botnet traffic matches the following criteria:
• Most packets are really small and appear at regular intervals (PING/PONG communication), whereas normal IRC traffic has more and larger packets.
• Botnet traffic starts when the computer is booted, and does not stop until the computer is powered down. If there is a gap of more than 5 minutes in the IRC traffic of a host, it has been powered down, so no other traffic will occur either.
• The botnet controllers are used by fewer clients than the legal IRC servers.
• Larger packets from botnet controllers are sent to all connected clients, whereas larger packets from legal hosts are better distributed among the connected clients. Most botnet servers only use one channel.

The results were pretty disappointing: of all known IRC ports only port 7000 occurred in the captured streams, and closer inspection showed that it was only used as a source port, not as a destination port. In other words: no usable information was found in these captures.

4.3 Live data stream
For live measurements a full netflow stream has been obtained from a large-scale network (2000 home connections, around 6000 business connections, all internet-routable IP addresses). This stream has been analyzed during October and November 2008. Unfortunately (at least for us) this network appears to be much cleaner than expected, as far as our measurements could tell.
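The first two criteria of section 4.2, which also guide the inspection of the live stream in this section, can be sketched as follows. Only the 5 minute gap is taken from the text; the size limit, the fraction of small packets and the allowed variation of the intervals are assumed thresholds chosen for illustration, and the function names are hypothetical. The sketch does not cover the criteria on client counts and packet distribution.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Packet = Tuple[float, int]  # (timestamp in seconds, packet size in bytes)

SESSION_GAP = 300.0        # 5 minutes, from section 4.2
SMALL_PACKET = 100         # assumption: size limit for a "really small" packet
MIN_SMALL_FRACTION = 0.9   # assumption: share of small packets required
MAX_INTERVAL_CV = 0.1      # assumption: allowed relative spread of intervals


def split_sessions(packets: Sequence[Packet], gap: float = SESSION_GAP) -> List[List[Packet]]:
    """Split time-ordered packets of one host into sessions.

    A gap of more than `gap` seconds is read as the host having been
    powered down, so it starts a new session.
    """
    sessions: List[List[Packet]] = []
    for pkt in packets:
        if sessions and pkt[0] - sessions[-1][-1][0] <= gap:
            sessions[-1].append(pkt)
        else:
            sessions.append([pkt])
    return sessions


def looks_like_keepalive(session: Sequence[Packet]) -> bool:
    """Return True if a session is mostly small packets at regular intervals."""
    if len(session) < 3:
        return False  # too few packets to judge regularity
    small = sum(1 for _, size in session if size <= SMALL_PACKET)
    if small / len(session) < MIN_SMALL_FRACTION:
        return False
    intervals = [b[0] - a[0] for a, b in zip(session, session[1:])]
    avg = mean(intervals)
    if avg == 0:
        return False
    # Coefficient of variation: spread of the intervals relative to their mean.
    return pstdev(intervals) / avg <= MAX_INTERVAL_CV
```

A session that passes this test is only a candidate: as the measurements below show, an idle client on a legal IRC server produces the same pattern.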
Only one computer appeared to be connected to a botnet, but nothing that looked like an attack has been seen, only a very basic keep-alive connection. A manual connection to the server showed that it actually was a legal server of the QuakeNet network. Still, the packets were received at regular intervals and were very small in size, which implies the client was completely idle. Given the circumstances, idle IRC clients can be expected on the examined network. A command to the zombie would have looked like a packet that is larger than the normal packets from the botnet controller, possibly followed by an IRC reply from the zombie and a sudden increase in network behaviour that is not normal for that PC (e.g. outgoing SMTP connections). If sampled netflow had been used, there is a really big chance we would have missed the crucial information from this botnet connection.

A second computer on the network often tried to connect to a botnet, but only received ICMP Port Unreachable messages, which means the service is not reachable. Possibly the server has been deactivated; those effects are described in more detail in section 3.2.1. The abuse department responsible for the second computer has been informed about the connection attempts, with the advice to suggest an antivirus/spyware removal tool to the user.

5. CONCLUSION
No autonomous system for the discovery of new botnets has been found. Based on information from research on a large number of malware/botnet clients, a number of heuristics have been set up to separate IRC connections to botnet controllers from legal ones.

In section 4.2 a number of heuristics have been tested that were supposed to identify botnet controllers without any prior knowledge. The proposed heuristics will fail as soon as the IRC server is running on a different TCP port, making it very likely that most botnets remain undetected. Not enough data was available to really verify their correctness.
A live data stream has been analyzed in section 4.3, showing that there was very little detected botnet activity in our test network. A possible cause is the shift from IRC botnets to other communication mechanisms, like peer-to-peer. Besides that, we are not sure what percentage of the current IRC botnets is included in the list that has been used for verification.

5.1 Further research
To improve the heuristics of section 4.2, a method to identify IRC connections on non-standard ports would be helpful.

Because lots of botnets are switching to peer-to-peer networks, a study of the trend in IRC bots would be welcome. Depending on those results a decision can be made to put more effort into IRC botnet detection or to prioritize the detection of peer-to-peer bots.

REFERENCES
[1] Quarantainenet B.V. Network management and security. http://www.quarantainenet.nl.
[2] J.B. Grizzard et al. Peer-to-peer botnets: Overview and case study. In Proceedings - USENIX HotBots ’07, 2007.
[3] R. Fielding et al. Hypertext Transfer Protocol -- HTTP/1.1. http://www.ietf.org/rfc/rfc2616.txt.
[4] Shadowserver Foundation. Shadowserver Botnet count, visited 2008-12-14. http://www.shadowserver.org/wiki/pmwiki.php?n=Stats.BotCountDaily.
[5] J. Oikarinen and D. Reed. Internet Relay Chat Protocol. http://www.ietf.org/rfc/rfc1459.txt.
[6] R. van de Meent and A. Pras. Simpleweb / University of Twente - Traffic Measurement Data Repository, visited 2008-10-28. http://traces.simpleweb.org.