Botnet detection using netflow information

Finding new botnets based on client connections
Herwin Weststrate
[email protected]
ABSTRACT
Botnets are one of the biggest threats to the internet today. Probably a few million computers are infected with botnet clients and
used in malicious activities like sending spam or fraudulent e-mails
and assisting in DDoS attacks. Current detection methods are often based on the network activity that follows an attack, but it would be better
to detect the clients when they connect to the botnet and prevent
the activities from happening.
This paper proposes and tests a method based on a number of
heuristics to find new botnet controllers purely based on client
behaviour when clients connect to an IRC botnet controller.
The method is based on netflow information, which means there are
few to no privacy issues, and it does not require much processing
power.
Unfortunately, our system is not perfect and further research is needed, especially on the use of sampled netflow
data and on the detection of botnets that use communication mechanisms other than IRC.
Keywords
botnet, detection, intrusion detection, malware, netflow, spam,
zombie
1. INTRODUCTION
Botnets are one of the biggest threats to the internet today. Estimates say that at least 270,000 machines are infected; the real
number might be much higher. [4]
Botnets are built up from infected hosts (normally referred to as
zombies) that run a botnet client. Often the administrator of such a
host doesn’t even know that it is infected. A Command and Control (C&C) center or Botnet Master can send commands to the
infected hosts.
The main purpose of the botnets of today is probably sending
spam e-mails to harvested addresses. The people running those
botnets often sell this as a service to anyone who wants to send
an e-mail to a huge number of recipients.
Because of the distributed nature of these attacks it is hard
to detect them in real time with the current detection mechanisms: none of the zombies on its own generates enough suspicious traffic to be detected.
Permission to make digital or hard copies of all or part of this work for personal
or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the
full citation on the first page. To copy otherwise, or republish, to post on servers or
to redistribute to lists, requires prior specific permission.
10th Twente Student Conference on IT, Enschede 23rd January, 2009
Copyright 2008, University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science
The purpose of this paper is to find a reliable method to detect
new botnet servers based on characteristics of the connections of
the clients. This leads us to the main research question:
• Is it possible to detect new botnets using netflow information?
To answer it, we break this main research question up into the following subquestions:
• Can we distinguish the legal IRC connections from illegal
ones?
• What characteristics does botnet traffic have?
We start with a list of known botnet controllers and try
to find the characteristics of the hosts connected to these controllers.
Once these characteristics have been determined we can
search for new connections that match them. If
everything works out as planned, the connections found are botnet
connections that allow us to identify the botnet controllers.
2. CURRENT DETECTION
2.1 Communication mechanisms
The communication mechanisms used between the Command and
Control centers and the zombie hosts can be split in two groups:
push and pull. Both groups often use existing internet
protocols to ease the implementation and to masquerade their malicious traffic by making it look like normal internet traffic.
The pull group contains methods that periodically check for commands, for example on a website or in a mailbox. The impact of
this mechanism is that attacks can’t be organised in real time,
unless all zombies check at exactly the same moment or
a timestamp is given in the command. Neither option is attractive to the C&C masters: the first makes the zombies
very predictable, and thus detectable. The second means large
delays between the command and the actual attack.
The push group has two implementations that are often seen in the
wild: IRC and peer-to-peer networks. The peer-to-peer version is relatively new, and very little information is available about
the exact protocols used. [2] The focus of this paper is
on the botnets using IRC.
Internet Relay Chat (IRC) was introduced as a memo defining a
new communication protocol back in 1989. The basic idea was to
create a network of servers to which people can connect. On the
network, channels may be created by individual users on any subject, allowing users to chat about it. All communication between
the client and the server is done in a plain-text format. [5]
Many zombie clients use the IRC protocol as an easy implementation of a one-to-many communication channel. The botnet master can
write a command on the IRC channel and the protocol takes care
of the delivery to all joined zombies. To minimize the network
traffic, all unnecessary network messages (like channel joins/parts)
are often filtered so that only the real commands remain. This decreases the chance of being detected.
There are a lot of public IRC networks on the internet; some of
the largest are EFnet, IRCnet, QuakeNet and Undernet.
On these servers channels exist, normally with a typical subject,
which allow people with shared interests to gather. Every user has
the possibility to create a new channel. This means it is possible
to create a channel dedicated to a certain botnet. We will refer to
these channels as illegal channels on legal servers.
2.2 Current detection mechanisms
The simplest detection that makes use of netflow is often based on
host-connection matching with a list of known botnet controllers.
This method requires that the botnet controllers are known in
some way, and that these are dedicated controllers. As soon as
a public IRC network is used this method will fail or generate
false positives.
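The list-matching approach described above can be sketched as a simple lookup. This is only a minimal illustration under stated assumptions: the `Flow` record, its field names and the example addresses (taken from documentation address ranges) are hypothetical and do not reflect any real netflow layout or controller list.

```python
from typing import Iterable, NamedTuple


class Flow(NamedTuple):
    """Hypothetical, simplified flow record (not a real netflow layout)."""
    src_ip: str
    dst_ip: str
    dst_port: int


def match_known_controllers(flows: Iterable[Flow],
                            controllers: set) -> list:
    """Return the flows whose destination (IP, port) is a listed controller."""
    return [f for f in flows if (f.dst_ip, f.dst_port) in controllers]


# Example addresses come from the RFC 5737 documentation ranges.
KNOWN = {("192.0.2.10", 6667), ("198.51.100.7", 8080)}
FLOWS = [
    Flow("203.0.113.5", "192.0.2.10", 6667),     # listed controller
    Flow("203.0.113.6", "192.0.2.10", 80),       # same host, other port
    Flow("203.0.113.7", "198.51.100.99", 6667),  # unlisted IRC server
]
hits = match_known_controllers(FLOWS, KNOWN)
```

Matching on the (address, port) pair rather than on the address alone is a design choice of this sketch. It does not solve the problem named in the text: a channel on a public IRC network shares its address and port with legal traffic.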
Another possible detection mechanism would be to analyze all
TCP traffic to search for known botnet commands. This might be
possible for really small-scale networks, but for the average business-size network it will cost too much processing power. Besides, we
need information about the botnet before we can use this detection method.
The most common way of botnet detection is based on the malicious network behaviour after a command has been sent. This
kind of detection has a big downside: detection is only possible
when the harm is already done, which means you are always one
step behind.
A common countermeasure to prevent spam from being transmitted is an outgoing firewall that blocks all SMTP traffic except
to some known hosts. Normal mail clients can use a trusted host
as SMTP server or smarthost; malware won’t know this host and
will try to connect to a receiving mailserver directly. The firewall will
block this traffic, so the malware is unable to deliver the spam.
It is possible that the malware will search for a reachable
relay mailserver when all outgoing SMTP traffic is blocked and
send the spam again, but this is research on the application level, not the
network level.
Another detection mechanism is to analyze the requests sent to
the DNS server to see whether there were requests for hostnames that
are known to host a botnet controller. This is especially interesting for botnets that use HTTP as a communication mechanism,
because an HTTP request carries the hostname, which gives more information than the IP address alone. [3]
This still implies that we need information about the botnets; we are
unable to find new ones this way.
3. FINDING SUSPICIOUS SERVERS
3.1 Known botnet controllers
A list of known botnet controllers was obtained from Quarantainenet. [1] These addresses have been collected by reverse engineering and network-level behaviour analysis of various pieces of
malware. This list has not been filtered in any way, which means
it may include inactive hosts or legal IRC servers. Only the zombies that use IRC as communication mechanism are listed.
The information was limited to
• IP address
• TCP destination port
• Application layer protocol (always IRC)
• IRC channel name
The name of the IRC channel cannot be obtained from the netflow data; the rest of the data can be used to search for connections.
3.2 First filter on the list
3.2.1 Reachability
The list does not contain any information on the last activity of
the botnet servers, so it may contain a number of entries that aren’t active anymore. We might still be able to find
clients trying to connect to these servers, but there will be an ICMP
Host Unreachable or ICMP Port Unreachable reply, depending
on what happened to the server.
To test which servers are still online and which are not reachable
anymore, a small program was written to make a connection to
every server on the list. If the connection did not succeed within
5 seconds we consider the server inactive. This means we
cannot detect servers with closed ports that are opened in special
ways (e.g. port knocking or pre-authentication on a website), but
we don’t expect them to exist (yet). There is a chance that the IP
address has switched owner and something else is now running on the port
that was used for a botnet controller, but this chance was
thought to be small enough to be ignored.
The results were devastating: of the 161 hosts on the original list only 12 appeared to be online. The offline hosts can still
be used to detect client infections, but not for server detection.
Because there is a relatively big chance these servers have been
shut down we will not take them into account for our further measurements.
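The reachability test described above could look roughly like the following. This is a sketch, not the program used for the measurements; the function name `is_reachable` is an assumption, and only the 5-second limit comes from the text.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        # create_connection performs the full TCP handshake; the
        # connection is closed again as soon as the with-block ends.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, DNS failures
        # and timeouts, which are all subclasses of OSError.
        return False
```

A list of `(host, port)` pairs can then be filtered with `[entry for entry in entries if is_reachable(*entry)]`. Like the test in the text, this only sees ports that accept a plain TCP connection, so a server hidden behind port knocking would be reported as inactive.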
3.2.2 TCP ports
Illegal IRC servers often tend to hide the fact
that they actually are an IRC server. The list contained a total
of 161 unique hosts, of which only 66 were using the official IRC port
range (6665-6669 and 7000; IRC normally uses
6667 as default). The other ports were scattered
over the available ports. A notable point is that
only 7 hosts were running a server on a port below 1024, the
range of ports that requires root privileges on UNIX based operating systems. The most common port was 8080 (reserved for HTTP caching proxies, often used as alternate port for
webservers), but we can’t really draw conclusions as it still didn’t
appear more than 7 times. Using this port is probably the easiest way to
masquerade IRC traffic as normal HTTP traffic
that is not firewalled and will not raise any suspicion, without the
need for root access on the IRC server.
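The port categories used in this paragraph can be expressed as a small classifier. This is an illustrative sketch; the category labels and the function name are assumptions, while the port ranges are the ones given in the text.

```python
from collections import Counter

# Official IRC range as given in the text: 6665-6669 plus 7000.
OFFICIAL_IRC_PORTS = frozenset(range(6665, 6670)) | {7000}


def classify_port(port: int) -> str:
    """Label a TCP port as 'irc', 'privileged' (< 1024) or 'other'."""
    if port in OFFICIAL_IRC_PORTS:
        return "irc"
    if port < 1024:
        return "privileged"
    return "other"


def port_summary(ports) -> Counter:
    """Count how many listed controller ports fall in each category."""
    return Counter(classify_port(p) for p in ports)
```

For example, `port_summary([6667, 6667, 8080, 80])` counts two IRC ports, one privileged port and one other port.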
Comparing the online servers from the previous section with the
ports gives completely different results: 9 of the 12 hosts are using a
port in the official range (with 6667 listed 7 times). A possible
cause might be that all or most legal IRC servers are included in
this list, while most illegal servers are shut down and thus excluded.
3.2.3 DNS records
The internet is a network of connected hosts. Each public host is
uniquely identified by a numeric address (32 bits for IPv4, 128
bits for IPv6). These numbers are quite difficult to remember
for most humans, which is why the Domain Name System has been
introduced: it can map a normal name to a numeric IP address
and vice versa.
Reverse DNS is the part that maps a numeric address back to
a hostname. Many mailservers consider the absence of a reverse
DNS record, or a reverse DNS record that does not match the
forward record, as an indication that the sending mail relay is probably
a spammer. The basic idea behind this is the assumption that
all enterprise hosts on the internet have a reverse DNS record,
and the normal end-user hosts do not. The Exim mailserver has a
commented-out access control entry to block all mail from hosts that
do not have a reverse DNS record, including a description that
this might block out both spam and legal mail.
Of the 12 remaining hosts 5 did not have a reverse DNS record,
and 4 others had a reverse DNS record that did not have a corresponding forward DNS record.
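The reverse and forward DNS comparison can be sketched as follows. This is a minimal illustration and not the procedure used for the numbers above: the function names and result labels are assumptions, the forward lookup is IPv4 only, and the resolver functions are passed in as parameters so that the logic can be exercised without network access.

```python
import socket
from typing import Callable, List, Optional


def reverse_lookup(ip: str) -> Optional[str]:
    """Return the reverse DNS hostname for ip, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname: str) -> List[str]:
    """Return the IPv4 addresses of hostname, or [] if it does not resolve."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []


def check_reverse_dns(ip: str,
                      reverse: Callable[[str], Optional[str]] = reverse_lookup,
                      forward: Callable[[str], List[str]] = forward_lookup) -> str:
    """Classify ip as 'no-reverse', 'mismatch' or 'confirmed'."""
    hostname = reverse(ip)
    if hostname is None:
        return "no-reverse"
    if ip not in forward(hostname):
        # The name exists but does not point back to the same address.
        return "mismatch"
    return "confirmed"
```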
This measurement is not a good way of determining which servers might
be illegal IRC servers. As an example, of the 8 IP addresses that irc.quakenet.org points to, 2 failed
a reverse DNS lookup and 2 others had a reverse DNS
record that did not end in quakenet.org. The only thing we can check
is whether a reverse DNS record (with a matching forward record) belongs to
a known legal IRC network.
The best way to find out which networks the remaining IP
addresses belong to is simply to log in with a normal IRC client: the server
info will show the real name of the server, which still needs to be
verified to be sure the server belongs to the network it claims to belong to.
After a few tries the results were clear: the DNS record checks
have a false-positive ratio of over 50% and are therefore useless.
3.3 Confidence for remaining hosts
Using the methods described above we can filter some of the
hosts. We can remove the hosts that are found in a known network, and raise the level of suspicion for the hosts that
use non-standard TCP ports. The other assumptions all carried
too high a risk of false positives.
Still, we have absolutely no certainty that, once a new botnet controller is given to us, we can automatically determine whether it is
an illegal IRC server or not. This step will still require human
interaction. The manual effort can be reduced by using a whitelist of known
legal IRC servers.
4. MEASURING
4.1 Structure of measurement setup
The main focus of this paper lies on real-time detection using
netflow data. To increase the number of measurements, a series
of other network flow formats has been used. To create a format
similar to netflow the following data had to be extracted:
• Timestamp
• Source IP address
• Destination IP address
• Internet layer protocol (only interested in IP and ICMP)
• Transport layer protocol (only interested in TCP in case of
IP)
• ICMP type (only for ICMP traffic)
• ICMP code (only for ICMP traffic)
• Source port (only for TCP traffic)
• Destination port (only for TCP traffic)
• Packet size (only for TCP traffic)
• Remove control flags. Even though netflow supports them
they are often excluded in the actual data. (Only for TCP
traffic)
A structure with an abstract Packet class and two implementations (one for netflow, one for PCAP) has been developed. Other
implementations are easy to produce when required, but were not
useful in this testing environment.
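A structure of this kind might be sketched as below. This is not the code used for the measurements: the method names, the dictionary layouts wrapped by the two implementations and the 100-byte limit are all hypothetical, chosen only to show how shared logic can rely on an abstract interface.

```python
from abc import ABC, abstractmethod


class Packet(ABC):
    """Format-independent view of one observed packet or flow record."""

    @abstractmethod
    def protocol(self) -> str:
        """Return "TCP", "ICMP" or "OTHER"."""

    @abstractmethod
    def dst_port(self) -> int:
        """Return the TCP destination port."""

    @abstractmethod
    def size(self) -> int:
        """Return the size in bytes."""

    def is_small_tcp(self, limit: int = 100) -> bool:
        """Shared logic that only relies on the abstract accessors."""
        return self.protocol() == "TCP" and self.size() <= limit


class NetflowPacket(Packet):
    """Wraps an already decoded flow record (hypothetical dict layout)."""

    def __init__(self, record: dict) -> None:
        self._record = record

    def protocol(self) -> str:
        # IP protocol numbers: 6 is TCP, 1 is ICMP.
        return {6: "TCP", 1: "ICMP"}.get(self._record["proto"], "OTHER")

    def dst_port(self) -> int:
        return self._record["dst_port"]

    def size(self) -> int:
        return self._record["bytes"]


class PcapPacket(Packet):
    """Wraps an already parsed capture frame (hypothetical dict layout)."""

    def __init__(self, frame: dict) -> None:
        self._frame = frame

    def protocol(self) -> str:
        return self._frame["l4"].upper()

    def dst_port(self) -> int:
        return self._frame["dport"]

    def size(self) -> int:
        return self._frame["length"]
```

Analysis code written against `Packet` then works unchanged on both input formats, and a further format only needs one more subclass.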
4.2 Old data
A number of PCAP data files have been obtained from SimpleWeb. [6]
This data is a complete dump of the first part of each packet,
starting at the ethernet level. To preserve privacy the
source and destination IP addresses are scrambled using a 1-to-1 mapping. The downside for us is that it’s useless to compare
the IP addresses to our list of known botnet controllers without knowing the anonymization algorithm (which would defeat the
anonymization).
The data has been collected during May and June 2007. Older
data (2004 and before) was available too, but the number of zombies was relatively small in those days, and the chance of less advanced network obfuscation algorithms was relatively big, so these
were not examined.
This data was stripped of most information; only the fields described in the previous section were kept. Because no
valid list of known controllers exists for these addresses, the data has
been inspected for low-volume traffic to reserved IRC ports instead, hoping that less network-level obfuscation was used in those
days and that most botnet traffic went to destination port 6667. As
soon as a controller runs on a different port we will not be able
to find it, which means the simplest form of traffic obfuscation is
enough to bypass the detection.
To distinguish normal IRC traffic from botnet traffic we assume
that botnet traffic matches the following criteria:
• Most packets are really small and appear at regular intervals (PING/PONG communication), whereas normal IRC
traffic has more and larger packets.
• Botnet traffic starts when the computer is booted, and does
not stop until the computer is powered down. If there is a
gap of more than 5 minutes in the IRC traffic of a host, it
has been powered down, so no other traffic will occur during that gap.
• The botnet controllers are used by fewer clients than the
legal IRC servers.
• Larger packets from botnet controllers are sent to all connected clients, while larger packets from legal hosts are more evenly distributed among the connected clients. Most botnet servers
only use one channel.
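The first two criteria above can be turned into simple per-connection features. The sketch below is an illustration under stated assumptions: only the 5-minute (300 second) gap comes from the text, while the function names, the 100-byte size limit and the other thresholds are hypothetical values that would need tuning on real data.

```python
from statistics import mean, pstdev


def keepalive_features(packets, small=100):
    """Compute features for one client-server connection.

    packets: non-empty list of (timestamp_seconds, size_bytes),
    sorted by timestamp.
    """
    if not packets:
        raise ValueError("at least one packet is required")
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    gaps = [b - a for a, b in zip(times, times[1:])]
    return {
        # Share of packets at or below the (assumed) size limit.
        "small_fraction": sum(s <= small for s in sizes) / len(sizes),
        "mean_gap": mean(gaps) if gaps else 0.0,
        # Standard deviation of the gaps: low means regular intervals.
        "gap_jitter": pstdev(gaps) if gaps else 0.0,
        "max_gap": max(gaps, default=0.0),
    }


def looks_like_idle_bot(features, max_gap=300.0, min_small=0.9,
                        max_jitter=5.0):
    """Apply the small-packet, regular-interval and no-gap criteria."""
    return (features["small_fraction"] >= min_small
            and features["gap_jitter"] <= max_jitter
            and features["max_gap"] <= max_gap)
```

A connection with one 60-byte packet every 90 seconds matches all three conditions. The same holds for any idle IRC client, so a match on its own is only a reason for closer inspection.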
The results were pretty disappointing: of all known IRC ports
only port 7000 occurred in the captured streams, and closer inspection showed it was only used as a source port, not as a destination port. In other words: no usable information was found in
these captures.
4.3 Live data stream
For live measurements a full netflow stream has been obtained
from a large-scale network (2000 home connections, around 6000
business connections, all internet-routable IP addresses). This
stream has been analyzed during October and November 2008.
Unfortunately (at least for us) this network appears much cleaner
than expected, as far as our measurements can tell. Only one
computer appeared to be connected to a botnet, but nothing that
looked like an attack has been seen, only a very basic keep-alive
connection. A manual connection to the server showed it actually
was a legal server of the QuakeNet network.
Still the packets were received at regular intervals and were very
small in size, which implies the client was completely idle. Given
the circumstances there is a chance for idle IRC clients on the
examined network.
A command to the zombie would have looked like a packet that is
larger than the normal packets from the botnet controller, possibly followed by an IRC reaction from the zombie and a sudden increase in
network activity that is not normal for that PC (e.g. outgoing SMTP connections). Had sampled netflow been
used, there is a really big chance we would have missed the crucial information from this botnet connection.
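The pattern described here, a larger-than-usual packet from the controller followed by new outgoing SMTP connections, could be searched for as sketched below. This is a hypothetical illustration: the function name, the input layout, the factor of 3 over the median packet size and the 60-second window are assumptions that are not taken from the measurements.

```python
from statistics import median


def find_command_candidates(server_packets, client_flows,
                            factor=3.0, window=60.0, smtp_port=25):
    """Find large controller packets followed by outgoing SMTP flows.

    server_packets: list of (timestamp, size) sent by the controller.
    client_flows: list of (timestamp, dst_port) for new outgoing
    connections of the same client.
    Returns a list of (timestamp, size, number_of_smtp_flows).
    """
    if not server_packets:
        return []
    # The median keep-alive size serves as the "normal" packet size.
    baseline = median(size for _, size in server_packets)
    hits = []
    for ts, size in server_packets:
        if size <= factor * baseline:
            continue
        followups = [t for t, port in client_flows
                     if port == smtp_port and ts <= t <= ts + window]
        if followups:
            hits.append((ts, size, len(followups)))
    return hits
```

The sketch depends on seeing the one large packet. With sampled netflow that single packet may not be recorded at all, which is the risk the paragraph above points out.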
A second computer on the network often tried to connect to a botnet, but only received ICMP port unreachable messages, which
means the service is not reachable. Possibly the server has been
deactivated; those effects are described in more detail in section
3.2.1.
The abuse department responsible for the second computer has been informed
about the connection attempts, with the advice to suggest an antivirus/spyware removal tool to the users.
5. CONCLUSION
No autonomous system for the discovery of new botnets has been
found. Based on the information from research on a large number
of malware/botnet clients, a number of heuristics have been set up
to separate IRC connections to botnet controllers from legal ones.
In section 4.2 a number of heuristics have been tested that were
supposed to identify botnet controllers without any prior knowledge. The proposed heuristics will fail as soon as the IRC server
is running on a different TCP port, making it very likely that most
botnets remain undetected. Not enough data was available to verify their correctness.
A live data stream has been analyzed in section 4.3, showing that
there was very little detected botnet activity in our test network.
A possible cause is the shift from IRC botnets to other connection mechanisms, like peer-to-peer. Besides that, we are not sure
about the percentage of the current IRC botnets that are included
in the list that has been used for verification.
5.1 Further research
To improve the heuristics of section 4.2 a method to identify IRC
connections on non-standard ports would be helpful.
Because lots of botnets are switching to peer-to-peer networks,
a study of the trend in IRC bots would be welcome. Depending on those results a decision can be made to put more effort in
IRC botnet detection or to prioritize the detection of peer-to-peer
bots.
REFERENCES
[1] Quarantainenet B.V. Quarantainenet B.V. Network
management and security.
http://www.quarantainenet.nl.
[2] J.B. Grizzard et al. Peer-to-peer botnets: Overview and case
study. In Proceedings - USENIX HotBots ’07, 2007.
[3] R. Fielding et al. Hypertext Transfer Protocol – HTTP/1.1.
http://www.ietf.org/rfc/rfc2616.txt.
[4] Shadowserver Foundation. Shadowserver Botnet count,
visited 2008-12-14. http://www.shadowserver.org/
wiki/pmwiki.php?n=Stats.BotCountDaily.
[5] J. Oikarinen and D. Reed. Internet Relay Chat Protocol.
http://www.ietf.org/rfc/rfc1459.txt.
[6] R. van de Meent and A. Pras. Simpleweb / University of Twente
- Traffic Measurement Data Repository, visited 2008-10-28.
http://traces.simpleweb.org.
