Finding the needles in a log file haystack

You have just been presented with a daunting task: "Here are several gigabytes of log files; let us know if they can tell you anything."

OK, where do you start? Luckily, there are a few free tools that can help you find the proverbial needle in this kind of digital haystack, but the process you follow is just as important. Any forensic procedure should be methodical so you don't end up duplicating your own work or the efforts of anyone you're working with. We'll review one such process, but it is by no means the only way to proceed.

But before you even get to that, start with some fact-finding. Forensic investigators need to know what they are looking for, so narrowing "anything" down to pertinent facts is important. The next step is to confirm that you have the correct evidence files or logs to examine. Once you have both of those nailed down, you're ready to select your tool set.

One of the main tools is PyFLAG (Forensic and Log Analysis GUI), which comes preloaded on another excellent tool: SIFT (the SANS Investigative Forensic Toolkit). SIFT is a self-contained forensic environment with everything an examiner needs, from acquisition to analysis and reporting. It is built on Ubuntu and is distributed as a DVD image or a VMware image.

One consideration before using SIFT and PyFLAG: you will need to copy the log file over to the virtual environment. This can be tricky, but it typically works with Windows file sharing on Windows platforms or Samba on Linux. Once the file is in place, start with the Case Management module to document the case details. Create a new case, which simply consists of adding a case name and a time zone. You are then presented with two links, Load a Disk Image and Load a Preset Log File. Skip these for now; there is one more step before you can look at the log file.

Now select Create Log Preset and choose the type of file you will be examining (Event Logs, IPTables, Apache log, IIS log). If your log doesn't fit into these categories, choose Simple or Advanced.

If you use Simple or Advanced you will have to fiddle with the formatting to get the file to parse properly, but the process is fairly self-explanatory. The really useful feature once your file is in the database is the "Group By" button, which asks PyFLAG to group the messages by column. Doing so gives invaluable insight into the log you are processing, because it helps you pinpoint outlying data points quickly.
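
If you end up working without PyFLAG handy, the same group-by idea takes only a few lines of Python. A minimal sketch, assuming a whitespace-delimited log and counting the values in its first column; the filename and column choice are placeholders:

    from collections import Counter

    # Tally how often each value appears in one column of a
    # whitespace-delimited log (here, column 0 -- the first field).
    counts = Counter()
    with open("mail.log") as log:             # hypothetical filename
        for line in log:
            fields = line.split()
            if fields:                        # skip blank lines
                counts[fields[0]] += 1

    # Most frequent values first; the rarities at the bottom are
    # often the interesting ones.
    for value, count in counts.most_common():
        print(f"{count:8d}  {value}")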

Another tool that can help narrow the search is the latest version of Splunk, which makes it easy to import a file in any of numerous formats and quickly presents it for more complex searching.

For a quick search, Excel (2007 and later) can open up large files for review. When you find something of interest, simply use the "Find All" option to pull up every instance of that nugget at once; clicking a cell in the results takes you directly to the full reference.

Another quick analysis tool for log files is grep, a command-line text-search tool typically found in Linux/Unix environments. If you don't have it available, you can install Cygwin, which provides a Linux/Unix-style command line on Windows. Grep is especially useful on large log files that don't open easily in other tools.
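
If grep and Cygwin are both off the table, a short Python script can approximate the basics. A rough sketch that streams the file line by line, so even a multi-gigabyte log never has to fit in memory; the pattern and filename here are just placeholders:

    import re

    pattern = re.compile(r"rejecting connections")   # placeholder pattern

    # Stream the log line by line and print matches with line numbers,
    # roughly what `grep -n` would do.
    with open("mail.log") as log:                    # hypothetical filename
        for number, line in enumerate(log, start=1):
            if pattern.search(line):
                print(f"{number}: {line.rstrip()}")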

Lastly, Mandiant has a free tool called Highlighter that looks promising.

OK, now to get down to business. Here are some guidelines:

* Determine the business purpose of the system. You have your facts on the case, and you have some files handed over to you. But do you know what the system does for the business? Is it a mail server? A file server? A Web server? An application server?

* Review the entire file. While this sounds counterproductive, it can be a helpful place to start. Do you need to do a line-by-line review of everything? No. But if you can page through most of the file (or a large chunk of it), you will get a sense of what is normal and what isn't. For example, suppose you are reviewing a mail log and see mostly the same entries over and over:

Aug 3 03:39:49 goodguy.com sendmail[532]: [ID 702911 mail.notice] rejecting connections on daemon MTA-v6: load average: 17

Aug 3 03:39:49 goodguy.com sendmail[532]: [ID 702911 mail.notice] rejecting connections on daemon MSA: load average: 17

Aug 3 03:40:09 goodguy.com sendmail[532]: [ID 702911 mail.notice] rejecting connections on daemon MTA-v4: load average: 17

Aug 3 03:40:09 goodguy.com sendmail[532]: [ID 702911 mail.notice] rejecting connections on daemon MTA-v6: load average: 17

Aug 3 03:40:09 goodguy.com sendmail[532]: [ID 702911 mail.notice] rejecting connections on daemon MSA: load average: 17

Aug 3 03:40:24 goodguy.com sendmail[532]: [ID 702911 mail.notice] rejecting connections on daemon MTA-v4: load average: 16

Then you get the notion that the mail server generally sends a lot of messages to one location. When you see it suddenly explode into sending many e-mails to many other locations, some of them perhaps in countries where your company doesn't even do business, that might be an indication you have found a helpful nugget of information.
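
Paging through several gigabytes by hand isn't realistic, but you can get the same feel for what's normal by skimming a sample. A quick sketch that prints every 500th line; the interval and filename are arbitrary choices:

    # Print every Nth line of a large log so you can skim a
    # representative sample instead of the whole file.
    SAMPLE_EVERY = 500                        # arbitrary interval

    with open("mail.log") as log:             # hypothetical filename
        for number, line in enumerate(log, start=1):
            if number % SAMPLE_EVERY == 0:
                print(line.rstrip())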

Key to this entire review is keeping good documentation of the interesting entries. You will likely be reviewing quite a bit of material in this process, and it's easy to forget what you have already looked at. Copying and pasting entries into another file is one easy way to keep track.

* Work small, get bigger. But how do you find that golden needle in the mountain of hay? This is where you start to narrow the field a bit by sorting on the types of messages in the log file. How many instances are there of a certain error? If the routine e-mail message appears 10,000 times and an interesting login error appears only 10 times, then perhaps those 10 messages are worth a deeper look.
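
One way to put numbers behind that hunch is to collapse each entry into a rough "template" by stripping out the parts that always vary (timestamps, process IDs, counters) and then count how often each template occurs. A sketch along those lines; the regular expressions are deliberately crude and would need tuning to your actual log format:

    import re
    from collections import Counter

    def normalize(line):
        """Collapse variable fields so similar messages count together."""
        line = re.sub(r"^\w{3}\s+\d+ \d\d:\d\d:\d\d", "<TS>", line)  # syslog-style timestamp
        line = re.sub(r"\[\d+\]", "[<PID>]", line)                   # process IDs
        line = re.sub(r"\d+", "<N>", line)                           # any remaining numbers
        return line.strip()

    counts = Counter()
    with open("mail.log") as log:             # hypothetical filename
        for line in log:
            counts[normalize(line)] += 1

    # Rarest templates first: the 10-instance oddities rather than
    # the 10,000-instance routine traffic.
    for template, count in sorted(counts.items(), key=lambda kv: kv[1]):
        print(f"{count:8d}  {template}")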

* Trace your steps. You have reviewed the main file and the interesting messages, and you feel confident that everything worth reviewing in that particular log file has been reviewed. Have the interesting entries directed you to look somewhere else? Perhaps you found several entries suggesting a user tried to log in to another system from your mail server. There were only a few such entries, so this seems odd. At this point you should seize the log files from those other systems as well. With any luck, you can piece together a trail of the incident.
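
When logs from those other systems come in, merging everything into one chronological view makes the trail far easier to follow. A minimal sketch, assuming syslog-style timestamps like the sendmail entries above (which omit the year, so sorting across a year boundary would need more care); the filenames are hypothetical:

    from datetime import datetime

    def parse_when(line):
        """Parse a syslog-style 'Aug 3 03:39:49' prefix."""
        stamp = " ".join(line.split()[:3])
        return datetime.strptime(stamp, "%b %d %H:%M:%S")

    entries = []
    for filename in ["mailserver.log", "fileserver.log"]:   # hypothetical files
        with open(filename) as log:
            for line in log:
                try:
                    entries.append((parse_when(line), filename, line.rstrip()))
                except ValueError:
                    continue                                # skip lines with no timestamp

    # One merged timeline across every system involved.
    for when, source, line in sorted(entries):
        print(f"{source}: {line}")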

Forensics in general is a long and sometimes involved process, and log file forensics is no exception. While there are many tools that can help you find those elusive needles, due diligence and patience are what will let you properly analyze a log file and draw out helpful supporting evidence.

Westphal, a 17-year information technology professional, is an information security consultant with a large payment processing company. Skilled in troubleshooting and process analysis, she has specific security expertise in forensics, operating system and network security, intrusion detection, incident handling, vulnerability analysis and policy development. Westphal has been a CISSP since 2001 and a CISA since 2008. You can reach her at kmwestphal@cox.net.
