Parsing Log Files

Apache Log File

Let’s first list the important information that we may need from the Apache logs:

  • IP address
  • Time stamp
  • HTTP method
  • URI path
  • Response code
  • User agent

To read a log file, I prefer to read it as an array of lines:

  apache_logs = File.readlines "/var/log/apache2/access.log"
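For large log files, loading every line at once may use more memory than you want. The following is a minimal sketch of a streaming alternative built on File.foreach. The helper name each_log_line is hypothetical, and the path is simply the one used above.

```ruby
# Hypothetical helper: yield one log line at a time instead of
# loading the whole file into memory.
def each_log_line(path)
  # Without a block, return an Enumerator so callers can chain methods.
  return enum_for(:each_log_line, path) unless block_given?

  File.foreach(path) do |line|
    yield line.chomp
  end
end

log_path = "/var/log/apache2/access.log"

# Print only the first three lines, and only if the file is readable.
if File.readable?(log_path)
  each_log_line(log_path).first(3).each { |line| puts line }
end
```

Because the helper returns an Enumerator when no block is given, it can be combined with methods such as first, select, or each_slice without reading the rest of the file.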

I was looking for a simple regular expression for Apache logs. I found one here and made a small tweak to it.

  apache_regex = /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - (.{0})- \[([^\]]+?)\] "(GET|POST|PUT|DELETE) ([^\s]+?) (HTTP\/1\.1)" (\d+) (\d+) ([^\s]+?) "(.*)"/
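To see what each group captures, it can help to try the expression on a single line. The sketch below uses a named-capture variant of the regex; the group names are my own choice for illustration, and the referer group is changed to capture the text inside the quotes. The sample line is reconstructed from the parsed output shown later in this section, not copied from a real log file.

```ruby
# Named-capture variant of the Apache regex (group names are assumptions).
named_regex = /(?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - (?<user>.{0})- \[(?<time>[^\]]+?)\] "(?<method>GET|POST|PUT|DELETE) (?<uri_path>[^\s]+?) (?<protocol>HTTP\/1\.1)" (?<code>\d+) (?<res_size>\d+) "(?<referer>[^"]*)" "(?<user_agent>.*)"/

# Sample line reconstructed for illustration.
sample_line = '127.0.0.1 - - [12/Dec/2015:20:09:05 +0300] "GET / HTTP/1.1" 200 3525 "-" ' \
              '"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) ' \
              'Chrome/47.0.2526.80 Safari/537.36"'

match = named_regex.match(sample_line)

if match
  puts match[:ip]        # 127.0.0.1
  puts match[:method]    # GET
  puts match[:uri_path]  # /
  puts match[:code]      # 200
  puts match[:referer]   # -
else
  puts "Can't parse: #{sample_line}"
end
```

Named groups make the later hash construction less dependent on counting positions, at the cost of a longer expression.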

So I came up with this small method, which parses the Apache “access.log” file and converts it into an array of hashes containing the information we need.

  #!/usr/bin/env ruby
  # KING SABRI | @KINGSABRI
  require 'pp'

  apache_logs = File.readlines "/var/log/apache2/access.log"

  # Parses an array of access.log lines into an array of hashes.
  def parse(logs)
    apache_regex = /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - (.{0})- \[([^\]]+?)\] "(GET|POST|PUT|DELETE) ([^\s]+?) (HTTP\/1\.1)" (\d+) (\d+) ([^\s]+?) "(.*)"/
    result_parse = []

    logs.each do |log|
      # Match once per line; parser is nil when the line does not match.
      parser = log.scan(apache_regex)[0]

      # If we can't parse the log line for any reason, report it and move on.
      if parser.nil?
        puts "Can't parse: #{log}\n\n"
        next
      end

      # Named "entry" so the local variable does not shadow the method name.
      entry =
        {
          :ip         => parser[0],
          :user       => parser[1],
          :time       => parser[2],
          :method     => parser[3],
          :uri_path   => parser[4],
          :protocol   => parser[5],
          :code       => parser[6],
          :res_size   => parser[7],
          :referer    => parser[8],
          :user_agent => parser[9]
        }
      result_parse << entry
    end

    result_parse
  end

  pp parse(apache_logs)

Returns

  [{:ip=>"127.0.0.1",
    :user=>"",
    :time=>"12/Dec/2015:20:09:05 +0300",
    :method=>"GET",
    :uri_path=>"/",
    :protocol=>"HTTP/1.1",
    :code=>"200",
    :res_size=>"3525",
    :referer=>"\"-\"",
    :user_agent=>
     "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"},
   {:ip=>"127.0.0.1",
    :user=>"",
    :time=>"12/Dec/2015:20:09:05 +0300",
    :method=>"GET",
    :uri_path=>"/icons/ubuntu-logo.png",
    :protocol=>"HTTP/1.1",
    :code=>"200",
    :res_size=>"3689",
    :referer=>"\"http://localhost/\"",
    :user_agent=>
     "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"},
   {:ip=>"127.0.0.1",
    :user=>"",
    :time=>"12/Dec/2015:20:09:05 +0300",
    :method=>"GET",
    :uri_path=>"/favicon.ico",
    :protocol=>"HTTP/1.1",
    :code=>"404",
    :res_size=>"500",
    :referer=>"\"http://localhost/\"",
    :user_agent=>
     "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"}]
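Once the log is an array of hashes, ordinary Enumerable methods are enough to summarize it. The sketch below is illustrative only: the entries array is a trimmed copy of the output above (just the keys it needs), and the variable names are my own.

```ruby
# Trimmed entries in the same shape as the parser output above.
entries = [
  { :ip => "127.0.0.1", :uri_path => "/",                      :code => "200" },
  { :ip => "127.0.0.1", :uri_path => "/icons/ubuntu-logo.png", :code => "200" },
  { :ip => "127.0.0.1", :uri_path => "/favicon.ico",           :code => "404" }
]

# Count how many requests produced each response code.
hits_per_code = entries.each_with_object(Hash.new(0)) do |entry, counts|
  counts[entry[:code]] += 1
end

# Collect the paths that the server could not find.
missing_paths = entries.select { |entry| entry[:code] == "404" }
                       .map { |entry| entry[:uri_path] }

hits_per_code.each { |code, count| puts "#{code}: #{count}" }
puts "Not found: #{missing_paths.join(', ')}"
```

For the three sample entries this prints two hits for 200, one hit for 404, and /favicon.ico as the only missing path.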

Note: The Apache LogFormat is configured as LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined, which is the default configuration.

  • %h is the remote host (i.e. the client IP address)
  • %l is the identity of the user determined by identd (not usually used since not reliable)
  • %u is the user name determined by HTTP authentication
  • %t is the time the request was received.
  • %r is the request line from the client. (“GET / HTTP/1.0”)
  • %>s is the status code sent from the server to the client (200, 404 etc.)
  • %b is the size of the response to the client (in bytes)
  • Referer is the page that linked to this URL.
  • User-agent is the browser identification string.
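Every field arrives as a string, so a small conversion step can make later filtering easier. The following is a sketch under assumptions: the helper name normalize is hypothetical, and the time format string reflects the %t layout seen in the output above. It also accounts for %b, which Apache writes as "-" rather than 0 when no bytes are sent; the regex above expects digits there, so such lines would need a looser size group to be parsed at all.

```ruby
require 'time'

# Hypothetical helper: convert selected string fields of one parsed
# entry into Time and Integer values.
def normalize(entry)
  entry.merge({
    # %t looks like "12/Dec/2015:20:09:05 +0300"
    :time     => Time.strptime(entry[:time], "%d/%b/%Y:%H:%M:%S %z"),
    :code     => entry[:code].to_i,
    # %b logs "-" instead of 0 when no bytes were sent.
    :res_size => entry[:res_size] == "-" ? 0 : entry[:res_size].to_i
  })
end

entry = { :time => "12/Dec/2015:20:09:05 +0300", :code => "404", :res_size => "500" }
normalized = normalize(entry)

puts normalized[:time].getutc  # 2015-12-12 17:09:05 UTC
puts normalized[:code] >= 400  # true
```

With real Time objects, entries can be sorted or filtered by time range without string comparisons.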

IIS Log File

Here is a basic IIS log regular expression:

  iis_regex = /(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) ([^\s]+?) (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) (\d{2}) (GET|POST|PUT|DELETE) ([^\s]+?) - (\d+) (\d+) (\d+) (\d+) ([^\s]+?) (.*)/
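The fields in an IIS log depend on how logging is configured, so a fixed regex may not fit every server. As an alternative sketch (not the approach used above), W3C extended logs carry a "#Fields:" directive naming the columns, and the values are separated by spaces. The code below assumes that format; the method name parse_w3c and the sample lines are made up for illustration and do not come from a real server.

```ruby
# Sample W3C-style content written for illustration only.
sample_iis_log = <<~LOG
  #Software: Microsoft Internet Information Services 8.5
  #Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip sc-status
  2015-12-12 20:09:05 192.168.1.10 GET /index.html - 80 - 192.168.1.20 200
LOG

# Hypothetical helper: map each data line onto the names in "#Fields:".
def parse_w3c(lines)
  fields  = []
  entries = []

  lines.each do |line|
    line = line.strip
    next if line.empty?

    if line.start_with?("#Fields:")
      # Remember the column names announced by the directive.
      fields = line.sub("#Fields:", "").split
    elsif !line.start_with?("#") && !fields.empty?
      values = line.split(" ")
      # Skip lines whose column count does not match the directive.
      entries << fields.zip(values).to_h if values.size == fields.size
    end
  end

  entries
end

iis_entries = parse_w3c(sample_iis_log.lines)
iis_entries.each do |entry|
  puts "#{entry['c-ip']} #{entry['cs-method']} #{entry['cs-uri-stem']} #{entry['sc-status']}"
end
```

For the sample above this prints one line: 192.168.1.20 GET /index.html 200.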