Extraction

String extraction is one of the most common tasks a programmer faces. It is often difficult because useful data rarely arrives in a clean, predictable format. Here are some helpful Ruby string-extraction cases.

Extracting Network Strings

Extracting MAC address from string

We need to extract all MAC addresses from an arbitrary string:

    mac = "ads fs:ad fa:fs:fe: Wind00-0C-29-38-1D-61ows 1100:50:7F:E6:96:20dsfsad fas fa1 3c:77:e6:68:66:e9 f2"

Using Regular Expressions

This regular expression matches both the Windows (hyphen-separated) and Linux (colon-separated) MAC address formats.

Let's find our MAC addresses:

    mac_regex = /(?:[0-9A-F][0-9A-F][:\-]){5}[0-9A-F][0-9A-F]/i
    mac.scan mac_regex

Returns

    ["00-0C-29-38-1D-61", "00:50:7F:E6:96:20", "3c:77:e6:68:66:e9"]
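
The pattern above also accepts addresses that mix both separators, such as 00:0C-29:38-1D:61. If that matters, a stricter variant is possible. The following is only a sketch: the names MAC_STRICT, extract_macs and normalize_macs are made up for this example, and the backreference \1 forces every separator to equal the first one.

```ruby
# Hypothetical helper names; not part of any library.
# \h matches one hex digit; \1 repeats whichever separator matched first.
MAC_STRICT = /\h{2}([:-])(?:\h{2}\1){4}\h{2}/

# Return every MAC address whose separators are consistent.
def extract_macs(text)
  macs = []
  # The pattern has a capture group, so scan would yield only the group;
  # read the whole match from Regexp.last_match instead.
  text.scan(MAC_STRICT) { macs << Regexp.last_match(0) }
  macs
end

# Upper-case, colon-separated form, so the same address written two ways compares equal.
def normalize_macs(macs)
  macs.map { |mac| mac.upcase.tr('-', ':') }.uniq
end

mac = "ads fs:ad fa:fs:fe: Wind00-0C-29-38-1D-61ows 1100:50:7F:E6:96:20dsfsad fas fa1 3c:77:e6:68:66:e9 f2"
p extract_macs(mac)
p normalize_macs(extract_macs(mac))
```

Normalizing before calling uniq is a design choice: it lets us deduplicate addresses that were captured from different tools.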

Extracting IPv4 address from string

We need to extract all IPv4 addresses from an arbitrary string:

    ip = "ads fs:ad fa:fs:fe: Wind10.0.4.5ows 11192.168.0.15dsfsad fas fa1 20.555.1.700 f2"

    ipv4_regex = /(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/

Let’s find our IPs

    ip.scan ipv4_regex

Returns one array of octets per address, because the pattern contains capture groups

    [["10", "0", "4", "5"], ["192", "168", "0", "15"]]
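
If we want whole address strings instead of octet arrays, one option is to make the groups non-capturing (another is to join each octet array with '.'). This is a small sketch, not the original author's pattern; the constant names OCTET and IPV4_WHOLE are invented here.

```ruby
# Hypothetical constant names. Same octet rule as above, but non-capturing,
# so String#scan returns the full match for each address.
OCTET = /(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/
IPV4_WHOLE = /#{OCTET}(?:\.#{OCTET}){3}/

ip = "ads fs:ad fa:fs:fe: Wind10.0.4.5ows 11192.168.0.15dsfsad fas fa1 20.555.1.700 f2"
addresses = ip.scan(IPV4_WHOLE)
p addresses # => ["10.0.4.5", "192.168.0.15"]
```

As with the original pattern, 20.555.1.700 is skipped because 555 is not a valid octet.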

Extracting IPv6 address from string

The following pattern is anchored with ^ and $, so it matches a line that consists solely of an IPv6 address (optionally surrounded by whitespace) rather than finding addresses embedded in longer text:

    ipv6_regex = /^\s*((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?\s*$/
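
A pattern this long is hard to maintain. As an alternative technique (not the regex approach above), we can split the text into whitespace-separated tokens and let Ruby's standard IPAddr class decide which ones are IPv6 addresses. This is a minimal sketch: extract_ipv6 is a hypothetical helper name, and it assumes addresses are separated from other text by whitespace.

```ruby
require 'ipaddr'

# Hypothetical helper: keep only the tokens that IPAddr parses as IPv6.
def extract_ipv6(text)
  text.split.select do |token|
    next false unless token.include?(':') # cheap pre-filter: every IPv6 address has a colon
    begin
      IPAddr.new(token).ipv6?
    rescue IPAddr::Error
      false # not a parsable address (for example a MAC address or a time)
    end
  end
end

text = "gateway fe80::1 dns 2001:db8::8:800:200c:417a mac 3c:77:e6:68:66:e9 time 12:30 v4 10.0.4.5"
p extract_ipv6(text) # => ["fe80::1", "2001:db8::8:800:200c:417a"]
```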

Extracting Web Strings

Extracting URLs from a file

Assume we have read the following string from a file:

    string = "text here http://foo1.example.org/bla1 and http://foo2.example.org/bla2 and here mailto:test@example.com and here also."

Using Regular Expressions

    string.scan(/https?:\/\/\S+/)

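Using the standard URI module

This returns an array of URLs. Note that the Ruby documentation marks URI.extract as obsolete, so it may print a warning when Ruby runs in verbose mode.

    require 'uri'
    URI.extract(string, ["http", "https"])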
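
One weakness of the \S+ pattern is that it swallows punctuation that directly follows a URL, such as a closing parenthesis or a comma. The sketch below combines the regex scan with URI.parse as a sanity check. The helper name extract_urls and the list of trimmed characters are assumptions made for this example.

```ruby
require 'uri'

# Hypothetical helper: scan for http(s) candidates, trim trailing punctuation,
# and keep only the candidates that URI.parse accepts as HTTP(S) URLs with a host.
def extract_urls(text)
  candidates = text.scan(%r{https?://\S+})
  urls = candidates.map do |candidate|
    url = candidate.sub(/[.,;:!?)\]'"]+\z/, '') # assumed set of trailing characters to drop
    begin
      uri = URI.parse(url)
      url if uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
    rescue URI::InvalidURIError
      nil
    end
  end
  urls.compact
end

text = "see http://foo1.example.org/bla1, then (https://foo2.example.org/bla2). mailto:test@example.com"
p extract_urls(text)
```

Trimming is a trade-off: a URL that legitimately ends in one of those characters will lose it.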

Extracting URLs from web page

Using the techniques above:

    require 'net/http'
    URI.extract(Net::HTTP.get(URI.parse("http://rubyfu.net")), ["http", "https"])

Or using a regular expression:

    require 'net/http'
    Net::HTTP.get(URI.parse("http://rubyfu.net")).scan(/https?:\/\/\S+/)
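
Scanning raw HTML for http(s):// strings misses relative links and tends to pick up trailing quotes or angle brackets. A rough alternative is to read the href attributes and resolve them against the page URL with URI.join. This is only a sketch: extract_links is a hypothetical helper, example.com is a placeholder, and the HTML is a hard-coded string where you would normally pass the body returned by Net::HTTP.get.

```ruby
require 'uri'

# Hypothetical helper: collect href values and resolve them against base_url.
def extract_links(html, base_url)
  hrefs = html.scan(/href\s*=\s*["']([^"']+)["']/i).flatten
  links = hrefs.map do |href|
    begin
      uri = URI.join(base_url, href)
      uri.to_s if uri.is_a?(URI::HTTP) # keep http and https links only
    rescue URI::Error
      nil # skip values that cannot be parsed or resolved
    end
  end
  links.compact.uniq
end

html = %q(<a href="page.html">one</a> <a href='/about'>two</a> <a href="https://other.example.org/x">three</a> <a href="mailto:test@example.com">mail</a>)
p extract_links(html, "http://example.com/docs/")
```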

Extracting email addresses from web page

    email_regex = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i

    require 'net/http'
    Net::HTTP.get(URI.parse("http://isemail.info/_system/is_email/test/?all")).scan(email_regex).uniq
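
The {2,4} at the end of the pattern limits the top-level domain to four letters, so an address such as dev@company.technology is skipped. If longer TLDs matter, a looser variant can be tried on a local string first. This is a sketch, with EMAIL_LOOSE and extract_emails as made-up names; like the original, it is a pragmatic filter and not a full address validator.

```ruby
# Hypothetical names. Same pattern as above, but with no upper limit on TLD length.
EMAIL_LOOSE = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/i

# Return unique, lower-cased addresses found in the text.
def extract_emails(text)
  text.scan(EMAIL_LOOSE).map(&:downcase).uniq
end

text = "contact admin@example.com, Sales@Example.ORG and dev@company.technology; not-an-email@ or @nothing"
p extract_emails(text)
# => ["admin@example.com", "sales@example.org", "dev@company.technology"]
```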

Extracting strings from HTML tags

Assume we have the following HTML content and want to keep only the text, removing all HTML tags:

    string = "<!DOCTYPE html>
    <html>
    <head>
    <title>Page Title</title>
    </head>
    <body>
    <h1>This is a Heading</h1>
    <p>This is another <strong>contents</strong>.</p>
    </body>
    </html>"
    puts string.gsub(/<.*?>/,'').strip

Returns the following (blank lines left behind by the removed tags are not shown)

    Page Title
    This is a Heading
    This is another contents.
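
The one-line gsub leaves empty lines behind, keeps the contents of script and style elements, and does not decode entities such as &amp;. A slightly fuller sketch follows; html_to_text is a hypothetical helper name, and regex-based stripping remains fragile, so a real HTML parser is the safer choice for untrusted pages.

```ruby
require 'cgi'

# Hypothetical helper: reduce an HTML string to its visible text lines.
def html_to_text(html)
  text = html.gsub(%r{<(script|style)\b.*?</\1>}mi, '') # drop script/style blocks with their bodies
  text = text.gsub(/<[^>]*>/, '')                       # drop the remaining tags, even multi-line ones
  text = CGI.unescapeHTML(text)                         # decode entities such as &amp; and &lt;
  text.lines.map(&:strip).reject(&:empty?).join("\n")   # remove the blank lines left behind
end

html = "<html><head><title>Page Title</title>\n<style>p { color: red; }</style></head>\n" \
       "<body>\n<h1>This is a Heading</h1>\n<p>Fish &amp; <strong>chips</strong>.</p>\n</body></html>"
puts html_to_text(html)
```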

Parsing colon separated data from a file

During a pentest, you may need to parse text in the following very common format:

    description : AAAA
    info : BBBB
    info : CCCC
    info : DDDD
    solution : EEEE
    solution : FFFF
    reference : GGGG
    reference : HHHH
    see_also : IIII
    see_also : JJJJ

The main idea is to merge repeated keys into a single key that holds an array of all its values.

    #!/usr/bin/env ruby
    #
    # KING SABRI | @KINGSABRI
    # Usage:
    #   ruby noawk.rb file.txt
    #
    lines = File.read(ARGV[0]).split("\n")

    def parser(lines)
      hash = {}                      # Datastore
      pairs = lines.map { |line| line.split(':', 2) }
      pairs.each do |k, v|
        next if k.nil? || v.nil?     # skip blank lines and lines without a colon
        k = k.strip                  # remove leading and trailing whitespace
        v = v.strip                  # remove leading and trailing whitespace
        if hash[k]                   # if this key exists
          hash[k] << v               # add this value to the key's array
        else                         # if not
          hash[k] = [v]              # create the key with an array containing this value
        end
      end
      hash                           # return the hash
    end

    parser(lines).each { |k, v| puts "#{k}:\t#{v.join(', ')}" }

For one-liner lovers

    ruby -e 'h={};File.read("text.txt").split("\n").map{|l|l.split(":", 2)}.map{|k, v|k.strip!;v.strip!; h[k] ? h[k] << v : h[k] = [v]};h.each {|k, v| puts "#{k}:\t#{v.join(", ")}"}'
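
The same grouping can be written more compactly with a Hash whose default block creates the array on first use. This is a sketch under a few assumptions: group_pairs is a hypothetical name, the input is passed as a string instead of a file path, and lines without a colon are silently skipped.

```ruby
# Hypothetical helper: group "key : value" lines into { key => [values] }.
def group_pairs(text)
  groups = Hash.new { |hash, key| hash[key] = [] } # a missing key starts as an empty array
  text.each_line do |line|
    key, value = line.split(':', 2)                # split on the first colon only
    next if value.nil? || key.strip.empty?         # skip blank lines and lines without a key or colon
    groups[key.strip] << value.strip
  end
  groups
end

text = "description : AAAA\ninfo : BBBB\ninfo : CCCC\n\nnot a pair\nreference : http://example.com/x\n"
group_pairs(text).each { |k, v| puts "#{k}:\t#{v.join(', ')}" }
```

Splitting with a limit of 2 matters here: values that contain colons themselves, such as URLs, stay intact.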