Analyzing the Results

Now you’re ready to write some code to analyze the results generated by test-classifier. Recall that test-classifier returns the list returned by test-from-corpus in which each element is a plist representing the result of classifying one file. This plist contains the name of the file, the actual type of the file, the classification, and the score returned by classify. The first bit of analytical code you should write is a function that returns a symbol indicating whether a given result was correct, a false positive, a false negative, a missed ham, or a missed spam. You can use **DESTRUCTURING-BIND** to pull out the :type and :classification elements of an individual result list (using **&allow-other-keys** to tell **DESTRUCTURING-BIND** to ignore any other key/value pairs it sees) and then use nested **ECASE** to translate the different pairings into a single symbol.

    (defun result-type (result)
      (destructuring-bind (&key type classification &allow-other-keys) result
        (ecase type
          (ham
           (ecase classification
             (ham 'correct)
             (spam 'false-positive)
             (unsure 'missed-ham)))
          (spam
           (ecase classification
             (ham 'false-negative)
             (spam 'correct)
             (unsure 'missed-spam))))))

You can test out this function at the REPL.

    SPAM> (result-type '(:FILE #p"foo" :type ham :classification ham :score 0))
    CORRECT
    SPAM> (result-type '(:FILE #p"foo" :type spam :classification spam :score 0))
    CORRECT
    SPAM> (result-type '(:FILE #p"foo" :type ham :classification spam :score 0))
    FALSE-POSITIVE
    SPAM> (result-type '(:FILE #p"foo" :type spam :classification ham :score 0))
    FALSE-NEGATIVE
    SPAM> (result-type '(:FILE #p"foo" :type ham :classification unsure :score 0))
    MISSED-HAM
    SPAM> (result-type '(:FILE #p"foo" :type spam :classification unsure :score 0))
    MISSED-SPAM
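Both safeguards in result-type can also be tried in isolation. The following is only a sketch and not part of the spam code: strict-type, ham-verdict, signals-error-p, and *sample-result* are hypothetical names invented for illustration, and it assumes an implementation that checks destructuring patterns in safe code, as the standard expects.

```lisp
;; Hypothetical helper: the same pattern as in result-type, minus
;; &allow-other-keys, so :file and :score no longer match the pattern.
(defun strict-type (result)
  (destructuring-bind (&key type classification) result
    (declare (ignore classification))
    type))

;; Hypothetical helper: the inner ECASE for an actual ham, on its own.
(defun ham-verdict (classification)
  (ecase classification
    (ham 'correct)
    (spam 'false-positive)
    (unsure 'missed-ham)))

;; Hypothetical helper: returns T if calling THUNK signals an error.
(defun signals-error-p (thunk)
  (handler-case (progn (funcall thunk) nil)
    (error () t)))

(defparameter *sample-result*
  '(:file "foo" :type ham :classification ham :score 0))

;; Extra keys without &allow-other-keys: an error in safe code.
(signals-error-p (lambda () (strict-type *sample-result*)))

;; A classification with no matching clause: ECASE signals a TYPE-ERROR.
(signals-error-p (lambda () (ham-verdict 'maybe)))

;; A classification that ECASE knows about passes through normally.
(ham-verdict 'unsure)
```

Had result-type used plain **CASE** instead of **ECASE**, an unexpected type or classification would quietly produce **NIL** rather than an error, and the mistake would surface only later in the analysis.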

Having this function makes it easy to slice and dice the results of test-classifier in a variety of ways. For instance, you can start by defining predicate functions for each type of result.

    (defun false-positive-p (result)
      (eql (result-type result) 'false-positive))

    (defun false-negative-p (result)
      (eql (result-type result) 'false-negative))

    (defun missed-ham-p (result)
      (eql (result-type result) 'missed-ham))

    (defun missed-spam-p (result)
      (eql (result-type result) 'missed-spam))

    (defun correct-p (result)
      (eql (result-type result) 'correct))

With those functions, you can easily use the list and sequence manipulation functions I discussed in Chapter 11 to extract and count particular kinds of results.

    SPAM> (count-if #'false-positive-p *results*)
    6
    SPAM> (remove-if-not #'false-positive-p *results*)
    ((:FILE #p"ham/5349" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.9999983107355541d0)
     (:FILE #p"ham/2746" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.6286468956619795d0)
     (:FILE #p"ham/3427" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.9833753501352983d0)
     (:FILE #p"ham/7785" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.9542788587998488d0)
     (:FILE #p"ham/1728" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.684339162891261d0)
     (:FILE #p"ham/10581" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.9999924537959615d0))
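Because each result is a plist, the extracted results can be fed straight back into other sequence functions. As a rough sketch, the code below orders three of the false positives from the listing above by score, so the messages the classifier was most confidently wrong about come first. The names *false-positives*, score-of, and by-descending-score are hypothetical and exist only for this illustration.

```lisp
;; Three of the false positives shown in the REPL output above.
(defparameter *false-positives*
  (list '(:file #p"ham/5349" :type ham :classification spam
          :score 0.9999983107355541d0)
        '(:file #p"ham/2746" :type ham :classification spam
          :score 0.6286468956619795d0)
        '(:file #p"ham/3427" :type ham :classification spam
          :score 0.9833753501352983d0)))

;; Hypothetical accessor for the :score element of a result plist.
(defun score-of (result)
  (getf result :score))

;; Hypothetical helper. SORT is destructive, so it works on a copy and
;; leaves the list it was given untouched.
(defun by-descending-score (results)
  (sort (copy-list results) #'> :key #'score-of))

;; The ham/5349 result comes first and ham/2746 last.
(mapcar (lambda (result) (getf result :file))
        (by-descending-score *false-positives*))
```

Sorting with #'< instead would put the borderline cases first, which is useful when you want to see the messages that only just crossed the spam threshold.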

You can also use the symbols returned by result-type as keys into a hash table or an alist. For instance, you can write a function that prints a summary of the count and percentage of each type of result, using an alist that maps each result type, plus the extra symbol total, to a count.

    (defun analyze-results (results)
      (let* ((keys '(total correct false-positive
                     false-negative missed-ham missed-spam))
             (counts (loop for x in keys collect (cons x 0))))
        (dolist (item results)
          (incf (cdr (assoc 'total counts)))
          (incf (cdr (assoc (result-type item) counts))))
        (loop with total = (cdr (assoc 'total counts))
              for (label . count) in counts
              do (format t "~&~@(~a~):~20t~5d~,5t: ~6,2f%~%"
                         label count (* 100 (/ count total))))))

This function will give output like this when passed a list of results generated by test-classifier:

    SPAM> (analyze-results *results*)
    Total:               3761 : 100.00%
    Correct:             3689 :  98.09%
    False-positive:         4 :   0.11%
    False-negative:         9 :   0.24%
    Missed-ham:            19 :   0.51%
    Missed-spam:           40 :   1.06%
    NIL
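The hash table alternative mentioned earlier could look something like the following sketch. It is not the chapter's code: tally, classification-of, and *sample-results* are hypothetical names, the file names are made up, and the key function is passed in as an argument so the example runs without the rest of the spam package.

```lisp
;; Hypothetical helper: count RESULTS by whatever KEY-FN returns for each.
;; The keys are symbols, so the default EQL hash table is sufficient.
(defun tally (results key-fn)
  (let ((counts (make-hash-table)))
    (dolist (item results counts)
      ;; The third argument to GETHASH supplies 0 for keys not yet seen.
      (incf (gethash (funcall key-fn item) counts 0)))))

;; Made-up results in the same plist shape that test-classifier returns.
(defparameter *sample-results*
  '((:file "a" :type ham :classification ham :score 0.1)
    (:file "b" :type spam :classification spam :score 0.9)
    (:file "c" :type ham :classification spam :score 0.8)))

;; Hypothetical accessor for the :classification element of a result.
(defun classification-of (result)
  (getf result :classification))

(let ((counts (tally *sample-results* #'classification-of)))
  (list (gethash 'ham counts 0)
        (gethash 'spam counts 0)
        (gethash 'unsure counts 0)))
;; => (1 2 0)
```

Passing #'result-type as the key function would give the same breakdown as analyze-results. One difference from the alist version is that a hash table only contains the keys that actually occurred, so a report that should always show all five result types still needs the fixed list of keys to iterate over, and the total is simply the length of the results list.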

As a last bit of analysis, you might want to look at why an individual message was classified the way it was. The following functions will show you:

    (defun explain-classification (file)
      (let* ((text (start-of-file file *max-chars*))
             (features (extract-features text))
             (score (score features))
             (classification (classification score)))
        (show-summary file text classification score)
        (dolist (feature (sorted-interesting features))
          (show-feature feature))))

    (defun show-summary (file text classification score)
      (format t "~&~a" file)
      (format t "~2%~a~2%" text)
      (format t "Classified as ~a with score of ~,5f~%" classification score))

    (defun show-feature (feature)
      (with-slots (word ham-count spam-count) feature
        (format
         t "~&~2t~a~30thams: ~5d; spams: ~5d;~,10tprob: ~,f~%"
         word ham-count spam-count (bayesian-spam-probability feature))))

    (defun sorted-interesting (features)
      (sort (remove-if #'untrained-p features) #'< :key #'bayesian-spam-probability))
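The filter-then-sort pattern in sorted-interesting can be tried without a trained feature database. The sketch below uses stand-ins throughout: toy-feature is a hypothetical struct, toy-untrained-p assumes that "untrained" means both counts are zero, and toy-spam-ratio is a plain spam fraction that only plays the role of bayesian-spam-probability and is not that calculation.

```lisp
;; Hypothetical stand-in for the word-feature objects used by the classifier.
(defstruct toy-feature
  word
  (ham-count 0)
  (spam-count 0))

;; Assumption: a feature is untrained when it has never been counted.
(defun toy-untrained-p (feature)
  (and (zerop (toy-feature-ham-count feature))
       (zerop (toy-feature-spam-count feature))))

;; Stand-in score: the fraction of sightings that were spam.
;; This is NOT the Bayesian probability computed in the chapter.
(defun toy-spam-ratio (feature)
  (/ (toy-feature-spam-count feature)
     (+ (toy-feature-ham-count feature)
        (toy-feature-spam-count feature))))

;; Same shape as sorted-interesting: drop untrained features first, so
;; the key function is never called on a feature with no counts, then
;; order from most ham-like to most spam-like.
(defun toy-sorted-interesting (features)
  (sort (remove-if #'toy-untrained-p features) #'< :key #'toy-spam-ratio))

(defparameter *toy-features*
  (list (make-toy-feature :word "viagra" :ham-count 0 :spam-count 9)
        (make-toy-feature :word "lisp" :ham-count 8 :spam-count 1)
        (make-toy-feature :word "zyzzyva")
        (make-toy-feature :word "free" :ham-count 3 :spam-count 6)))

;; SORT is destructive and the result of REMOVE-IF may share structure
;; with its argument, so pass a copy to keep *toy-features* intact.
(mapcar #'toy-feature-word
        (toy-sorted-interesting (copy-list *toy-features*)))
;; => ("lisp" "free" "viagra")
```

In explain-classification the list handed to sorted-interesting is a fresh one returned by extract-features and is not used again afterward, so the destructive **SORT** does no harm there.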