6.9. Transforming the prediction target (y)

These are transformers that are not intended to be used on features, only on supervised learning targets. See also Transforming target in regression if you want to transform the prediction target for learning, but evaluate the model in the original (untransformed) space.

6.9.1. Label binarization

LabelBinarizer is a utility class to help create a label indicator matrix from a list of multi-class labels:

>>> from sklearn import preprocessing
>>> lb = preprocessing.LabelBinarizer()
>>> lb.fit([1, 2, 6, 4, 2])
LabelBinarizer()
>>> lb.classes_
array([1, 2, 4, 6])
>>> lb.transform([1, 6])
array([[1, 0, 0, 0],
       [0, 0, 0, 1]])
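The mapping is reversible: `inverse_transform` turns indicator rows back into the original class labels. A minimal sketch, re-fitting the same binarizer as above:

```python
import numpy as np
from sklearn import preprocessing

lb = preprocessing.LabelBinarizer()
lb.fit([1, 2, 6, 4, 2])

# Each row of the indicator matrix is mapped back to the class
# whose column is set; here rows select classes 1 and 6.
recovered = lb.inverse_transform(np.array([[1, 0, 0, 0],
                                           [0, 0, 0, 1]]))
```

`recovered` is `array([1, 6])`, the labels that produced those indicator rows.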

For instances that can carry multiple labels at once, use MultiLabelBinarizer:

>>> lb = preprocessing.MultiLabelBinarizer()
>>> lb.fit_transform([(1, 2), (3,)])
array([[1, 1, 0],
       [0, 0, 1]])
>>> lb.classes_
array([1, 2, 3])
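The label sets need not be numeric; any hashable labels work, and the columns of the indicator matrix follow the sorted `classes_`. A small sketch with made-up genre tags (the tag names are illustrative, not from the original):

```python
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# Each input element is the set of labels for one sample.
indicator = mlb.fit_transform([{"sci-fi", "thriller"}, {"comedy"}])
# Columns follow sorted classes: ['comedy', 'sci-fi', 'thriller'].

# inverse_transform recovers one tuple of labels per row.
recovered = mlb.inverse_transform(indicator)
```

Here `indicator` is `[[0, 1, 1], [1, 0, 0]]` and `recovered` is `[('sci-fi', 'thriller'), ('comedy',)]`.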

6.9.2. Label encoding

LabelEncoder is a utility class to help normalize labels such that they contain only values between 0 and n_classes-1. This is sometimes useful for writing efficient Cython routines. LabelEncoder can be used as follows:

>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6])
array([0, 0, 1, 2])
>>> le.inverse_transform([0, 0, 1, 2])
array([1, 1, 2, 6])

It can also be used to transform non-numerical labels (as long as they are hashable and comparable) to numerical labels:

>>> le = preprocessing.LabelEncoder()
>>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
LabelEncoder()
>>> list(le.classes_)
['amsterdam', 'paris', 'tokyo']
>>> le.transform(["tokyo", "tokyo", "paris"])
array([2, 2, 1])
>>> list(le.inverse_transform([2, 2, 1]))
['tokyo', 'tokyo', 'paris']
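One caveat worth keeping in mind: `transform` only accepts labels that were seen during `fit`; passing an unseen label raises a `ValueError` rather than silently assigning a new code. A short sketch of that behavior:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit(["paris", "paris", "tokyo", "amsterdam"])

# "berlin" was never seen by fit, so transform refuses it.
try:
    le.transform(["berlin"])
    unseen_rejected = False
except ValueError:
    unseen_rejected = True
```

If your data may contain categories at prediction time that were absent at training time, handle them (e.g. by filtering or mapping to a sentinel) before calling `transform`.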