6.10 Changing the order of levels of a factor

6.10.1 Problem

You want to change the order in which the levels of a factor appear.

6.10.2 Solution

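There are two kinds of factors in R: ordered and unordered, for example {small, medium, large} and {pen, eraser, pencil}. For most analyses it does not matter whether a factor is ordered or unordered. If the factor is ordered, the specific order of its levels is meaningful (small < medium < large). If the factor is unordered, its levels still appear in some order, but that order is only a convenience (pen, eraser, pencil). Even so, the order can matter in practice, because it determines, for example, how results are printed and how items are arranged on a graph.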
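To make the point about output order concrete, here is a minimal sketch. The data are made up for illustration, and `sizes2` is a name introduced only for this example. It shows that `table()` reports counts in level order rather than in order of appearance.

```r
# Illustrative data; the values are invented for this sketch
sizes <- factor(c("small", "large", "large", "small", "medium"))

# By default the levels are the sorted unique values
levels(sizes)
#> [1] "large"  "medium" "small"

# table() lists the counts in level order
names(table(sizes))
#> [1] "large"  "medium" "small"

# Respecifying the levels changes the display order, not the data
sizes2 <- factor(sizes, levels = c("small", "medium", "large"))
names(table(sizes2))
#> [1] "small"  "medium" "large"

# The result is still an unordered factor
is.ordered(sizes2)
#> [1] FALSE
```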

One way to change the level order is to call factor() on the factor and specify the order directly. In this example, ordered() could be used in place of factor().

Here is the sample data:

    # Create a factor whose levels are in the wrong order
    sizes <- factor(c("small", "large", "large", "small", "medium"))
    sizes
    #> [1] small  large  large  small  medium
    #> Levels: large medium small

The levels can be specified explicitly:

    sizes <- factor(sizes, levels = c("small", "medium", "large"))
    sizes
    #> [1] small  large  large  small  medium
    #> Levels: small medium large
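One caveat when listing levels by hand: any value that does not appear in `levels` becomes `NA`. The sketch below uses a deliberate typo (`"med"`) to show this, and then a simple guard. The names `bad` and `new_levels` are introduced only for this illustration.

```r
sizes <- factor(c("small", "large", "large", "small", "medium"))

# "med" is a deliberate typo, so "medium" matches no level and becomes NA
bad <- factor(sizes, levels = c("small", "med", "large"))
sum(is.na(bad))
#> [1] 1

# Guard: confirm the new levels cover the existing ones before recoding
new_levels <- c("small", "medium", "large")
stopifnot(setequal(levels(sizes), new_levels))
sizes <- factor(sizes, levels = new_levels)
levels(sizes)
#> [1] "small"  "medium" "large"
```

The `setequal()` check is one possible safeguard. A stricter or looser test may suit data where unused levels are expected.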

The same can be done with ordered factors:

    sizes <- ordered(c("small", "large", "large", "small", "medium"))
    sizes <- ordered(sizes, levels = c("small", "medium", "large"))
    sizes
    #> [1] small  large  large  small  medium
    #> Levels: small < medium < large
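As a brief illustration of why the level order of an ordered factor is meaningful, the following sketch shows that comparison operators and `max()` follow the level order rather than alphabetical order. It rebuilds the same data in a single call, using the `ordered = TRUE` argument of `factor()`.

```r
sizes <- factor(c("small", "large", "large", "small", "medium"),
                levels = c("small", "medium", "large"), ordered = TRUE)

# Comparisons use the level order: small < medium < large
sizes < "large"
#> [1]  TRUE FALSE FALSE  TRUE  TRUE

# max() also respects the level order
as.character(max(sizes))
#> [1] "large"
```

For an unordered factor the same comparison is not meaningful: R returns `NA` and issues a warning.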

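Another way to change the order is to use relevel() to move a particular level to the front of the list. (This does not work for ordered factors.)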

    # Create a factor whose levels are in the wrong order
    sizes <- factor(c("small", "large", "large", "small", "medium"))
    sizes
    #> [1] small  large  large  small  medium
    #> Levels: large medium small

    # Make medium the first level
    sizes <- relevel(sizes, "medium")
    sizes
    #> [1] small  large  large  small  medium
    #> Levels: medium large small

    # Make small the first level
    sizes <- relevel(sizes, "small")
    sizes
    #> [1] small  large  large  small  medium
    #> Levels: small medium large
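The restriction to unordered factors can be seen in a short sketch. Here `sizes_ord` and `result` are names introduced only for this illustration. `tryCatch()` captures the error that relevel() signals, and the levels are then respecified in full instead.

```r
sizes_ord <- ordered(c("small", "large", "large", "small", "medium"))

# relevel() signals an error for ordered factors; capture the message
result <- tryCatch(
  relevel(sizes_ord, "medium"),
  error = function(e) conditionMessage(e)
)
is.character(result)
#> [1] TRUE

# For ordered factors, respecify the complete level order instead
sizes_ord <- ordered(sizes_ord, levels = c("small", "medium", "large"))
levels(sizes_ord)
#> [1] "small"  "medium" "large"
```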

You can also specify the proper order when the factor is created.

    sizes <- factor(c("small", "large", "large", "small", "medium"),
                    levels = c("small", "medium", "large"))
    sizes
    #> [1] small  large  large  small  medium
    #> Levels: small medium large

To reverse the order of the levels in a factor:

    # Create a factor whose levels are in the wrong order
    sizes <- factor(c("small", "large", "large", "small", "medium"))
    sizes
    #> [1] small  large  large  small  medium
    #> Levels: large medium small

    # Rebuild the factor with its levels in reverse order
    sizes <- factor(sizes, levels = rev(levels(sizes)))
    sizes
    #> [1] small  large  large  small  medium
    #> Levels: small medium large
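A short check can confirm that this kind of reversal changes only the level order and not the data. The sketch below also contrasts it with assigning to `levels()` directly, which relabels the values. The names `reversed` and `relabeled` are introduced only for this illustration.

```r
sizes <- factor(c("small", "large", "large", "small", "medium"))
reversed <- factor(sizes, levels = rev(levels(sizes)))

# The label attached to each element is unchanged
identical(as.character(sizes), as.character(reversed))
#> [1] TRUE

# Only the underlying integer codes are remapped
as.integer(sizes)
#> [1] 3 1 1 3 2
as.integer(reversed)
#> [1] 1 3 3 1 2

# Contrast: assigning to levels() keeps the codes and swaps the labels,
# which silently changes the data
relabeled <- sizes
levels(relabeled) <- rev(levels(relabeled))
as.character(relabeled)
#> [1] "large"  "small"  "small"  "large"  "medium"
```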