6.20 计算移动平均数

6.20.1 问题

你想要计算移动平均数。

6.20.2 解决方案

假设你的数据是带缺失值的噪声正弦波。

  1. set.seed(993)
  2. x <- 1:300
  3. y <- sin(x/20) + rnorm(300, sd = 0.1)
  4. y[251:255] <- NA

filter() 函数可以用来计算移动平均数。

  1. # 绘制未平滑的数据(灰色)
  2. plot(x, y, type = "l", col = grey(0.5))
  3. # 绘制网格
  4. grid()
  5. # 延迟平滑 当前样本和之前 19 个样本的平均数(红色)
  6. f20 <- rep(1/20, 20)
  7. f20
  8. #> [1] 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05
  9. #> [11] 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05
  10. y_lag <- filter(y, f20, sides = 1)
  11. lines(x, y_lag, col = "red")
  12. # 对称性平滑:
  13. # 计算当前样本,时间上将来样本和过去10个样本的平均数(蓝色)
  14. f21 <- rep(1/21, 21)
  15. f21
  16. #> [1] 0.04762 0.04762 0.04762 0.04762 0.04762 0.04762
  17. #> [7] 0.04762 0.04762 0.04762 0.04762 0.04762 0.04762
  18. #> [13] 0.04762 0.04762 0.04762 0.04762 0.04762 0.04762
  19. #> [19] 0.04762 0.04762 0.04762
  20. y_sym <- filter(y, f21, sides = 2)
  21. lines(x, y_sym, col = "blue")

6.20 计算移动平均数 - 图1

filter() 会在遭遇缺失值时留下空缺,就像上面图中显示的一样。

一种处理缺失值的方法是简单地忽略它,不把它包含在平均数的计算中。这个功能可以由下面的函数实现:

  1. # x: the vector n: the number of samples centered: if
  2. # FALSE, then average current sample and previous (n-1)
  3. # samples if TRUE, then average symmetrically in past
  4. # and future. (If n is even, use one more sample from
  5. # future.)
  6. movingAverage <- function(x, n = 1, centered = FALSE) {
  7. if (centered) {
  8. before <- floor((n - 1)/2)
  9. after <- ceiling((n - 1)/2)
  10. } else {
  11. before <- n - 1
  12. after <- 0
  13. }
  14. # Track the sum and count of number of non-NA items
  15. s <- rep(0, length(x))
  16. count <- rep(0, length(x))
  17. # Add the centered data
  18. new <- x
  19. # Add to count list wherever there isn't a
  20. count <- count + !is.na(new)
  21. # Now replace NA_s with 0_s and add to total
  22. new[is.na(new)] <- 0
  23. s <- s + new
  24. # Add the data from before
  25. i <- 1
  26. while (i <= before) {
  27. # This is the vector with offset values to add
  28. new <- c(rep(NA, i), x[1:(length(x) - i)])
  29. count <- count + !is.na(new)
  30. new[is.na(new)] <- 0
  31. s <- s + new
  32. i <- i + 1
  33. }
  34. # Add the data from after
  35. i <- 1
  36. while (i <= after) {
  37. # This is the vector with offset values to add
  38. new <- c(x[(i + 1):length(x)], rep(NA, i))
  39. count <- count + !is.na(new)
  40. new[is.na(new)] <- 0
  41. s <- s + new
  42. i <- i + 1
  43. }
  44. # return sum divided by count
  45. s/count
  46. }
  47. # 用比较厚的线条绘制和之前一样的图
  48. plot(x, y, type = "l", col = grey(0.5))
  49. grid()
  50. y_lag <- filter(y, rep(1/20, 20), sides = 1)
  51. lines(x, y_lag, col = "red", lwd = 4) # 用红色表示延迟平均
  52. y_sym <- filter(y, rep(1/21, 21), sides = 2)
  53. lines(x, y_sym, col = "blue", lwd = 4) # 用蓝色表示对称平均
  54. # 用上面定义的函数计算延迟平均
  55. y_lag_na.rm <- movingAverage(y, 20)
  56. lines(x, y_lag_na.rm, col = "green", lwd = 2)
  57. # 用上面定义的函数计算对称性平均
  58. y_sym_na.rm <- movingAverage(y, 21, TRUE)
  59. lines(x, y_sym_na.rm, col = "green", lwd = 2)

6.20 计算移动平均数 - 图2