Filling in NAs with last non-NA value

Problem

You want to replace NAs in a vector or factor with the last non-NA value.

Solution

This code shows how to fill gaps in a vector. If you need to do this repeatedly, see the function below. The function can also fill leading NAs with the first good value, and it handles factors properly.

    # Sample data
    x <- c(NA,NA, "A","A", "B","B","B", NA,NA, "C", NA,NA,NA, "A","A","B", NA,NA)

    goodIdx <- !is.na(x)
    goodIdx
    #> [1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE FALSE FALSE FALSE TRUE
    #> [15] TRUE TRUE FALSE FALSE

    # These are the non-NA values from x only
    # Add a leading NA for later use when we index into this vector
    goodVals <- c(NA, x[goodIdx])
    goodVals
    #> [1] NA "A" "A" "B" "B" "B" "C" "A" "A" "B"

    # Fill the indices of the output vector with the indices pulled from
    # these offsets of goodVals. Add 1 to avoid indexing to zero.
    fillIdx <- cumsum(goodIdx)+1
    fillIdx
    #> [1] 1 1 2 3 4 5 6 6 6 7 7 7 7 8 9 10 10 10

    # The original vector with gaps filled
    goodVals[fillIdx]
    #> [1] NA NA "A" "A" "B" "B" "B" "B" "B" "C" "C" "C" "C" "A" "A" "B" "B" "B"
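
To make the index arithmetic easier to follow, here is a minimal sketch of the same technique on a short numeric vector. The vector `v` and the names `good`, `idx`, and `filled` are made up for illustration and are not part of the original sample data.

```r
# Hypothetical numeric example (not from the sample data above)
v <- c(NA, 3, NA, NA, 7, NA)

good <- !is.na(v)

# cumsum(good) counts how many non-NA values have been seen so far.
# It is 0 for leading NAs, so a placeholder NA is prepended to the
# good values and every index is shifted up by 1.
idx <- cumsum(good) + 1
idx
#> [1] 1 2 2 2 3 3

# Index 1 picks the placeholder NA; index k+1 picks the k-th good value
filled <- c(NA, v[good])[idx]
filled
#> [1] NA  3  3  3  7  7
```

Each position receives the most recent non-NA value seen so far, and positions before the first good value keep the placeholder NA.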

A function for filling gaps

This function does the same as the code above. It can also fill leading NAs with the first good value, and it handles factors properly.

    fillNAgaps <- function(x, firstBack=FALSE) {
      ## NA's in a vector or factor are replaced with last non-NA values
      ## If firstBack is TRUE, it will fill in leading NA's with the first
      ## non-NA value. If FALSE, it will not change leading NA's.

      # If it's a factor, store the level labels and convert to integer
      lvls <- NULL
      if (is.factor(x)) {
        lvls <- levels(x)
        x <- as.integer(x)
      }

      goodIdx <- !is.na(x)

      # These are the non-NA values from x only
      # Add a leading NA or take the first good value, depending on firstBack
      if (firstBack) goodVals <- c(x[goodIdx][1], x[goodIdx])
      else goodVals <- c(NA, x[goodIdx])

      # Fill the indices of the output vector with the indices pulled from
      # these offsets of goodVals. Add 1 to avoid indexing to zero.
      fillIdx <- cumsum(goodIdx)+1

      x <- goodVals[fillIdx]

      # If it was originally a factor, convert it back
      if (!is.null(lvls)) {
        x <- factor(x, levels=seq_along(lvls), labels=lvls)
      }

      x
    }

    # Sample data
    x <- c(NA,NA, "A","A", "B","B","B", NA,NA, "C", NA,NA,NA, "A","A","B", NA,NA)
    x
    #> [1] NA NA "A" "A" "B" "B" "B" NA NA "C" NA NA NA "A" "A" "B" NA NA

    fillNAgaps(x)
    #> [1] NA NA "A" "A" "B" "B" "B" "B" "B" "C" "C" "C" "C" "A" "A" "B" "B" "B"

    # Fill the leading NA's with the first good value
    fillNAgaps(x, firstBack=TRUE)
    #> [1] "A" "A" "A" "A" "B" "B" "B" "B" "B" "C" "C" "C" "C" "A" "A" "B" "B" "B"

    # It also works on factors
    y <- factor(x)
    y
    #> [1] <NA> <NA> A A B B B <NA> <NA> C <NA> <NA> <NA> A A B <NA>
    #> [18] <NA>
    #> Levels: A B C

    fillNAgaps(y)
    #> [1] <NA> <NA> A A B B B B B C C C C A A B B
    #> [18] B
    #> Levels: A B C

Notes

This is adapted from na.locf() in the zoo package.
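
If zoo is available, calling na.locf() directly may be simpler than maintaining your own function. The following is a minimal sketch, assuming the zoo package is installed; the short vector `z` is made up for illustration. Passing `na.rm = FALSE` keeps leading NAs in place rather than dropping them.

```r
# Hypothetical sample vector for illustration
z <- c(NA, NA, "A", "A", "B", NA, "C", NA)

# Assumes the zoo package is installed; the block is skipped otherwise
if (requireNamespace("zoo", quietly = TRUE)) {
  # na.rm = FALSE keeps the leading NAs instead of removing them
  zFilled <- zoo::na.locf(z, na.rm = FALSE)
  print(zFilled)
}
```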