5.4 Row Selection

We already covered two macros that operate on columns, @select and @transform.

Now let’s cover the only macro we need to operate on rows: @subset. It follows the same principles we’ve seen so far with DataFramesMeta.jl, except that the operation must return a Boolean value for each row: true keeps the row, false drops it.

Let’s filter grades above 7:

    @rsubset df :grade > 7

name   grade
Alice  8.5
Bob    9.5
Sally  9.5
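
To make the relationship between the row-wise and column-wise forms concrete, here is a minimal sketch that writes the same filter both ways. The data frame is made up for illustration; its values, including the Hank row, are assumptions and may not match the df used in this section.

```julia
using DataFrames, DataFramesMeta

# Made-up data standing in for the section's df (an assumption).
df = DataFrame(name=["Alice", "Bob", "Sally", "Hank"], grade=[8.5, 9.5, 9.5, 6.0])

# Row-wise: the condition sees one grade at a time, so a plain > works.
rowwise = @rsubset df :grade > 7

# Column-wise: the condition sees the whole column, so > must be broadcast.
columnwise = @subset df :grade .> 7

rowwise == columnwise  # true: both keep Alice, Bob, and Sally
```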

As you can see, @subset also has a vectorized variant, @rsubset, which applies the condition row by row. Sometimes we want to mix and match vectorized and non-vectorized function calls. For instance, suppose that we want to keep only the grades above the mean grade:

    @subset df :grade .> mean(:grade)

name   grade
Alice  8.5
Bob    9.5
Sally  9.5

For this, we need the @subset macro with the > operator vectorized (.>), since we want an element-wise comparison, while the mean function needs to operate on the whole column of values.
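
The sketch below shows why the row-wise variant does not fit this case, again on made-up data (an assumption, not necessarily the df used in this section). Inside @rsubset, :grade is a single number, so mean(:grade) is just that number and no row can exceed its own value.

```julia
using DataFrames, DataFramesMeta, Statistics

# Made-up data standing in for the section's df (an assumption).
df = DataFrame(name=["Alice", "Bob", "Sally", "Hank"], grade=[8.5, 9.5, 9.5, 6.0])

# Column-wise: mean sees the full column (8.375 here) and .> compares each grade to it.
above_mean = @subset df :grade .> mean(:grade)

# Row-wise: mean(:grade) is computed per row, so the test becomes grade > grade.
per_row = @rsubset df :grade > mean(:grade)

nrow(above_mean), nrow(per_row)  # (3, 0)
```

The empty result from the row-wise call is not an error, which makes this mistake easy to miss.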

@subset and @rsubset also support multiple conditions inside a begin ... end block; a row is kept only if every condition is true:

    @rsubset df begin
        :grade > 7
        startswith(:name, "A")
    end

name   grade
Alice  8.5
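
As a sketch of how the conditions combine, the block form can be compared with passing the same conditions as separate arguments, and with a column-wise version that broadcasts each condition by hand. The data are made up for illustration (an assumption).

```julia
using DataFrames, DataFramesMeta

# Made-up data standing in for the section's df (an assumption).
df = DataFrame(name=["Alice", "Bob", "Sally", "Hank"], grade=[8.5, 9.5, 9.5, 6.0])

# Block form: each line is one condition; a row must satisfy all of them.
block = @rsubset df begin
    :grade > 7
    startswith(:name, "A")
end

# The same conditions passed as separate arguments.
as_args = @rsubset(df, :grade > 7, startswith(:name, "A"))

# Column-wise equivalent: every condition is broadcast explicitly.
columnwise = @subset df begin
    :grade .> 7
    startswith.(:name, "A")
end

block == as_args == columnwise  # true: only Alice remains
```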

CC BY-NC-SA 4.0 Jose Storopoli, Rik Huijzer, Lazaro Alonso