5.3 Column Transformation

Whereas the @select macro variants perform column selection, the @transform macro variants do not. They either overwrite existing columns or create new columns that are added to the right of our DataFrame.

For example, the previous operation on :grade can be invoked as a transformation with:

    @rtransform df :grade_100 = :grade * 10
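| name  | grade | grade_100 |
|-------|-------|-----------|
| Sally | 1.0   | 10.0      |
| Bob   | 5.0   | 50.0      |
| Alice | 8.5   | 85.0      |
| Hank  | 4.0   | 40.0      |
| Bob   | 9.5   | 95.0      |
| Sally | 9.5   | 95.0      |
| Hank  | 6.0   | 60.0      |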

As you can see, @transform does not perform column selection: all existing columns are kept, and the new :grade_100 column is added to the right of our DataFrame.
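The overwrite case mentioned at the start of this section can be sketched as follows, assuming a recent DataFramesMeta.jl that provides @rselect. The DataFrame is rebuilt by hand from the table above; that construction (and the names rescaled and selected) is our assumption, not the chapter's own setup code. Reusing an existing column name replaces that column in place, whereas @rselect keeps only the column it creates:

```julia
using DataFrames, DataFramesMeta

# Assumption: data copied from the table above, not the chapter's own setup code
df = DataFrame(
    name = ["Sally", "Bob", "Alice", "Hank", "Bob", "Sally", "Hank"],
    grade = [1.0, 5.0, 8.5, 4.0, 9.5, 9.5, 6.0],
)

# Assigning to an existing column name replaces that column instead of adding one
rescaled = @rtransform df :grade = :grade * 10

# @rselect, by contrast, keeps only the columns it names or creates
selected = @rselect df :grade_100 = :grade * 10

println(names(rescaled))  # ["name", "grade"]
println(names(selected))  # ["grade_100"]
```

Because neither macro ends in !, df itself is left unchanged.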

DataFramesMeta.jl macros also support begin ... end blocks. For example, suppose that you want to create two columns in a single @transform macro call:

    @rtransform df :grade_100 = :grade * 10 :grade_5 = :grade / 2
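| name  | grade | grade_100 | grade_5 |
|-------|-------|-----------|---------|
| Sally | 1.0   | 10.0      | 0.5     |
| Bob   | 5.0   | 50.0      | 2.5     |
| Alice | 8.5   | 85.0      | 4.25    |
| Hank  | 4.0   | 40.0      | 2.0     |
| Bob   | 9.5   | 95.0      | 4.75    |
| Sally | 9.5   | 95.0      | 4.75    |
| Hank  | 6.0   | 60.0      | 3.0     |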

With several transformations crammed onto one line, it is hard to see where one ends and the next begins. To make them easier to read, we can use a begin ... end block and put one transformation per line:

    @rtransform df begin
        :grade_100 = :grade * 10
        :grade_5 = :grade / 2
    end
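| name  | grade | grade_100 | grade_5 |
|-------|-------|-----------|---------|
| Sally | 1.0   | 10.0      | 0.5     |
| Bob   | 5.0   | 50.0      | 2.5     |
| Alice | 8.5   | 85.0      | 4.25    |
| Hank  | 4.0   | 40.0      | 2.0     |
| Bob   | 9.5   | 95.0      | 4.75    |
| Sally | 9.5   | 95.0      | 4.75    |
| Hank  | 6.0   | 60.0      | 3.0     |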
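The begin ... end block is only a layout choice: it should produce the same DataFrame as passing the transformations as comma-separated arguments. A small check of that claim, assuming the same hand-rebuilt df as above (the variable names comma_form and block are illustrative):

```julia
using DataFrames, DataFramesMeta

# Assumption: data copied from the tables above
df = DataFrame(
    name = ["Sally", "Bob", "Alice", "Hank", "Bob", "Sally", "Hank"],
    grade = [1.0, 5.0, 8.5, 4.0, 9.5, 9.5, 6.0],
)

# Function-call form: transformations separated by commas
comma_form = @rtransform(df, :grade_100 = :grade * 10, :grade_5 = :grade / 2)

# Block form: one transformation per line inside begin ... end
block = @rtransform df begin
    :grade_100 = :grade * 10
    :grade_5 = :grade / 2
end

println(comma_form == block)  # true
```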

We can also combine several existing columns in a single transformation. This is where DataFramesMeta.jl's syntax is noticeably simpler than that of DataFrames.jl.

First, let’s revisit the leftjoined DataFrame from Chapter 4:

    leftjoined = leftjoin(grades_2020(), grades_2021(); on=:name)
| name  | grade_2020 | grade_2021 |
|-------|------------|------------|
| Sally | 1.0        | 9.5        |
| Hank  | 4.0        | 6.0        |
| Bob   | 5.0        | missing    |
| Alice | 8.5        | missing    |

Next, we'll replace the missing values with 5 (see Section 4.9). Note the ! in the in-place variant @rtransform!, which modifies leftjoined directly instead of returning a copy:

    @rtransform! leftjoined :grade_2021 = coalesce(:grade_2021, 5)
| name  | grade_2020 | grade_2021 |
|-------|------------|------------|
| Sally | 1.0        | 9.5        |
| Hank  | 4.0        | 6.0        |
| Bob   | 5.0        | 5          |
| Alice | 8.5        | 5          |
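The difference between @rtransform and @rtransform! can be checked directly. In this sketch leftjoined is rebuilt by hand from the table shown before the replacement (an assumption, since grades_2020 and grades_2021 come from Chapter 4 and are not repeated here), and filled_copy is an illustrative name:

```julia
using DataFrames, DataFramesMeta

# Assumption: rebuilt from the leftjoined table above, including its missing values
leftjoined = DataFrame(
    name = ["Sally", "Hank", "Bob", "Alice"],
    grade_2020 = [1.0, 4.0, 5.0, 8.5],
    grade_2021 = [9.5, 6.0, missing, missing],
)

# Without !, a new DataFrame is returned and leftjoined is left untouched
filled_copy = @rtransform leftjoined :grade_2021 = coalesce(:grade_2021, 5)
println(count(ismissing, leftjoined.grade_2021))  # 2

# With !, leftjoined itself is modified
@rtransform! leftjoined :grade_2021 = coalesce(:grade_2021, 5)
println(count(ismissing, leftjoined.grade_2021))  # 0
```

After both calls, filled_copy and leftjoined hold the same values; only the ! version changed the original object.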

This is how you calculate the mean of grades in both years using DataFramesMeta.jl:

    @rtransform leftjoined :mean_grades = (:grade_2020 + :grade_2021) / 2
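| name  | grade_2020 | grade_2021 | mean_grades |
|-------|------------|------------|-------------|
| Sally | 1.0        | 9.5        | 5.25        |
| Hank  | 4.0        | 6.0        | 5.0         |
| Bob   | 5.0        | 5          | 5.0         |
| Alice | 8.5        | 5          | 6.75        |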

This is how you would perform it in DataFrames.jl:

    transform(leftjoined, [:grade_2020, :grade_2021] => ByRow((x, y) -> (x + y) / 2) => :mean_grades)
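| name  | grade_2020 | grade_2021 | mean_grades |
|-------|------------|------------|-------------|
| Sally | 1.0        | 9.5        | 5.25        |
| Hank  | 4.0        | 6.0        | 5.0         |
| Bob   | 5.0        | 5          | 5.0         |
| Alice | 8.5        | 5          | 6.75        |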

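As you can see, the DataFramesMeta.jl version is considerably shorter and easier to read: columns are referenced by name inside an ordinary expression, with no source => function => destination pairs or ByRow wrapper.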
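For completeness, the same mean can also be written with the column-wise @transform variant, where :grade_2020 and :grade_2021 stand for whole vectors, so the arithmetic needs broadcasting dots. This sketch assumes grade_2021 has already been filled with 5, stored here as 5.0 to keep the column Float64 (an assumption; the table above shows the integer 5), and checks that all three forms agree:

```julia
using DataFrames, DataFramesMeta

# Assumption: leftjoined after the missing values were replaced (stored as Float64 here)
leftjoined = DataFrame(
    name = ["Sally", "Hank", "Bob", "Alice"],
    grade_2020 = [1.0, 4.0, 5.0, 8.5],
    grade_2021 = [9.5, 6.0, 5.0, 5.0],
)

# Row-wise: the expression sees one scalar value per column, row by row
by_row = @rtransform leftjoined :mean_grades = (:grade_2020 + :grade_2021) / 2

# Column-wise: the expression sees whole vectors, so broadcast with .+ and ./
by_column = @transform leftjoined :mean_grades = (:grade_2020 .+ :grade_2021) ./ 2

# Plain DataFrames.jl equivalent of the row-wise version
plain = transform(leftjoined, [:grade_2020, :grade_2021] => ByRow((x, y) -> (x + y) / 2) => :mean_grades)

println(by_row.mean_grades)  # [5.25, 5.0, 5.0, 6.75]
```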

CC BY-NC-SA 4.0 Jose Storopoli, Rik Huijzer, Lazaro Alonso