4.4 Select

Whereas filter removes rows, select removes columns. However, select is much more versatile than just removing columns, as we will discuss in this section. First, let’s create a dataset with multiple columns:

    using DataFrames

    function responses()
        id = [1, 2]
        q1 = [28, 61]
        q2 = [:us, :fr]
        q3 = ["F", "B"]
        q4 = ["B", "C"]
        q5 = ["A", "E"]
        DataFrame(; id, q1, q2, q3, q4, q5)
    end
    responses()
Table 7: Responses.
id  q1  q2  q3  q4  q5
1   28  us  F   B   A
2   61  fr  B   C   E

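To check what responses() produced, one can inspect its column names and the element type of each column. The sketch below assumes DataFrames.jl 1.x and rebuilds the same data inline as df (an illustrative name) so that it runs on its own. It shows that q2 holds Symbols while q3 through q5 hold Strings.

```julia
using DataFrames

# Same data as responses(), rebuilt inline so the snippet is self-contained
df = DataFrame(id = [1, 2], q1 = [28, 61], q2 = [:us, :fr],
               q3 = ["F", "B"], q4 = ["B", "C"], q5 = ["A", "E"])

# Column names come back as a Vector{String}
println(names(df))

# Element type of every column, in column order
col_types = [eltype(col) for col in eachcol(df)]
println(col_types)
```
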
Here, the data represents answers for five questions (q1, q2, …, q5) in a given questionnaire. We will start by “selecting” a few columns from this dataset. As usual, we use symbols to specify columns:

    select(responses(), :id, :q1)

id  q1
1   28
2   61
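
select returns a new DataFrame and leaves its input untouched. DataFrames.jl also provides select!, which applies the same selection in place. A minimal sketch, assuming DataFrames.jl 1.x and an inline copy of the responses data named df:

```julia
using DataFrames

df = DataFrame(id = [1, 2], q1 = [28, 61], q2 = [:us, :fr],
               q3 = ["F", "B"], q4 = ["B", "C"], q5 = ["A", "E"])

# select builds a new DataFrame; df itself still has all six columns
small = select(df, :id, :q1)
println(names(small))  # ["id", "q1"]
println(ncol(df))      # 6

# select! drops the other columns from df itself
select!(df, :id, :q1)
println(names(df))     # ["id", "q1"]
```

Prefer select when the original table is still needed later; select! avoids keeping a second table around.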

We can also use strings if we want:

    select(responses(), "id", "q1", "q2")

id  q1  q2
1   28  us
2   61  fr
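
Column names can also be collected in a vector and passed as a single argument, which is handy when the list is built elsewhere in a program. The sketch assumes DataFrames.jl 1.x; df and cols are illustrative names.

```julia
using DataFrames

df = DataFrame(id = [1, 2], q1 = [28, 61], q2 = [:us, :fr],
               q3 = ["F", "B"], q4 = ["B", "C"], q5 = ["A", "E"])

# A vector of Strings (or of Symbols) is itself a column selector
cols = ["id", "q1", "q2"]
from_vector = select(df, cols)
from_args = select(df, "id", "q1", "q2")

println(from_vector == from_args)  # true
```
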

Additionally, we can use regular expressions via Julia’s regex string literal. Julia supports non-standard string literals: a prefix written directly before the opening quote changes how the string is interpreted. The regex string literal is written r"...", where ... is the regular expression, and it constructs a Regex object. For example, suppose you only want to select the columns that start with q:

    select(responses(), r"^q")

q1  q2  q3  q4  q5
28  us  F   B   A
61  fr  B   C   E
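
select keeps every column whose name matches the regex. The sketch below (assuming DataFrames.jl 1.x) first checks a pattern against individual names with Base's occursin, then uses a narrower pattern to keep only q1 through q3.

```julia
using DataFrames

df = DataFrame(id = [1, 2], q1 = [28, 61], q2 = [:us, :fr],
               q3 = ["F", "B"], q4 = ["B", "C"], q5 = ["A", "E"])

# occursin tells whether a regex matches a string
println(occursin(r"^q", "q1"))  # true
println(occursin(r"^q", "id"))  # false

# Anchored at both ends: "q" followed by exactly one digit from 1 to 3
first_three = select(df, r"^q[1-3]$")
println(names(first_three))  # ["q1", "q2", "q3"]
```
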

NOTE: We won’t cover regular expressions in this book, but you are encouraged to learn about them. To build and test regular expressions interactively, we advise using online tools such as https://regex101.com/.

To select everything except one or more columns, use Not with either a single column:

    select(responses(), Not(:q5))

id  q1  q2  q3  q4
1   28  us  F   B
2   61  fr  B   C
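
Not accepts other column selectors too, not only Symbols. A short sketch, assuming DataFrames.jl 1.x, that inverts the regex r"^q" to keep only the columns that do not start with q:

```julia
using DataFrames

df = DataFrame(id = [1, 2], q1 = [28, 61], q2 = [:us, :fr],
               q3 = ["F", "B"], q4 = ["B", "C"], q5 = ["A", "E"])

# Every column except those whose names start with "q"
only_id = select(df, Not(r"^q"))
println(names(only_id))  # ["id"]
```
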

Or, with multiple columns:

    select(responses(), Not([:q4, :q5]))

id  q1  q2  q3
1   28  us  F
2   61  fr  B
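
Columns can also be excluded by position. In the responses data, q4 and q5 are the fifth and sixth columns, so the range 5:6 inside Not should give the same result as the vector of Symbols. A sketch, assuming DataFrames.jl 1.x:

```julia
using DataFrames

df = DataFrame(id = [1, 2], q1 = [28, 61], q2 = [:us, :fr],
               q3 = ["F", "B"], q4 = ["B", "C"], q5 = ["A", "E"])

by_name = select(df, Not([:q4, :q5]))
by_position = select(df, Not(5:6))  # columns 5 and 6 are q4 and q5

println(by_name == by_position)     # true
println(names(by_position))         # ["id", "q1", "q2", "q3"]
```

Position-based selection breaks silently if the column order changes, so names are usually the safer choice.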

It’s also fine to mix and match columns that we want to preserve with columns that we do Not want to select:

    select(responses(), :q5, Not(:q5))

q5  id  q1  q2  q3  q4
A   1   28  us  F   B
E   2   61  fr  B   C
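
The same pattern moves several columns at once: list them first, then exclude them from the rest with Not. A sketch assuming DataFrames.jl 1.x, with to_front as an illustrative variable name:

```julia
using DataFrames

df = DataFrame(id = [1, 2], q1 = [28, 61], q2 = [:us, :fr],
               q3 = ["F", "B"], q4 = ["B", "C"], q5 = ["A", "E"])

# q4 and q5 first, then every other column in its original order
to_front = [:q4, :q5]
reordered = select(df, to_front, Not(to_front))
println(names(reordered))  # ["q4", "q5", "id", "q1", "q2", "q3"]
```
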

Note how q5 is now the first column in the DataFrame returned by select. A more concise way to achieve the same result is the colon :, which can be thought of as “all the columns that we didn’t include yet.” For example:

    select(responses(), :q5, :)

q5  id  q1  q2  q3  q4
A   1   28  us  F   B
E   2   61  fr  B   C
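
Since : only adds columns that were not listed before it, several columns can be pulled to the front in a chosen order. To send a column to the end instead, one option is to select everything else with Not and list the column last. A sketch assuming DataFrames.jl 1.x:

```julia
using DataFrames

df = DataFrame(id = [1, 2], q1 = [28, 61], q2 = [:us, :fr],
               q3 = ["F", "B"], q4 = ["B", "C"], q5 = ["A", "E"])

# q5 first, then q4, then all remaining columns in their original order
front = select(df, :q5, :q4, :)
println(names(front))  # ["q5", "q4", "id", "q1", "q2", "q3"]

# All columns except id, followed by id
back = select(df, Not(:id), :id)
println(names(back))   # ["q1", "q2", "q3", "q4", "q5", "id"]
```
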

Or, to put q5 at the second position (see footnote 16):

    select(responses(), 1, :q5, :)

id  q5  q1  q2  q3  q4
1   A   28  us  F   B
2   E   61  fr  B   C
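
The integer 1 works here because it selects the first column (id) before q5 is added. A range pushes q5 further back, and for an arbitrary position one can build the name vector explicitly with Base's setdiff and insert!. Both are sketches assuming DataFrames.jl 1.x; order is an illustrative variable name.

```julia
using DataFrames

df = DataFrame(id = [1, 2], q1 = [28, 61], q2 = [:us, :fr],
               q3 = ["F", "B"], q4 = ["B", "C"], q5 = ["A", "E"])

# id and q1 (columns 1:2), then q5, then the rest
third = select(df, 1:2, :q5, :)
println(names(third))  # ["id", "q1", "q5", "q2", "q3", "q4"]

# General approach: drop q5 from the name list, then insert it at index 3
order = setdiff(names(df), ["q5"])
insert!(order, 3, "q5")
explicit = select(df, order)
println(explicit == third)  # true
```
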

NOTE: As you might have observed, there are several ways to select a column. These are known as column selectors.

We can use:

  • Symbol: select(df, :col)
  • String: select(df, "col")
  • Integer: select(df, 1)
  • RegEx: select(df, r"RegEx")
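
For a single column, the four selector kinds listed above are interchangeable. The sketch below (assuming DataFrames.jl 1.x) selects id in each way and checks that the results are equal; the regex is anchored with ^ and $ so it matches id and nothing else.

```julia
using DataFrames

df = DataFrame(id = [1, 2], q1 = [28, 61], q2 = [:us, :fr],
               q3 = ["F", "B"], q4 = ["B", "C"], q5 = ["A", "E"])

by_symbol  = select(df, :id)
by_string  = select(df, "id")
by_integer = select(df, 1)
by_regex   = select(df, r"^id$")

println(by_symbol == by_string == by_integer == by_regex)  # true
```
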

Even renaming columns is possible via select using the source => target pair syntax:

    select(responses(), 1 => "participant", :q1 => "age", :q2 => "nationality")

participant  age  nationality
1            28   us
2            61   fr
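
select keeps only the columns it is given, so renaming this way also drops q3 through q5. When the goal is to rename while keeping every column, DataFrames.jl's rename function takes the same kind of pairs. A sketch, assuming DataFrames.jl 1.x; it also shows that the target of a pair in select can be a Symbol as well as a String.

```julia
using DataFrames

df = DataFrame(id = [1, 2], q1 = [28, 61], q2 = [:us, :fr],
               q3 = ["F", "B"], q4 = ["B", "C"], q5 = ["A", "E"])

# Pairs in select: only the listed columns survive
picked = select(df, :q1 => :age)
println(names(picked))   # ["age"]

# rename: the listed columns get new names, the rest are kept as-is
renamed = rename(df, :q1 => :age, :q2 => :nationality)
println(names(renamed))  # ["id", "age", "nationality", "q3", "q4", "q5"]
```
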

Additionally, thanks to the “splat” operator ... (see Section 3.3.11), we can also write:

    renames = (1 => "participant", :q1 => "age", :q2 => "nationality")
    select(responses(), renames...)

participant  age  nationality
1            28   us
2            61   fr
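
The splat is most useful when the pairs are not written by hand. For example, given a vector of old names and a vector of new ones, the broadcast pair operator .=> builds the pairs element-wise. A sketch assuming DataFrames.jl 1.x; old_names, new_names, and rename_pairs are illustrative names.

```julia
using DataFrames

df = DataFrame(id = [1, 2], q1 = [28, 61], q2 = [:us, :fr],
               q3 = ["F", "B"], q4 = ["B", "C"], q5 = ["A", "E"])

old_names = [:q1, :q2]
new_names = ["age", "nationality"]

# Element-wise pairing: [:q1 => "age", :q2 => "nationality"]
rename_pairs = old_names .=> new_names

out = select(df, :id, rename_pairs...)
println(names(out))  # ["id", "age", "nationality"]
```
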

CC BY-NC-SA 4.0 Jose Storopoli, Rik Huijzer, Lazaro Alonso