Quick Start: Ruby and TimescaleDB

Goal

This quick start guide is designed to get the Rails developer up and running with TimescaleDB as their database. In this tutorial, you learn how to connect a Rails application to TimescaleDB, create relational tables and hypertables, insert time-series data, and query it using the time_bucket function and Active Record scopes.

Prerequisites

To complete this tutorial, you need a cursory knowledge of the Structured Query Language (SQL). The tutorial walks you through each SQL command, but it is helpful if you’ve seen SQL before.

To start, install TimescaleDB. Once your installation is complete, we can proceed to ingesting or creating sample data and finishing the tutorial.

You also need to install Rails.

Connect Ruby to TimescaleDB

Step 1: Create a new Rails application

Let’s start by creating a new Rails application configured to use PostgreSQL as the database. TimescaleDB is a PostgreSQL extension.

  rails new my_app -d=postgresql

Rails finishes creating and bundling your application, installing all required Gems in the process.

Step 2: Configure the TimescaleDB database

Locate your TimescaleDB credentials in order to connect to your TimescaleDB instance.

You’ll need the following credentials:

  • password
  • username
  • host URL
  • port
  • database name

In the default section of config/database.yml, configure your database:

  default: &default
    adapter: postgresql
    encoding: unicode
    # For details on connection pooling, see Rails configuration guide
    # http://guides.rubyonrails.org/configuring.html#database-pooling
    pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
    host: [your hostname]
    port: [your port]
    username: [your username]
    password: [your password]
warning

Experienced Rails developers might want to set and retrieve environment variables for the username and password of the database. For the purposes of this quick start, we hard code the host, port, username, and password. This is not advised for code or databases of consequence.
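If you prefer not to hard-code credentials, one possible approach is to read them from the environment in config/database.yml. This is only a sketch; the variable names (TSDB_HOST, TSDB_PORT, TSDB_USERNAME, TSDB_PASSWORD) are placeholders you would export yourself:

  default: &default
    adapter: postgresql
    encoding: unicode
    pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
    # Placeholder environment variables for this sketch; name them however you like
    host: <%= ENV["TSDB_HOST"] %>
    port: <%= ENV["TSDB_PORT"] %>
    username: <%= ENV["TSDB_USERNAME"] %>
    password: <%= ENV["TSDB_PASSWORD"] %>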

Then configure the database name in the development, test, and production sections. Let’s call our database tsdb like so:

  development:
    <<: *default
    database: tsdb

Repeat the step for the test and production sections further down this file.

Your final file should look like this (without all the automatically generated comments):

  default: &default
    adapter: postgresql
    encoding: unicode
    # For details on connection pooling, see Rails configuration guide
    # http://guides.rubyonrails.org/configuring.html#database-pooling
    pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
    host: [your hostname]
    port: [your port]
    username: [your username]
    password: [your password]

  development:
    <<: *default
    database: tsdb

  test:
    <<: *default
    database: tsdb

  production:
    <<: *default
    database: tsdb

Create the database

Now we can run the following command to create the database in TimescaleDB:

  rails db:create

This creates the tsdb database in your TimescaleDB instance and a schema.rb file that represents the state of your TimescaleDB database.

Create a relational table

Step 1: Add TimescaleDB to your Rails migration

First, let’s set up our database to include the TimescaleDB extension. Start by creating a migration:

  rails generate migration add_timescale

In your db/migrate project folder, you’ll see a new migration file for [some sequence of numbers]_add_timescale.rb. Replace the contents of that file with the following to instruct the database to load the TimescaleDB extension to PostgreSQL:

  class AddTimescale < ActiveRecord::Migration[5.2]
    def change
      enable_extension("timescaledb") unless extensions.include? "timescaledb"
    end
  end

Step 2: Run database migrations

Now we can run the following rails command to add the TimescaleDB extension to our database:

  rails db:migrate
warning

In order for the command to work, you need to make sure there is a database named postgres in your TimescaleDB deployment. This database is sometimes not present by default.

With rails dbconsole you can test that the extension has been added by running the \dx command:

  echo "\dx" | rails dbconsole

The output should be something like the following:

                                    List of installed extensions
      Name     | Version |   Schema   |                            Description
  -------------+---------+------------+-------------------------------------------------------------------
   plpgsql     | 1.0     | pg_catalog | PL/pgSQL procedural language
   timescaledb | 2.1.1   | public     | Enables scalable inserts and complex queries for time-series data
  (2 rows)
note

To ensure that your tests run successfully, add config.active_record.verify_foreign_keys_for_fixtures = false to your config/environments/test.rb file. Otherwise you get an error because TimescaleDB uses internal foreign keys.
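For reference, that setting goes inside the configure block of the file; a minimal sketch of config/environments/test.rb:

  Rails.application.configure do
    # TimescaleDB creates internal foreign keys for its chunks, so skip
    # foreign key verification when loading test fixtures.
    config.active_record.verify_foreign_keys_for_fixtures = false
  end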

Step 3: Create a table

Suppose we wanted to create a table to store the user agent (browser) and time whenever a visitor loads our page. You could easily extend this simple example to store a host of additional web analytics of interest to you. We can generate a Rails scaffold to represent this information in a table:

  rails generate scaffold PageLoads user_agent:string

TimescaleDB requires that any UNIQUE or PRIMARY KEY indexes on your table include all partitioning columns, which in our case is the created_at time column. A new Rails model includes a PRIMARY KEY index for id by default, so we need to either remove the id column or make sure that the index includes created_at as part of a “composite key.”

tip

Composite keys aren’t supported natively by Rails, but if you need to keep your id column around for some reason you can add support for them with the composite_primary_keys gem.
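If you do take that route, a minimal sketch (assuming the composite_primary_keys gem is in your Gemfile and your migration keeps the id column while adding created_at to the primary key) could declare the composite key on the model like this:

  class PageLoad < ApplicationRecord
    # Provided by the composite_primary_keys gem; both columns form the key.
    self.primary_keys = :id, :created_at
  end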

To satisfy this TimescaleDB requirement, we change the generated migration so that create_table does not add a PRIMARY KEY on the id column:

  class CreatePageLoads < ActiveRecord::Migration[6.0]
    def change
      create_table :page_loads, id: false do |t|
        t.string :user_agent

        t.timestamps
      end
    end
  end

Rails generates all the helper files and a database migration. We can then run a rails db:migrate command again to create the table in our database.

  rails db:migrate

Now, we can confirm that our table exists and the model is properly mapped using a simple rails runner command:

  rails runner 'p PageLoad.count'
  0

And we can view the structure of the page_loads table by piping the \d page_loads command into rails dbconsole:

  echo "\d page_loads" | rails dbconsole

                            Table "public.page_loads"
     Column   |              Type              | Collation | Nullable | Default
  ------------+--------------------------------+-----------+----------+---------
   user_agent | character varying              |           |          |
   created_at | timestamp(6) without time zone |           | not null |
   updated_at | timestamp(6) without time zone |           | not null |
Generate hypertable

In TimescaleDB, the primary point of interaction with your data is a hypertable, the abstraction of a single continuous table across all space and time intervals, such that one can query it via standard SQL.

Virtually all user interactions with TimescaleDB are with hypertables. Creating tables and indexes, altering tables, inserting data, selecting data, etc. can (and should) all be executed on the hypertable.

A hypertable is defined by a standard schema with column names and types, with at least one column specifying a time value.

tip

The TimescaleDB documentation on schema management and indexing explains this in further detail.

Let’s create a migration to turn the page_loads table into a hypertable. Start by running the following command:

  rails generate migration add_hypertable

In your db/migrate project folder, you’ll see a new migration file named [some sequence of numbers]_add_hypertable.rb.

Then you can write the migration to add the hypertable:

  class AddHypertable < ActiveRecord::Migration[5.2]
    def change
      execute "SELECT create_hypertable('page_loads', 'created_at');"
    end
  end

Run rails db:migrate to generate the hypertable.

We can confirm this in psql by running the \d page_loads command and seeing the following:

                        Table "public.page_loads"
     Column   |            Type             | Collation | Nullable | Default
  ------------+-----------------------------+-----------+----------+---------
   user_agent | character varying           |           |          |
   created_at | timestamp without time zone |           | not null |
   updated_at | timestamp without time zone |           | not null |
  Indexes:
      "page_loads_created_at_idx" btree (created_at DESC)
  Triggers:
      ts_insert_blocker BEFORE INSERT ON page_loads FOR EACH ROW EXECUTE PROCEDURE _timescaledb_internal.insert_blocker()

Insert rows into TimescaleDB

Create a new view and controller so that we can insert a value into the database and see our results. When the view is displayed, the application stores the visitor’s user agent and the current time in the database.

  rails generate controller static_pages home

This generates the view and controller files for a page called /static_pages/home in our site. Let’s first add a line to the static_pages_controller.rb file to retrieve the user agent of the site visitor’s browser:

  class StaticPagesController < ApplicationController
    def home
      @agent = request.user_agent
    end
  end

Subsequently, in the home.html.erb file, print the @agent variable you just created:

  <h1>StaticPages#home</h1>
  <p>Find me in app/views/static_pages/home.html.erb</p>
  <p>
    Request: <%= @agent %>
  </p>

Start your Rails server on the command line:

  rails s

And, in your browser, visit http://localhost:3000/static_pages/home. You should see a printout of the user agent for your browser.

Now that we’ve successfully obtained the user agent and passed it as a variable to the view, we can create a PageLoad object, store the user agent information and time, and save the object to our TimescaleDB database. In the static_pages_controller.rb controller file, add the following:

  class StaticPagesController < ApplicationController
    def home
      PageLoad.create(user_agent: request.user_agent)
    end
  end

Go back to your browser and refresh the page several times. You should see commit messages in the terminal window where your Rails server is running, like so:

  Started GET "/static_pages/home" for ::1 at 2020-04-15 14:02:18 -0700
  Processing by StaticPagesController#home as HTML
    (79.5ms)  BEGIN
    app/controllers/static_pages_controller.rb:6
    PageLoad Create (79.9ms)  INSERT INTO "page_loads" ("user_agent", "created_at", "updated_at") VALUES ($1, $2, $3)  [["user_agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15"], ["created_at", "2020-04-15 21:02:18.187643"], ["updated_at", "2020-04-15 21:02:18.187643"]]
    app/controllers/static_pages_controller.rb:6
    (80.0ms)  COMMIT
    app/controllers/static_pages_controller.rb:6
  Rendering static_pages/home.html.erb within layouts/application
  Rendered static_pages/home.html.erb within layouts/application (0.5ms)
  Completed 200 OK in 266ms (Views: 20.9ms | ActiveRecord: 239.4ms)

You can view these entries in TimescaleDB by running the following command in psql:

  SELECT * FROM page_loads ORDER BY created_at DESC;

The output should look like this:

                                                       user_agent                                                        |         created_at         |         updated_at
  -----------------------------------------------------------------------------------------------------------------------+----------------------------+----------------------------
   Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15 | 2020-04-15 21:02:18.187643 | 2020-04-15 21:02:18.187643
   Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15 | 2020-04-15 21:02:17.404137 | 2020-04-15 21:02:17.404137
   Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15 | 2020-04-15 21:02:14.82468  | 2020-04-15 21:02:14.82468
   Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15 | 2020-04-15 21:02:12.957934 | 2020-04-15 21:02:12.957934
  (4 rows)

Execute a query

So far, we’ve created a TimescaleDB table and inserted data into it. Now, let’s retrieve data and display it.

In our static_pages_controller.rb file, the home method keeps creating a PageLoad record for every request:

  class StaticPagesController < ApplicationController
    def home
      PageLoad.create(user_agent: request.user_agent)
    end
  end

And we can modify our home.html.erb view to use Active Record to count all of the records in the page_loads table and display the total:

  <h1>Static Pages requests: <%= PageLoad.count %></h1>

Now, each time we refresh our page, we can see that a record is being inserted into the tsdb TimescaleDB database, and the counter is incremented on the page.

Generating requests

We need a lot of page loads in order to explore the time_bucket function.

Let’s use Apache Bench (ab) to make 50,000 requests with a concurrency of 10:

  ab -n 50000 -c 10 http://localhost:3000/static_pages/home

Now, you can grab a tea and relax while it creates thousands of records in your first hypertable. You can also see how many of these ‘empty’ requests your Rails application can handle.

Counting requests per minute

Once the ab command begins running, we can start a rails console and try some queries using the time_bucket function.

  rails console

Now, let’s start counting how many requests we have per minute:

  PageLoad
    .select("time_bucket('1 minute', created_at) as time, count(1) as total")
    .group('time').order('time')
    .map {|result| [result.time, result.total]}
  # => [
  #     [2021-04-14 20:38:00 UTC, 11770],
  #     [2021-04-14 20:39:00 UTC, 11668], ...]

It works! Now, let’s create some useful scopes that summarize the data and make the time_bucket function easier to reuse.

Creating scopes to reuse

Scopes are very useful for decomposing complex SQL statements into Ruby objects. They also allow you to introduce parameters and reuse queries as needed.

Examples of scopes:

  class PageLoad < ApplicationRecord
    scope :last_month, -> { where('created_at > ?', 1.month.ago) }
    scope :last_week,  -> { where('created_at > ?', 1.week.ago) }
    scope :last_hour,  -> { where('created_at > ?', 1.hour.ago) }
    scope :yesterday,  -> { where('DATE(created_at) = ?', 1.day.ago.to_date) }
    scope :today,      -> { where('DATE(created_at) = ?', Date.today) }
  end

And you can also combine the scopes with other ActiveRecord methods:

  PageLoad.last_week.count     # Total of requests from last week
  PageLoad.last_hour.first     # First request from last hour
  PageLoad.last_hour.all       # All requests from last hour
  PageLoad.last_hour.limit(10) # 10 requests from last hour

  # Count Chrome users from last hour
  PageLoad.last_hour.where("user_agent ILIKE '%Chrome%'").count

Now, let’s introduce a scope that counts per time dimension:

  class PageLoad < ApplicationRecord
    scope :counts_per, -> (time_dimension) {
      select("time_bucket('#{time_dimension}', created_at) as time, count(1) as total")
        .group(:time).order(:time)
        .map {|result| [result.time, result.total]}
    }
  end

Exploring other time frames:

  PageLoad.counts_per('1 hour')
  # PageLoad Load (1037.3ms)  SELECT time_bucket('1 hour', created_at) as time, count(1) as total FROM "page_loads" WHERE (created_at > '2021-04-08 12:03:14.800902') GROUP BY time ORDER BY time
  # => [[2021-04-14 21:00:00 UTC, 185836],
  #     [2021-04-14 22:00:00 UTC, 155286], ... ]

Add performance and path attributes to PageLoad

Let’s dig deeper into requests by extending our example to watch all server requests and store the endpoint path and the time taken to return the response.

First, we need to add columns to the database using rails migrations:

  rails g migration add_performance_to_page_load path:string performance:float

The Rails generator is smart enough to understand the migration naming convention and the extra parameters, and suggests code like this:

  class AddPerformanceToPageLoad < ActiveRecord::Migration[6.0]
    def change
      add_column :page_loads, :path, :string
      add_column :page_loads, :performance, :float
    end
  end

Now we can run rails db:migrate again to add the two columns to the database.

Hooking application controller to collect performance data

Our next step is to create a PageLoad record for every request to the system. So, let’s hook into the application controller with an around_action callback.

  class ApplicationController < ActionController::Base
    around_action do |controller, action|
      performance = Benchmark.measure(&action.method(:call))
      PageLoad.create(path: request.path,
                      performance: performance.real,
                      user_agent: request.user_agent)
    end
  end

We’re storing only the real (wall-clock) time from Benchmark, but you can collect additional metrics to see more details about your system.
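For instance, a possible extension (purely a sketch; it assumes you add a hypothetical cpu_time column in another migration) could also store the CPU time that Benchmark reports:

  class ApplicationController < ActionController::Base
    around_action do |controller, action|
      measurement = Benchmark.measure(&action.method(:call))
      PageLoad.create(path: request.path,
                      user_agent: request.user_agent,
                      performance: measurement.real, # wall-clock seconds
                      cpu_time: measurement.total)   # hypothetical extra column
    end
  end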

You can refresh the page and check the latest record in the rails console:

  PageLoad.order(:created_at).last
  # PageLoad Load (1.7ms)  SELECT "page_loads".* FROM "page_loads" ORDER BY "page_loads"."created_at" DESC LIMIT $1  [["LIMIT", 1]]
  # => #<PageLoad:0x00007fdafc5c69d8 path: "/static_pages/home", performance: 0.049275, ...>

Exploring aggregation functions

Now that we store the path of every request, we can explore the data page by page (or all the pages together) by grouping by path.

  class PageLoad < ApplicationRecord
    scope :time_bucket, -> (time_dimension, value: 'count(1)') {
      select(<<~SQL).group('time, path').order('path, time')
        time_bucket('#{time_dimension}', created_at) as time, path,
        #{value} as value
      SQL
    }
  end

And we can build scopes on top of this one to give convenient names to the most used queries:

  scope :per_minute, -> { time_bucket('1 minute') }
  scope :per_hour,   -> { time_bucket('1 hour') }
  scope :per_day,    -> { time_bucket('1 day') }
  scope :per_week,   -> { time_bucket('1 week') }
  scope :per_month,  -> { time_bucket('1 month') }

Create some scopes for the average response time, depending on the time frame:

  scope :average_response_time_per_minute, -> { time_bucket('1 minute', value: 'avg(performance)') }
  scope :average_response_time_per_hour,   -> { time_bucket('1 hour', value: 'avg(performance)') }

And also scopes to understand the maximum and minimum response times:

  scope :worst_response_time_last_minute, -> { time_bucket('1 minute', value: 'max(performance)') }
  scope :worst_response_time_last_hour,   -> { time_bucket('1 hour', value: 'max(performance)') }
  scope :best_response_time_last_hour,    -> { time_bucket('1 hour', value: 'min(performance)') }

Finally, let’s build a helper method that creates a summary for every distinct path. The first step is to collect the unique paths from the page loads:

  scope :paths, -> { distinct.pluck(:path) }

Testing it in the Rails console:

  PageLoad.paths # => ["/page_loads/new", "/static_pages/home"]
  # (151.6ms)  SELECT DISTINCT "page_loads"."path" FROM "page_loads"

Now, let’s have a look at the response-time metrics we created, filtering for methods whose names contain response_time:

  PageLoad.methods.grep(/response_time/)
  # => [:average_response_time_per_minute,
  #     :worst_response_time_last_minute,
  #     :average_response_time_per_hour,
  #     :worst_response_time_last_hour,
  #     :best_response_time_last_hour]

Now, it’s time to build the summary for a single path:

  def self.resume_for(path)
    filter = where(path: path)
    get = -> (scope_name) { filter.send(scope_name).first.value }
    metrics.each_with_object({}) do |metric, resume|
      resume[metric] = get[metric]
    end
  end

  def self.metrics
    methods.grep(/response_time/)
  end

Trying it in the console:

  PageLoad.resume_for("/page_loads/new")
  # => {:average_response_time_per_minute=>0.967591333319433,
  #     :worst_response_time_last_minute=>2.892941999947652,
  #     :average_response_time_per_hour=>0.48624183332625154,
  #     :worst_response_time_last_hour=>2.892941999947652,
  #     :best_response_time_last_hour=>0.0030219999607652426}

And you can keep combining it with other filters, like:

  PageLoad.last_week.resume_for("/page_loads/new")
  PageLoad.yesterday.resume_for("/page_loads/new")
  PageLoad.today.resume_for("/page_loads/new")

The last step is to iterate over all of the paths and build a summary for each:

  def self.statistics
    paths.each_with_object({}) do |path, resume|
      resume[path] = resume_for(path)
    end
  end

And now, testing again in the console:

  PageLoad.statistics
  # => {"/page_loads/new"=>
  #      {:average_response_time_per_minute=>0.967591333319433,
  #       :worst_response_time_last_minute=>2.892941999947652,
  #       :average_response_time_per_hour=>0.48624183332625154,
  #       :worst_response_time_last_hour=>2.892941999947652,
  #       :best_response_time_last_hour=>0.0030219999607652426},
  #     "/static_pages/home"=>
  #      {:average_response_time_per_minute=> ...}

And you can also explore different time frames for the statistics:

  PageLoad.yesterday.statistics # => {...}
  PageLoad.last_week.statistics # => {...}

As you can see in the console, every query is executed independently, which is suboptimal, but it demonstrates the different options.
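If those extra round trips ever become a problem, one possible alternative (a sketch, not part of the tutorial) is a class method on PageLoad that computes several aggregates for every path in a single grouped query:

  # Sketch: one query returning average/worst/best response time per path.
  def self.combined_statistics
    select("path, avg(performance) AS avg_time, max(performance) AS worst_time, min(performance) AS best_time")
      .group(:path)
      .each_with_object({}) do |row, stats|
        stats[row.path] = { average: row.avg_time, worst: row.worst_time, best: row.best_time }
      end
  end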

Now that you know the basics of using TimescaleDB from your Rails application, be sure to check out the advanced TimescaleDB tutorials.