How to simulate a basic IoT sensor dataset on PostgreSQL or TimescaleDB

The Internet of Things (IoT) describes a trend where computing is becoming ubiquitous and is embedded in more and more physical things. For many of these things, the purpose of IoT is to collect sensor data about the environment in which it exists, for example oil wells, factories, power plants, farms, moving vehicles, office buildings, homes.

In other words, IoT is all about the data. And the datasets generated by these things are generally time-series in nature, with relational metadata to describe those things.

Often, it is necessary to simulate IoT sensor data, for example, when testing a new system. This tutorial shows how to simulate a basic IoT sensor dataset on PostgreSQL, or TimescaleDB.

note

For creating a more advanced simulated IoT dataset, try the Time-series Benchmarking Suite (TSBS).

Prerequisites

To complete this tutorial, you need a cursory knowledge of the Structured Query Language (SQL). The tutorial walks you through each SQL command, but it is helpful if you’ve seen SQL before.

To start, install TimescaleDB. When your installation is complete, you can ingest or create sample data and finish the tutorial.

Set up your tables

First, connect to your database via psql.

Second, create the “sensors” and “sensor_data” tables:

  1. CREATE TABLE sensors(
  2. id SERIAL PRIMARY KEY,
  3. type VARCHAR(50),
  4. location VARCHAR(50)
  5. );
  6. CREATE TABLE sensor_data (
  7. time TIMESTAMPTZ NOT NULL,
  8. sensor_id INTEGER,
  9. temperature DOUBLE PRECISION,
  10. cpu DOUBLE PRECISION,
  11. FOREIGN KEY (sensor_id) REFERENCES sensors (id)
  12. );

If you are using TimescaleDB, convert the sensor_data table into a hypertable:

  1. SELECT create_hypertable('sensor_data', 'time');

Fourth, populate the sensors table with 4 sensors:

  1. INSERT INTO sensors (type, location) VALUES
  2. ('a','floor'),
  3. ('a', 'ceiling'),
  4. ('b','floor'),
  5. ('b', 'ceiling');

Fifth, verify that the sensors were created correctly:

  1. SELECT * FROM sensors;

After running that last SQL statement, you should see something like this:

  1. id | type | location
  2. ----+------+----------
  3. 1 | a | floor
  4. 2 | a | ceiling
  5. 3 | b | floor
  6. 4 | b | ceiling
  7. (4 rows)

Create the simulated IoT sensor data

note

This section shows results of the queries as examples, but because the tutorial generates random data every time it is run, your results look different, but is structured the same way.

Generate a dataset for all of our four sensors and insert into the sensor_data table:

  1. INSERT INTO sensor_data (time, sensor_id, cpu, temperature)
  2. SELECT
  3. time,
  4. sensor_id,
  5. random() AS cpu,
  6. random()*100 AS temperature
  7. FROM generate_series(now() - interval '24 hour', now(), interval '5 minute') AS g1(time), generate_series(1,4,1) AS g2(sensor_id);

Verify that the simulated sensor data was written correctly:

  1. SELECT * FROM sensor_data ORDER BY time;

Example output:

  1. time | sensor_id | temperature | cpu
  2. -------------------------------+-----------+--------------------+---------------------
  3. 2020-03-31 15:56:25.843575+00 | 1 | 6.86688972637057 | 0.682070567272604
  4. 2020-03-31 15:56:40.244287+00 | 2 | 26.589260622859 | 0.229583469685167
  5. 2030-03-31 15:56:45.653115+00 | 3 | 79.9925176426768 | 0.457779890391976
  6. 2020-03-31 15:56:53.560205+00 | 4 | 24.3201029952615 | 0.641885648947209
  7. 2020-03-31 16:01:25.843575+00 | 1 | 33.3203678019345 | 0.0159163917414844
  8. 2020-03-31 16:01:40.244287+00 | 2 | 31.2673618085682 | 0.701185956597328
  9. 2020-03-31 16:01:45.653115+00 | 3 | 85.2960689924657 | 0.693413889966905
  10. 2020-03-31 16:01:53.560205+00 | 4 | 79.4769988860935 | 0.360561791341752
  11. ...

Congratulations, you’ve created a basic IoT sensor dataset. Now let’s run some queries.

Run basic queries (optional)

This section requires TimescaleDB.

Average temperature, average cpu by 30 minute windows:

  1. SELECT
  2. time_bucket('30 minutes', time) AS period,
  3. AVG(temperature) AS avg_temp,
  4. AVG(cpu) AS avg_cpu
  5. FROM sensor_data
  6. GROUP BY period;

SAMPLE OUTPUT:

  1. period | avg_temp | avg_cpu
  2. ------------------------+------------------+-------------------
  3. 2020-03-31 19:00:00+00 | 49.6615830013373 | 0.477344429974134
  4. 2020-03-31 22:00:00+00 | 58.8521540844037 | 0.503637770501276
  5. 2020-03-31 16:00:00+00 | 50.4250325243144 | 0.511075591299838
  6. 2020-03-31 17:30:00+00 | 49.0742547437549 | 0.527267253802468
  7. 2020-04-01 14:30:00+00 | 49.3416377226822 | 0.438027751864865
  8. ...

Average & last temperature, average cpu by 30 minute windows:

But what if you don’t just want the average temperature for each period, but also the last temperature? For example if you wanted to understand the final temperature value at the end of the interval:

  1. SELECT
  2. time_bucket('30 minutes', time) AS period,
  3. AVG(temperature) AS avg_temp,
  4. last(temperature, time) AS last_temp,
  5. AVG(cpu) AS avg_cpu
  6. FROM sensor_data
  7. GROUP BY period;

Example output:

  1. period | avg_temp | last_temp | avg_cpu
  2. ------------------------+------------------+------------------+-------------------
  3. 2020-03-31 19:00:00+00 | 49.6615830013373 | 84.3963081017137 | 0.477344429974134
  4. 2020-03-31 22:00:00+00 | 58.8521540844037 | 76.5528806950897 | 0.503637770501276
  5. 2020-03-31 16:00:00+00 | 50.4250325243144 | 43.5192013625056 | 0.511075591299838
  6. 2020-03-31 17:30:00+00 | 49.0742547437549 | 22.740753274411 | 0.527267253802468
  7. 2020-04-01 14:30:00+00 | 49.3416377226822 | 59.1331578791142 | 0.438027751864865
  8. ...

Using the sensor metadata

Now let’s take advantage of some of the metadata you have stored in the sensors table:

  1. SELECT
  2. sensors.location,
  3. time_bucket('30 minutes', time) AS period,
  4. AVG(temperature) AS avg_temp,
  5. last(temperature, time) AS last_temp,
  6. AVG(cpu) AS avg_cpu
  7. FROM sensor_data JOIN sensors on sensor_data.sensor_id = sensors.id
  8. GROUP BY period, sensors.location;

Example output:

  1. location | period | avg_temp | last_temp | avg_cpu
  2. ----------+------------------------+------------------+-------------------+-------------------
  3. ceiling | 20120-03-31 15:30:00+00 | 25.4546818090603 | 24.3201029952615 | 0.435734559316188
  4. floor | 2020-03-31 15:30:00+00 | 43.4297036845237 | 79.9925176426768 | 0.56992522883229
  5. ceiling | 2020-03-31 16:00:00+00 | 53.8454438598516 | 43.5192013625056 | 0.490728285357666
  6. floor | 2020-03-31 16:00:00+00 | 47.0046211887772 | 23.0230117216706 | 0.53142289724201
  7. ceiling | 2020-03-31 16:30:00+00 | 58.7817596504465 | 63.6621567420661 | 0.488188337767497
  8. floor | 2020-03-31 16:30:00+00 | 44.611586847653 | 2.21919436007738 | 0.434762630766879
  9. ceiling | 2020-03-31 17:00:00+00 | 35.7026890735142 | 42.9420990403742 | 0.550129583687522
  10. floor | 2020-03-31 17:00:00+00 | 62.2794370166957 | 52.6636955793947 | 0.454323202022351
  11. ...

Next steps

Congratulations, you now have a basic IoT sensor dataset you can use for testing in PostgreSQL or TimescaleDB.

To learn more about TimescaleDB, or about the TimescaleDB concepts and functions you just used, please visit these pages in our developer documentation: