IoT Fleet Management - Spark and Kafka


Overview

This is an end-to-end functional application. It is a blueprint for an IoT application built on top of YugabyteDB (using the Cassandra-compatible API) as the database, Kafka as the message broker, Spark for real-time analytics, and Spring Boot as the application framework. The stack is very similar to the SMACK stack (Spark, Mesos, Akka, Cassandra, Kafka), which is popular for developing IoT applications, with YugabyteDB taking the place of Cassandra.

Scenario

Assume that a fleet management company wants to track its fleet of vehicles, which are delivering shipments. The vehicles are of different types (18-wheelers, buses, large trucks, and so on), and the shipments happen over three routes (Route-37, Route-82, Route-43). The company wants to track:

  • the breakdown of its vehicle types per shipment delivery route
  • which vehicles are near road closures, so that it can predict delays in deliveries

This app renders a dashboard showing both of the above. Below is a view of the real-time, auto-refreshing dashboard.

YB IoT Fleet Management Dashboard

App Architecture

This application has the following subcomponents:

  • Data Store - YugabyteDB
  • Data Producer - Test program writing into Kafka
  • Data Processor - Spark reading from Kafka
  • Data Dashboard - Spring Boot app using WebSockets, jQuery, and Bootstrap

We will look at each of these components in detail. Below is an architecture diagram showing how these components fit together.

YB IoT Fleet Management Architecture

Data Store

Stores all the user-facing data. YugabyteDB is used here, with CQL as the query language.

All the data is stored in the keyspace TrafficKeySpace:

  CREATE KEYSPACE IF NOT EXISTS TrafficKeySpace;

There are three tables that hold the user-facing data: Total_Traffic for the lifetime traffic information, Window_Traffic for the last 30 seconds of traffic, and poi_traffic for the traffic near a point of interest (road closures). The data processor constantly updates these tables, and the dashboard reads from them. Below are the schemas for these tables.

  CREATE TABLE TrafficKeySpace.Total_Traffic (
      routeId text,
      vehicleType text,
      totalCount bigint,
      timeStamp timestamp,
      recordDate text,
      PRIMARY KEY (routeId, recordDate, vehicleType)
  );

  CREATE TABLE TrafficKeySpace.Window_Traffic (
      routeId text,
      vehicleType text,
      totalCount bigint,
      timeStamp timestamp,
      recordDate text,
      PRIMARY KEY (routeId, recordDate, vehicleType)
  );

  CREATE TABLE TrafficKeySpace.poi_traffic (
      vehicleid text,
      vehicletype text,
      distance bigint,
      timeStamp timestamp,
      PRIMARY KEY (vehicleid)
  );

Data Producer

A program that generates random test data and publishes it to the Kafka topic iot-data-event. This emulates the data that connected vehicles would publish to a message broker in a real-world deployment.

A single data point is a JSON payload and looks as follows:

  {
      "vehicleId": "0bf45cac-d1b8-4364-a906-980e1c2bdbcb",
      "vehicleType": "Taxi",
      "routeId": "Route-37",
      "longitude": "-95.255615",
      "latitude": "33.49808",
      "timestamp": "2017-10-16 12:31:03",
      "speed": 49.0,
      "fuelLevel": 38.0
  }
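
For illustration only, below is a minimal sketch of such a producer using the standard Kafka Java client. The topic name and payload come from the example above; the class name and broker address are assumptions, and the real producer generates the field values randomly rather than hard-coding a single event.

  import java.util.Properties;

  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.ProducerRecord;

  public class SampleIoTDataProducer {
      public static void main(String[] args) {
          Properties props = new Properties();
          // Broker address is an assumption; point it at your Kafka cluster.
          props.put("bootstrap.servers", "localhost:9092");
          props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
          props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

          // Publish one event (hard-coded here; the real producer randomizes the fields)
          // to the iot-data-event topic as a JSON string.
          try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
              String event = "{\"vehicleId\":\"0bf45cac-d1b8-4364-a906-980e1c2bdbcb\","
                  + "\"vehicleType\":\"Taxi\",\"routeId\":\"Route-37\","
                  + "\"longitude\":\"-95.255615\",\"latitude\":\"33.49808\","
                  + "\"timestamp\":\"2017-10-16 12:31:03\",\"speed\":49.0,\"fuelLevel\":38.0}";
              producer.send(new ProducerRecord<>("iot-data-event", event));
          }
      }
  }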

Data Processor

This is a Spark Streaming application that consumes the data stream from the Kafka topic, converts it into meaningful insights, and writes the resulting data back to YugabyteDB.

Spark communicates with YugabyteDB using the Cassandra connector. This is done as follows:

  SparkConf conf = new SparkConf()
      .setAppName(prop.getProperty("com.iot.app.spark.app.name"))
      .set("spark.cassandra.connection.host", prop.getProperty("com.iot.app.cassandra.host"));

The data is consumed from the Kafka stream and collected in 5-second batches. This is achieved as follows:

  JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));
  JavaPairInputDStream<String, IoTData> directKafkaStream =
      KafkaUtils.createDirectStream(jssc,
          String.class,
          IoTData.class,
          StringDecoder.class,
          IoTDataDecoder.class,
          kafkaParams,
          topicsSet
      );
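
The kafkaParams and topicsSet variables referenced above are not shown in this excerpt. Assuming the older direct-stream API used here, they might be constructed roughly as follows (the broker address is an assumption; the topic name matches the producer's):

  import java.util.Collections;
  import java.util.HashMap;
  import java.util.Map;
  import java.util.Set;

  // Kafka connection parameters for the direct stream; broker address is an assumption.
  Map<String, String> kafkaParams = new HashMap<>();
  kafkaParams.put("metadata.broker.list", "localhost:9092");

  // The stream subscribes to the single topic that the data producer writes to.
  Set<String> topicsSet = Collections.singleton("iot-data-event");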

It computes the following:

  • A breakdown by vehicle type and shipment route across all the vehicles and shipments done so far
  • The same breakdown restricted to active shipments, computed over the last 30 seconds (see the sketch after this list)
  • The vehicles that are within a 20-mile radius of a given point of interest (POI), which represents a road closure
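
To make the second computation and the write-back to YugabyteDB concrete, below is a hedged sketch (not the application's exact code) of a 30-second windowed count keyed by route and vehicle type, saved to the Window_Traffic table through the Cassandra connector's Java API. The IoTData accessors and the WindowTrafficData bean are assumptions made for illustration.

  import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;

  import org.apache.spark.streaming.Durations;
  import org.apache.spark.streaming.api.java.JavaDStream;
  import org.apache.spark.streaming.api.java.JavaPairDStream;

  import com.datastax.spark.connector.japi.CassandraStreamingJavaUtil;

  import scala.Tuple2;

  // Key each event by (routeId, vehicleType) and count events over a 30-second
  // window that slides with every 5-second batch. getRouteId()/getVehicleType()
  // are assumed accessors on the IoTData bean.
  JavaPairDStream<String, Long> windowedCounts = directKafkaStream
      .mapToPair(t -> new Tuple2<>(t._2().getRouteId() + "," + t._2().getVehicleType(), 1L))
      .reduceByKeyAndWindow((a, b) -> a + b, Durations.seconds(30), Durations.seconds(5));

  // Map each count onto a WindowTrafficData bean (assumed to mirror the
  // Window_Traffic schema) and write it back to YugabyteDB via the connector.
  JavaDStream<WindowTrafficData> windowTraffic = windowedCounts.map(pair -> {
      String[] key = pair._1().split(",");
      return new WindowTrafficData(key[0], key[1], pair._2());  // routeId, vehicleType, totalCount
  });

  CassandraStreamingJavaUtil.javaFunctions(windowTraffic)
      .writerBuilder("traffickeyspace", "window_traffic", mapToRow(WindowTrafficData.class))
      .saveToCassandra();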

Data Dashboard

This is a Spring Boot application that queries the data from YugabyteDB and pushes it to the web page using WebSockets and jQuery. The data is pushed at fixed intervals, so the dashboard refreshes automatically. The web page uses bootstrap.js to display the charts and tables that make up the dashboard.

We create entity classes for the three tables “Total_Traffic”, “Window_Traffic” and “Poi_Traffic”, and DAO interfaces for all the entities extending CassandraRepository. For example, the DAO interface for the TotalTrafficData entity is created as follows.

  @Repository
  public interface TotalTrafficDataRepository extends CassandraRepository<TotalTrafficData> {

      @Query("SELECT * FROM traffickeyspace.total_traffic WHERE recorddate = ? ALLOW FILTERING")
      Iterable<TotalTrafficData> findTrafficDataByDate(String date);
  }
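
To push fresh results to the browser at fixed intervals, a scheduled task can query the repository and publish the rows over a WebSocket destination. Below is a hedged sketch using Spring's SimpMessagingTemplate; the service name, destination name, and record-date format are assumptions, and it presumes that scheduling (@EnableScheduling) and a STOMP/WebSocket broker are configured elsewhere in the application.

  import java.text.SimpleDateFormat;
  import java.util.ArrayList;
  import java.util.Date;
  import java.util.List;

  import org.springframework.beans.factory.annotation.Autowired;
  import org.springframework.messaging.simp.SimpMessagingTemplate;
  import org.springframework.scheduling.annotation.Scheduled;
  import org.springframework.stereotype.Service;

  @Service
  public class TrafficDataPushService {

      @Autowired private TotalTrafficDataRepository totalRepository;
      @Autowired private SimpMessagingTemplate template;

      // Every 5 seconds, read today's totals and push them to subscribed dashboards.
      @Scheduled(fixedRate = 5000)
      public void pushTotalTraffic() {
          String today = new SimpleDateFormat("yyyy-MM-dd").format(new Date());
          List<TotalTrafficData> rows = new ArrayList<>();
          totalRepository.findTrafficDataByDate(today).forEach(rows::add);
          // "/topic/trafficData" is an assumed destination that the web page subscribes to.
          template.convertAndSend("/topic/trafficData", rows);
      }
  }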

To connect to the YugabyteDB cluster and obtain a connection for database operations, we write the CassandraConfig class. This is done as follows:

  public class CassandraConfig extends AbstractCassandraConfiguration {

      // Spring Environment that supplies the application properties (injected by Spring).
      @Autowired
      private Environment environment;

      @Bean
      public CassandraClusterFactoryBean cluster() {
          // Create a Cassandra cluster object to access YugabyteDB using CQL.
          CassandraClusterFactoryBean cluster = new CassandraClusterFactoryBean();
          // Set the database host.
          cluster.setContactPoints(environment.getProperty("com.iot.app.cassandra.host"));
          // Set the database port.
          cluster.setPort(Integer.parseInt(environment.getProperty("com.iot.app.cassandra.port")));
          return cluster;
      }
  }

Summary

This application is a blueprint for building IoT applications. The instructions to build and run the application, as well as the source code, can be found in this GitHub repo.