Kafka

While the previous example picks up new log files right away, those files aren't
copied over until long after the HTTP requests they record actually occurred.
That enables auto-refresh of the log data, but it is still not realtime. To get
realtime log processing, we need a way to send over log lines as soon as they are
written. Kafka is a high-throughput distributed messaging system that is a
perfect fit for this use case, and Spark ships with an external module for
importing data from Kafka.
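As a rough sketch, consuming log lines via that module might look like the
following in PySpark, using the classic spark-streaming-kafka integration
(Spark 1.x/2.x). The broker address, the topic name ("weblogs"), and the
batch interval are illustrative assumptions, not values taken from this example.

    # Sketch: read log lines from a Kafka topic with Spark Streaming
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="KafkaLogStream")
    ssc = StreamingContext(sc, batchDuration=5)  # 5-second micro-batches

    # Connect directly to the Kafka brokers and subscribe to the log topic
    # ("weblogs" and localhost:9092 are placeholder values)
    stream = KafkaUtils.createDirectStream(
        ssc,
        topics=["weblogs"],
        kafkaParams={"metadata.broker.list": "localhost:9092"})

    # Each Kafka message arrives as a (key, value) pair; keep only the log line
    lines = stream.map(lambda kv: kv[1])
    lines.pprint()

    ssc.start()
    ssc.awaitTermination()

From here, the same transformations used on the batch log data can be applied
to each micro-batch of incoming lines as they arrive.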

Here is some useful documentation to set up Kafka for Spark Streaming: