Architecture

Data flow model

An Event is a unit of data that flows through a Flume agent. The Eventflows from Source to Channel to Sink, and is represented by animplementation of the Event interface. An Event carries a payload (bytearray) that is accompanied by an optional set of headers (string attributes).A Flume agent is a process (JVM) that hosts the components that allowEvents to flow from an external source to a external destination.

Agent component diagram

A Source consumes Events having a specific format, and thoseEvents are delivered to the Source by an external source like a webserver. For example, an AvroSource can be used to receive Avro Eventsfrom clients or from other Flume agents in the flow. When a Source receivesan Event, it stores it into one or more Channels. The Channel isa passive store that holds the Event until that Event is consumed by aSink. One type of Channel available in Flume is the FileChannelwhich uses the local filesystem as its backing store. A Sink is responsiblefor removing an Event from the Channel and putting it into an externalrepository like HDFS (in the case of an HDFSEventSink) or forwarding it tothe Source at the next hop of the flow. The Source and Sink withinthe given agent run asynchronously with the Events staged in theChannel.

Reliability

An Event is staged in a Flume agent’s Channel. Then it’s theSink‘s responsibility to deliver the Event to the next agent orterminal repository (like HDFS) in the flow. The Sink removes an Eventfrom the Channel only after the Event is stored into the Channel ofthe next agent or stored in the terminal repository. This is how the single-hopmessage delivery semantics in Flume provide end-to-end reliability of the flow.Flume uses a transactional approach to guarantee the reliable delivery of theEvents. The Sources and Sinks encapsulate thestorage/retrieval of the Events in a Transaction provided by theChannel. This ensures that the set of Events are reliably passed frompoint to point in the flow. In the case of a multi-hop flow, the Sink fromthe previous hop and the Source of the next hop both have theirTransactions open to ensure that the Event data is safely stored inthe Channel of the next hop.