Fault Tolerance Guarantees of Data Sources and Sinks

Flink’s fault tolerance mechanism recovers programs after failures and continues to execute them. Such failures include machine hardware failures, network failures, and transient program failures.

Flink can guarantee exactly-once state updates to user-defined state only when the source participates in the snapshotting mechanism. The following table lists the state update guarantees of Flink coupled with the bundled connectors.

Please read the documentation of each connector to understand the details of the fault tolerance guarantees.

| Source | Guarantees | Notes |
|---|---|---|
| Apache Kafka | exactly once | Use the appropriate Kafka connector for your version |
| AWS Kinesis Streams | exactly once | |
| RabbitMQ | at most once (v 0.10) / exactly once (v 1.0) | |
| Twitter Streaming API | at most once | |
| Collections | exactly once | |
| Files | exactly once | |
| Sockets | at most once | |
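
The difference between a source that takes part in snapshotting and one that does not can be illustrated with a small simulation. This is a minimal sketch in plain Python, not Flink code: the function names, the `checkpoint_every` and `fail_at` parameters, and the single simulated crash are all assumptions made for illustration. The first function snapshots the read offset together with the running total, so recovery rewinds the source and the state consistently. The second models a source that cannot rewind (similar in spirit to a plain socket), so records read after the last snapshot are lost from the state on recovery.

```python
def run_with_replayable_source(records, checkpoint_every, fail_at):
    """Sum records from a source whose read offset is snapshotted with the state."""
    checkpoint = {"offset": 0, "total": 0}  # last completed snapshot
    offset, total = 0, 0
    failed = False
    while offset < len(records):
        if not failed and offset == fail_at:
            # Simulated crash: restore the state AND rewind the source together.
            failed = True
            offset, total = checkpoint["offset"], checkpoint["total"]
            continue
        total += records[offset]
        offset += 1
        if offset % checkpoint_every == 0:
            checkpoint = {"offset": offset, "total": total}
    return total


def run_with_non_replayable_source(records, checkpoint_every, fail_at):
    """Same job, but the source cannot rewind to an earlier position."""
    checkpoint_total = 0  # only the state is snapshotted, not the read position
    offset, total = 0, 0
    failed = False
    while offset < len(records):
        if not failed and offset == fail_at:
            # Simulated crash: the state rolls back, the source position does not.
            failed = True
            total = checkpoint_total
            continue
        total += records[offset]
        offset += 1
        if offset % checkpoint_every == 0:
            checkpoint_total = total
    return total


records = list(range(1, 11))  # 1 + 2 + ... + 10 = 55
exactly_once_total = run_with_replayable_source(records, checkpoint_every=4, fail_at=6)
at_most_once_total = run_with_non_replayable_source(records, checkpoint_every=4, fail_at=6)
```

With these inputs, the replayable run ends with every record counted once, while the non-replayable run is missing the two records that were read between the last snapshot and the crash.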

To guarantee end-to-end exactly-once record delivery (in addition to exactly-once state semantics), the data sink needs to take part in the checkpointing mechanism. The following table lists the delivery guarantees (assuming exactly-once state updates) of Flink coupled with bundled sinks:

| Sink | Guarantees | Notes |
|---|---|---|
| HDFS BucketingSink | exactly once | Implementation depends on Hadoop version |
| Elasticsearch | at least once | |
| Kafka producer | at least once / exactly once | exactly once with transactional producers (v 0.11+) |
| Cassandra sink | at least once / exactly once | exactly once only for idempotent updates |
| AWS Kinesis Streams | at least once | |
| File sinks | at least once | |
| Socket sinks | at least once | |
| Standard output | at least once | |
| Redis sink | at least once | |
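
The notes on idempotent updates and transactional producers can be made concrete with a small simulation. This is a rough sketch in plain Python under stated assumptions, not the behavior of any actual connector: `replay_after_failure`, `TransactionalSink`, and the sample records are hypothetical names and data introduced for illustration. It shows that a replay after a failure produces duplicates in an append-only sink, that a keyed overwrite absorbs those duplicates, and that a sink which only publishes writes when a checkpoint completes never exposes the aborted attempt.

```python
def replay_after_failure(records, checkpoint_at, fail_at):
    """Writes seen by a sink when the job fails once and replays from the last checkpoint."""
    first_attempt = records[:fail_at]   # written before the crash
    replayed = records[checkpoint_at:]  # written again after recovery
    return first_attempt + replayed


class TransactionalSink:
    """Hypothetical sink that publishes writes only when a checkpoint completes."""

    def __init__(self):
        self.committed = []
        self.pending = []

    def write(self, record):
        self.pending.append(record)

    def commit(self):  # called when a checkpoint completes
        self.committed.extend(self.pending)
        self.pending = []

    def abort(self):  # called on recovery: drop uncommitted writes
        self.pending = []


records = [("user-1", 10), ("user-2", 20), ("user-3", 30), ("user-4", 40)]
writes = replay_after_failure(records, checkpoint_at=1, fail_at=3)

append_sink = list(writes)  # at least once: replayed records appear twice
upsert_sink = dict(writes)  # idempotent overwrite by key: duplicates collapse

txn_sink = TransactionalSink()
for r in records[:1]:
    txn_sink.write(r)
txn_sink.commit()           # checkpoint completes after the first record
for r in records[1:3]:
    txn_sink.write(r)
txn_sink.abort()            # crash before the next checkpoint
for r in records[1:]:
    txn_sink.write(r)       # replay from the last checkpoint
txn_sink.commit()
```

The append-only sink ends up with six writes for four records, while the keyed and transactional variants both end up with exactly the four original records.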