Deployment

inlong-sort runs on Flink, so you need to set up a Flink environment before running an inlong-sort application.

inlong-sort currently relies on flink-1.13.5. Choose flink-1.13.5-bin-scala_2.11.tgz when downloading the package.
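
As a rough sketch of fetching that package, the helper below builds the download URL from the Flink and Scala versions. The Apache archive URL layout and the function name flink_download_url are assumptions for illustration, not something this guide prescribes; use whichever mirror you prefer.

```shell
# Build the download URL for a Flink binary release.
# Assumption: the release is hosted on the Apache archive under this layout.
flink_download_url() {
  flink_version="$1"   # e.g. 1.13.5
  scala_version="$2"   # e.g. 2.11
  echo "https://archive.apache.org/dist/flink/flink-${flink_version}/flink-${flink_version}-bin-scala_${scala_version}.tgz"
}

# Download and unpack (uncomment to run; requires network access):
# curl -fLO "$(flink_download_url 1.13.5 2.11)"
# tar -xzf flink-1.13.5-bin-scala_2.11.tgz
```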

Once your Flink environment is set up, you can visit the Flink web UI, whose address is stored in /${your_flink_path}/conf/masters.
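
To look up the web UI address without opening the file by hand, something like the following may help. It is a minimal sketch that assumes each non-empty, non-comment line of conf/masters has the form host:port; the function name flink_webui_urls is hypothetical.

```shell
# Print an http:// URL for each entry in a Flink conf/masters file.
# Assumption: each non-empty, non-comment line has the form host:port.
flink_webui_urls() {
  masters_file="$1"
  # Drop blank lines and comments, trim leading whitespace, add the scheme.
  grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$masters_file" \
    | sed -e 's/^[[:space:]]*//' -e 's|^|http://|'
}

# Usage (replace the placeholder with your Flink path):
# flink_webui_urls "/path/to/flink/conf/masters"
```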

Prepare installation files

All installation files are in the inlong-sort directory.

Starting an inlong-sort application

Now you can submit the job to Flink with the compiled jar. For details, refer to the Flink documentation on how to submit a job.

Example:

  ./bin/flink run -c org.apache.inlong.sort.flink.Entrance inlong-sort/sort-core-[version].jar \
  --cluster-id inlong_app --zookeeper.quorum 127.0.0.1:2181 --zookeeper.path.root /inlong_sort \
  --source.type tubemq --metrics.audit.proxy.hosts 127.0.0.1:10081 --sink.type hive

Notice:

  • -c org.apache.inlong.sort.flink.Entrance is the main class name

  • inlong-sort/sort-core-[version].jar is the compiled jar

Necessary configurations

  • --cluster-id: identifies a specific inlong-sort application; must match the sort.appName configuration in inlong-manager
  • --zookeeper.quorum: ZooKeeper quorum; must match the cluster.zk.url configuration in inlong-manager
  • --zookeeper.path.root: ZooKeeper root path; must match the cluster.zk.root configuration in inlong-manager
  • --source.type: source of the application; currently “tubemq” and “pulsar” are supported
  • --metrics.audit.proxy.hosts: audit proxy host address used for reporting audit metrics
  • --sink.type: sink of the application; currently “clickhouse” and “hive” are supported

Example

  --cluster-id inlong_app --zookeeper.quorum 192.127.0.1:2181 \
  --zookeeper.path.root /inlong_sort --source.type tubemq --sink.type hive
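
Because only certain source and sink types are accepted, it can be worth checking the values before submitting a job. The function below is an illustrative sketch rather than part of inlong-sort; its name check_sort_types is hypothetical, and the accepted values are taken from the list above.

```shell
# Return 0 if the source/sink pair is supported, 1 otherwise.
# $1 = value for --source.type, $2 = value for --sink.type
check_sort_types() {
  case "$1" in
    tubemq|pulsar) ;;
    *) echo "unsupported source.type: $1" >&2; return 1 ;;
  esac
  case "$2" in
    clickhouse|hive) ;;
    *) echo "unsupported sink.type: $2" >&2; return 1 ;;
  esac
}

# Usage: abort a wrapper script early on a bad combination.
# check_sort_types tubemq hive || exit 1
```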

All configurations

name                                  necessary  default value  description
cluster-id                            Y          NA             identifies a specific inlong-sort application
zookeeper.quorum                      Y          NA             ZooKeeper quorum
zookeeper.path.root                   Y          /inlong-sort   ZooKeeper root path
source.type                           Y          NA             source of the application; currently “tubemq” and “pulsar” are supported
sink.type                             Y          NA             sink of the application; currently “clickhouse” and “hive” are supported
source.parallelism                    N          1              parallelism of the source
deserialization.parallelism           N          1              parallelism of deserialization
sink.parallelism                      N          1              parallelism of the sink
tubemq.master.address                 N          NA             tube master address, used if absent from DataFlowInfo on ZooKeeper
tubemq.session.key                    N          inlong-sort    session key used when subscribing to tubemq
tubemq.bootstrap.from.max             N          false          whether to consume from max when subscribing to tubemq
tubemq.message.not.found.wait.period  N          350ms          wait period when the tube broker returns message not found
tubemq.subscribe.retry.timeout        N          300000         timeout for subscribing to tube, in milliseconds
zookeeper.client.session-timeout      N          60000          session timeout for the ZooKeeper session, in milliseconds
zookeeper.client.connection-timeout   N          15000          connection timeout for ZooKeeper, in milliseconds
zookeeper.client.retry-wait           N          5000           pause between consecutive retries, in milliseconds
zookeeper.client.max-retry-attempts   N          3              number of connection retries before the client gives up
zookeeper.client.acl                  N          “open”         ACL (open/creator) to be configured on the ZooKeeper node. The value can be set to “creator” if the ZooKeeper server configuration has the “authProvider” property mapped to use SASLAuthenticationProvider and the cluster is configured to run in secure mode (Kerberos)
zookeeper.sasl.disable                N          false          whether to disable ZooKeeper SASL
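
Optional settings from the table are passed on the command line in the same way as the necessary ones. The sketch below turns key=value pairs into "--key value" arguments and prints the resulting command as a dry run instead of executing it. The helper name sort_args and the parallelism values are illustrative assumptions.

```shell
# Turn key=value pairs into "--key value" arguments.
sort_args() {
  args=""
  for pair in "$@"; do
    key="${pair%%=*}"     # text before the first '='
    value="${pair#*=}"    # text after the first '='
    args="${args}${args:+ }--${key} ${value}"
  done
  echo "$args"
}

# Dry run: print the command rather than submitting the job.
echo "./bin/flink run -c org.apache.inlong.sort.flink.Entrance inlong-sort/sort-core-[version].jar $(sort_args \
  cluster-id=inlong_app zookeeper.quorum=127.0.0.1:2181 zookeeper.path.root=/inlong_sort \
  source.type=tubemq sink.type=hive source.parallelism=2 sink.parallelism=2)"
```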