Installation and Deployment

Prepare the installation file

The installation file is located in the `inlong-sort-standalone/sort-standalone-dist/target/` directory and is named `apache-inlong-sort-standalone-${project.version}-bin.tar.gz`.

Start the inlong-sort-standalone application

With the tar.gz package produced by the build stage above, extract it and the inlong-sort-standalone application can be started.
Example:

```shell
./bin/sort-start.sh
```

conf/common.properties configuration

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| clusterId | Y | NA | Uniquely identifies an inlong-sort-standalone cluster |
| sortSource.type | N | org.apache.inlong.sort.standalone.source.readapi.ReadApiSource | Source class name |
| sortChannel.type | N | org.apache.inlong.sort.standalone.channel.BufferQueueChannel | Channel class name |
| sortSink.type | N | org.apache.inlong.sort.standalone.sink.hive.HiveSink | Sink class name; each sink type uses a different Sink class |
| sortClusterConfig.type | N | org.apache.inlong.sort.standalone.config.loader.ClassResourceSortClusterConfigLoader | Class name of the sort cluster config loader; ClassResourceSortClusterConfigLoader reads the sort cluster config from the SortClusterConfig.conf resource file on the classpath |
| sortClusterConfig.managerPath | N | NA | Parameter of the config loader org.apache.inlong.sort.standalone.config.loader.ManagerSortClusterConfigLoader, specifying the URL of the InLong Manager, e.g. http://${manager ip:port}/api/inlong/manager/openapi/sort/standalone/getClusterConfig |
| eventFormatHandler | N | org.apache.inlong.sort.standalone.sink.hive.DefaultEventFormatHandler | Class name of the format converter applied before dispatching to Hive |
| maxThreads | N | 10 | Sink parallelism |
| reloadInterval | N | 60000 | Reload period of the sort cluster config, in milliseconds |
| processInterval | N | 100 | Dispatch batch processing interval, in milliseconds |
| metricDomains | N | Sort | Metric domain name |
| metricDomains.Sort.domainListeners | N | org.apache.inlong.sort.standalone.metrics.prometheus.PrometheusMetricListener | List of metric listener class names, separated by spaces |
| prometheusHttpPort | N | 8080 | Parameter of org.apache.inlong.sort.standalone.metrics.prometheus.PrometheusMetricListener, the port of the Prometheus HttpServer |
| metricDomains.Sort.snapshotInterval | N | 60000 | Metric snapshot reporting interval, in milliseconds |
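A minimal `conf/common.properties` sketch might look like the following; the `clusterId` value and the Manager host/port are illustrative placeholders, and keys not listed keep their defaults:

```properties
# Unique identifier of this inlong-sort-standalone cluster (placeholder value)
clusterId=sort-cluster-01

# Load the sort cluster config from InLong Manager over HTTP
# (host and port are placeholders)
sortClusterConfig.type=org.apache.inlong.sort.standalone.config.loader.ManagerSortClusterConfigLoader
sortClusterConfig.managerPath=http://127.0.0.1:8083/api/inlong/manager/openapi/sort/standalone/getClusterConfig

# Sink parallelism and config reload period (defaults shown explicitly)
maxThreads=10
reloadInterval=60000
```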

SortClusterConfig configuration

  • It can be read from the SortClusterConfig.conf resource file on the classpath, but this mode does not support live updates
  • It can be fetched from the HTTP interface of the InLong Manager

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| clusterName | Y | NA | Uniquely identifies an inlong-sort-standalone cluster |
| sortTasks | Y | NA | List of sort tasks |
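The overall shape of the configuration can be sketched as the following minimal skeleton (the cluster and task names are illustrative, and the task contents are elided):

```json
{
  "clusterName": "example-cluster",
  "sortTasks": [
    {
      "name": "example_task",
      "type": "HIVE",
      "idParams": [],
      "sinkParams": {}
    }
  ]
}
```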

SortTaskConfig configuration

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| name | Y | NA | Sort task name |
| type | Y | NA | Sort task type, one of HIVE("hive"), TUBE("tube"), KAFKA("kafka"), PULSAR("pulsar"), ElasticSearch("ElasticSearch"), UNKNOWN("n") |
| idParams | Y | NA | List of InLong data stream parameters |
| sinkParams | Y | NA | Parameters of the sort task |

idParams of the Hive sort task

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| inlongGroupId | Y | NA | inlongGroupId |
| inlongStreamId | Y | NA | inlongStreamId |
| separator | Y | NA | Field separator |
| partitionIntervalMs | N | 3600000 | Partition interval, in milliseconds |
| idRootPath | Y | NA | HDFS root path of the InLong data stream |
| partitionSubPath | Y | NA | Partition sub-path of the InLong data stream |
| hiveTableName | Y | NA | Hive table name of the InLong data stream |
| partitionFieldName | N | dt | Partition field name of the InLong data stream |
| partitionFieldPattern | Y | NA | Value format of the partition field, e.g. {yyyyMMdd}, {yyyyMMddHH}, {yyyyMMddHHmm} |
| msgTimeFieldPattern | Y | NA | Value format of the message-generation-time field, in Java date format |
| maxPartitionOpenDelayHour | N | 8 | Maximum partition open delay, in hours |
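To illustrate how the pattern parameters above combine, the sketch below approximates the derivation in shell; the service itself applies Java date patterns such as `{yyyyMMddHH}`, which correspond to the strftime specifiers used here, and the message time is a hypothetical value:

```shell
# Hypothetical message time, in the layout described by msgTimeFieldPattern
# ("yyyy-MM-dd HH:mm:ss" in Java notation, "%Y-%m-%d %H:%M:%S" in strftime)
msg_time="2024-01-15 09:30:00"

# partitionFieldPattern {yyyyMMddHH} maps to strftime %Y%m%d%H
dt=$(date -d "$msg_time" +%Y%m%d%H)

# partitionSubPath /{yyyyMMdd}/{yyyyMMddHH} expands per message time
sub_path="/$(date -d "$msg_time" +%Y%m%d)/$dt"

echo "$dt"        # 2024011509
echo "$sub_path"  # /20240115/2024011509
```

The final partition directory is then the concatenation of `idRootPath` and the expanded `partitionSubPath`, and the partition field `dt` carries the `{yyyyMMddHH}`-formatted value.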

sinkParams of the Hive sort task

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| hdfsPath | Y | NA | NameNode address of HDFS |
| maxFileOpenDelayMinute | N | 5 | Maximum write duration of a single HDFS file, in minutes |
| tokenOvertimeMinute | N | 60 | Maximum hold time of the partition-creation token of a single InLong data stream, in minutes |
| maxOutputFileSizeGb | N | 2 | Maximum size of a single HDFS file, in GB |
| hiveJdbcUrl | Y | NA | JDBC URL of Hive |
| hiveDatabase | Y | NA | Hive database |
| hiveUsername | Y | NA | Hive username |
| hivePassword | Y | NA | Hive password |

idParams of the Pulsar sort task

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| inlongGroupId | Y | NA | inlongGroupId |
| inlongStreamId | Y | NA | inlongStreamId |
| topic | Y | NA | Pulsar topic |

sinkParams of the Pulsar sort task

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| serviceUrl | Y | NA | Pulsar service URL |
| authentication | Y | NA | Pulsar cluster authentication |
| enableBatching | N | true | enableBatching |
| batchingMaxBytes | N | 5242880 | batchingMaxBytes |
| batchingMaxMessages | N | 3000 | batchingMaxMessages |
| batchingMaxPublishDelay | N | 1 | batchingMaxPublishDelay |
| maxPendingMessages | N | 1000 | maxPendingMessages |
| maxPendingMessagesAcrossPartitions | N | 50000 | maxPendingMessagesAcrossPartitions |
| sendTimeout | N | 0 | sendTimeout |
| compressionType | N | NONE | compressionType |
| blockIfQueueFull | N | true | blockIfQueueFull |
| roundRobinRouterBatchingPartitionSwitchFrequency | N | 10 | roundRobinRouterBatchingPartitionSwitchFrequency |
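Putting the two Pulsar tables together, a single Pulsar sort task entry might look like the following sketch, analogous to the Hive sample in the next section; the task name, topic, service URL, and group/stream IDs are all illustrative placeholders:

```json
{
  "name": "sid_pulsar_example",
  "type": "PULSAR",
  "idParams": [
    {
      "inlongGroupId": "example_group",
      "inlongStreamId": "",
      "topic": "persistent://public/default/example-topic"
    }
  ],
  "sinkParams": {
    "serviceUrl": "pulsar://127.0.0.1:6650",
    "authentication": "",
    "enableBatching": "true",
    "batchingMaxMessages": "3000"
  }
}
```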

Hive configuration sample

```json
{
  "data": {
    "clusterName": "hivev3-sz-sz1",
    "sortTasks": [
      {
        "idParams": [
          {
            "inlongGroupId": "0fc00000046",
            "inlongStreamId": "",
            "separator": "|",
            "partitionIntervalMs": 3600000,
            "idRootPath": "/user/hive/warehouse/t_inlong_v1_0fc00000046",
            "partitionSubPath": "/{yyyyMMdd}/{yyyyMMddHH}",
            "hiveTableName": "t_inlong_v1_0fc00000046",
            "partitionFieldName": "dt",
            "partitionFieldPattern": "yyyyMMddHH",
            "msgTimeFieldPattern": "yyyy-MM-dd HH:mm:ss",
            "maxPartitionOpenDelayHour": 8
          },
          {
            "inlongGroupId": "03600000045",
            "inlongStreamId": "",
            "separator": "|",
            "partitionIntervalMs": 3600000,
            "idRootPath": "/user/hive/warehouse/t_inlong_v1_03600000045",
            "partitionSubPath": "/{yyyyMMdd}/{yyyyMMddHH}",
            "hiveTableName": "t_inlong_v1_03600000045",
            "partitionFieldName": "dt",
            "partitionFieldPattern": "yyyyMMddHH",
            "msgTimeFieldPattern": "yyyy-MM-dd HH:mm:ss",
            "maxPartitionOpenDelayHour": 8
          },
          {
            "inlongGroupId": "05100054990",
            "inlongStreamId": "",
            "separator": "|",
            "partitionIntervalMs": 3600000,
            "idRootPath": "/user/hive/warehouse/t_inlong_v1_05100054990",
            "partitionSubPath": "/{yyyyMMdd}/{yyyyMMddHH}",
            "hiveTableName": "t_inlong_v1_05100054990",
            "partitionFieldName": "dt",
            "partitionFieldPattern": "yyyyMMddHH",
            "msgTimeFieldPattern": "yyyy-MM-dd HH:mm:ss",
            "maxPartitionOpenDelayHour": 8
          },
          {
            "inlongGroupId": "09c00014434",
            "inlongStreamId": "",
            "separator": "|",
            "partitionIntervalMs": 3600000,
            "idRootPath": "/user/hive/warehouse/t_inlong_v1_09c00014434",
            "partitionSubPath": "/{yyyyMMdd}/{yyyyMMddHH}",
            "hiveTableName": "t_inlong_v1_09c00014434",
            "partitionFieldName": "dt",
            "partitionFieldPattern": "yyyyMMddHH",
            "msgTimeFieldPattern": "yyyy-MM-dd HH:mm:ss",
            "maxPartitionOpenDelayHour": 8
          },
          {
            "inlongGroupId": "0c900035509",
            "inlongStreamId": "",
            "separator": "|",
            "partitionIntervalMs": 3600000,
            "idRootPath": "/user/hive/warehouse/t_inlong_v1_0c900035509",
            "partitionSubPath": "/{yyyyMMdd}/{yyyyMMddHH}",
            "hiveTableName": "t_inlong_v1_0c900035509",
            "partitionFieldName": "dt",
            "partitionFieldPattern": "yyyyMMddHH",
            "msgTimeFieldPattern": "yyyy-MM-dd HH:mm:ss",
            "maxPartitionOpenDelayHour": 8
          }
        ],
        "name": "sid_hive_inlong6th_v3",
        "sinkParams": {
          "hdfsPath": "hdfs://127.0.0.1:9000",
          "maxFileOpenDelayMinute": "5",
          "tokenOvertimeMinute": "60",
          "maxOutputFileSizeGb": "2",
          "hiveJdbcUrl": "jdbc:hive2://127.0.0.2:10000",
          "hiveDatabase": "default",
          "hiveUsername": "hive",
          "hivePassword": "hive"
        },
        "type": "HIVE"
      }
    ]
  },
  "errCode": 0,
  "md5": "md5",
  "result": true
}
```