Tips

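  1. Unless you have very unusual requirements, a single-cluster Pulsar installation is sufficient for the vast majority of use cases. If you are a startup or a team that just wants to try out Pulsar, we recommend a single cluster. If you do need to run a multi-cluster Pulsar instance, refer to the multi-cluster deployment guide.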

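  2. To use all the built-in Pulsar IO connectors in your Pulsar deployment, download the apache-pulsar-io-connectors package and install it under the connectors directory in the pulsar directory on every broker node. If you run Pulsar Functions in a separate function worker cluster, install it in the pulsar directory on every function-worker node instead.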

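  3. To use tiered storage in your Pulsar deployment, download the apache-pulsar-offloaders package and extract it into the offloaders directory in the pulsar directory on every broker. For details on configuring this feature, see the tiered storage cookbook.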

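Deploying a Pulsar cluster involves the following steps (in order):

  • Deploy a ZooKeeper cluster (optional)
  • Initialize cluster metadata
  • Deploy a BookKeeper cluster
  • Deploy one or more Pulsar brokers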

Preparation

Requirements

If you already have a ZooKeeper cluster and want to reuse it, you do not need to prepare extra machines for running ZooKeeper.

To run Pulsar on bare metal, the following is recommended:

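  • At least 6 Linux machines or VMs (3 for running ZooKeeper, and 3 that each run a Pulsar broker and a BookKeeper bookie)
  • A single DNS name covering all of the Pulsar broker hosts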

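If you do not have enough machines, or want to try out Pulsar in cluster mode (and expand the cluster later), you can deploy Pulsar on a single node, which runs ZooKeeper, a bookie, and a broker on the same machine.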

Each machine in the cluster needs to have Java 8 or later installed.
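If you want to check this requirement from a provisioning script, the following POSIX shell sketch normalizes both the old (1.8.0_262) and the newer (11.0.8) Java version formats to a major version number. The helper name java_major is hypothetical, and the commented sed expression assumes the usual `version "..."` line that `java -version` prints to standard error.

```shell
# Hypothetical helper: print the Java major version from a version string
# such as "1.8.0_262" (Java 8 style) or "11.0.8" (Java 9+ style).
java_major() (
  ver=$1
  case $ver in
    1.*) ver=${ver#1.} ;;   # "1.8.0_262" -> "8.0_262"
  esac
  echo "${ver%%[._-]*}"      # keep everything before the first ".", "_" or "-"
)

# On a real host, extract the quoted version from `java -version` (stderr):
# v=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
# [ "$(java_major "$v")" -ge 8 ] || echo "Java 8 or later is required" >&2
```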

The following diagram shows the basic setup:

[Diagram: basic Pulsar cluster setup]

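In this diagram, connecting clients communicate with the Pulsar cluster through a single URL (pulsar-cluster.acme.com in this example), which abstracts the brokers that handle messages. Pulsar message brokers run alongside BookKeeper bookies; brokers and bookies, in turn, both depend on ZooKeeper.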

Hardware considerations

When you deploy a Pulsar cluster, keep the following basic recommendations in mind for capacity planning.

ZooKeeper

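For machines running ZooKeeper, we recommend lighter-weight machines or VMs. Pulsar uses ZooKeeper only for periodic coordination- and configuration-related tasks, not for basic operations. If you deploy Pulsar on Amazon Web Services (AWS), for example, a t2.small instance is likely sufficient.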

Bookies & Brokers

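For machines running a bookie and a Pulsar broker, we recommend more powerful machines. For an AWS deployment, for example, i3.4xlarge instances may be appropriate. On those machines we also recommend: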

  • Fast CPUs and 10Gbps NICs (for Pulsar brokers)
  • Small and fast solid-state drives (SSDs) or hard disk drives (HDDs) with a RAID controller and a battery-backed write cache (for BookKeeper bookies)

Installing the Pulsar binary package

You’ll need to install the Pulsar binary package on each machine in the cluster, including machines running ZooKeeper and BookKeeper.

To deploy a Pulsar cluster on bare metal, download the binary tarball, for example with wget:

```bash
$ wget https://archive.apache.org/dist/pulsar/pulsar-2.6.1/apache-pulsar-2.6.1-bin.tar.gz
```

After you download the tarball, extract it and cd into the resulting directory:

```bash
$ tar xvzf apache-pulsar-2.6.1-bin.tar.gz
$ cd apache-pulsar-2.6.1
```
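Because every host in the cluster should run the same release, it can help to derive the download URL from a single version variable. This is a minimal sketch that assumes the archive.apache.org layout shown above; pulsar_tarball_url is a hypothetical helper, not part of Pulsar.

```shell
# Hypothetical helper: build the binary tarball URL for a given Pulsar version,
# following the archive.apache.org layout used in the command above.
pulsar_tarball_url() (
  version=$1
  echo "https://archive.apache.org/dist/pulsar/pulsar-${version}/apache-pulsar-${version}-bin.tar.gz"
)

PULSAR_VERSION=2.6.1
url=$(pulsar_tarball_url "$PULSAR_VERSION")
# wget "$url" && tar xvzf "apache-pulsar-${PULSAR_VERSION}-bin.tar.gz" && cd "apache-pulsar-${PULSAR_VERSION}"
```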

The extracted directory contains the following subdirectories:

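| Directory | Contains |
|---|---|
| bin | Pulsar's command-line tools, such as pulsar and pulsar-admin |
| conf | Configuration files for Pulsar, including broker configuration, ZooKeeper configuration, and more |
| data | The data storage directory used by ZooKeeper and BookKeeper |
| lib | The JAR files used by Pulsar |
| logs | Logs created by the installation |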

Installing built-in connectors (optional)

Since release 2.1.0-incubating, Pulsar has released a separate binary distribution that contains all the built-in connectors. To enable those connectors, follow the instructions below; otherwise, you can skip this section for now.

To use the built-in connectors, download the connector NAR files on every broker node, for example:

```bash
$ wget https://archive.apache.org/dist/pulsar/pulsar-2.6.1/connectors/{connector}-2.6.1.nar
```

Once the NAR file is downloaded, move it to the connectors directory in the pulsar directory. For example, if you downloaded the connector file pulsar-io-aerospike-2.6.1.nar:

```bash
$ mkdir connectors
$ mv pulsar-io-aerospike-2.6.1.nar connectors
$ ls connectors
pulsar-io-aerospike-2.6.1.nar
...
```
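To fetch several connectors at once, you could generate their URLs from a list of connector names. This sketch assumes the `<connector>-<version>.nar` naming shown above; connector_urls is a hypothetical helper, and pulsar-io-kafka is used only as a second example name.

```shell
# Hypothetical helper: connector_urls <version> <connector> [<connector> ...]
# Prints one NAR download URL per connector name.
connector_urls() (
  version=$1
  shift
  for connector in "$@"; do
    echo "https://archive.apache.org/dist/pulsar/pulsar-${version}/connectors/${connector}-${version}.nar"
  done
)

# Example: download two connectors straight into the connectors directory.
# mkdir -p connectors
# connector_urls 2.6.1 pulsar-io-aerospike pulsar-io-kafka | while read -r u; do
#   wget -P connectors "$u"
# done
```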

Installing tiered storage offloaders (optional)

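Since release 2.2.0, Pulsar has released a separate binary distribution that contains the tiered storage offloaders. To enable tiered storage, follow the instructions below; otherwise, you can skip this section for now.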

To use the tiered storage offloaders, download the offloaders tarball on every broker node, for example:

```bash
$ wget https://archive.apache.org/dist/pulsar/pulsar-2.6.1/apache-pulsar-offloaders-2.6.1-bin.tar.gz
```

Once the tarball is downloaded, untar the offloaders package in the pulsar directory, then move the offloaders directory it contains to offloaders in the pulsar directory:

```bash
$ tar xvfz apache-pulsar-offloaders-2.6.1-bin.tar.gz
# this creates a directory named apache-pulsar-offloaders-2.6.1 in the pulsar directory
# then move the offloaders into place
$ mv apache-pulsar-offloaders-2.6.1/offloaders offloaders
$ ls offloaders
tiered-storage-jcloud-2.6.1.nar
```
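Before starting the brokers, you might check that each broker actually has an offloader NAR in place. A small sketch; has_nar is a hypothetical helper, not a Pulsar command.

```shell
# Hypothetical check: succeed only if the given directory contains at least one .nar file.
has_nar() {
  for nar_file in "$1"/*.nar; do
    # If nothing matches, the loop sees the literal pattern, which does not exist.
    [ -e "$nar_file" ] && return 0
  done
  return 1
}

# has_nar offloaders || echo "no offloader NAR found under ./offloaders" >&2
```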

For details on configuring tiered storage, see the tiered storage cookbook.

Deploying a ZooKeeper cluster

If you already have an existing ZooKeeper cluster and want to use it, you can skip this section.

ZooKeeper manages a variety of essential coordination- and configuration-related tasks for Pulsar. To deploy a Pulsar cluster you’ll need to deploy ZooKeeper first (before all other components). We recommend deploying a 3-node ZooKeeper cluster. Pulsar does not make heavy use of ZooKeeper, so more lightweight machines or VMs should suffice for running ZooKeeper.

To begin, add all ZooKeeper servers to the configuration specified in conf/zookeeper.conf (in the Pulsar directory you created above). Here’s an example:

```conf
server.1=zk1.us-west.example.com:2888:3888
server.2=zk2.us-west.example.com:2888:3888
server.3=zk3.us-west.example.com:2888:3888
```
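If you manage the host list in a script, the server.N entries can be generated instead of typed. This sketch assumes the hosts are listed in the same order on every machine, because the order determines each server's ID; zk_server_lines is a hypothetical helper.

```shell
# Hypothetical helper: emit one server.N line per ZooKeeper host, numbered from 1
# in the order given, with the 2888 (peer) and 3888 (leader election) ports shown above.
zk_server_lines() (
  id=1
  for host in "$@"; do
    echo "server.${id}=${host}:2888:3888"
    id=$((id + 1))
  done
)

# zk_server_lines zk1.us-west.example.com zk2.us-west.example.com \
#   zk3.us-west.example.com >> conf/zookeeper.conf
```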

If you only have one machine on which to deploy Pulsar, add just one server entry to the configuration file.

On each host, you need to specify the ID of the node in each node’s myid file, which is in each server’s data/zookeeper folder by default (this can be changed via the dataDir parameter).

See the Multi-server setup guide in the ZooKeeper documentation for detailed info on myid and more.

On a ZooKeeper server at zk1.us-west.example.com, for example, you could set the myid value like this:

```bash
$ mkdir -p data/zookeeper
$ echo 1 > data/zookeeper/myid
```

On zk2.us-west.example.com the command would be echo 2 > data/zookeeper/myid and so on.
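The myid value must match the N of that host's server.N entry. Assuming each host's `hostname -f` returns exactly the name used in zookeeper.conf, a sketch like the following derives the ID from the same ordered host list; write_myid is a hypothetical helper.

```shell
# Hypothetical helper: write_myid <this-host> <dataDir> <host1> <host2> ...
# The ID is this host's 1-based position in the ordered list, matching server.N.
write_myid() (
  self=$1; data_dir=$2; shift 2
  id=1; found=0
  for host in "$@"; do
    [ "$host" = "$self" ] && found=$id
    id=$((id + 1))
  done
  if [ "$found" -eq 0 ]; then
    echo "write_myid: $self is not in the ZooKeeper server list" >&2
    false                     # non-zero status signals the error
  else
    mkdir -p "$data_dir" && echo "$found" > "$data_dir/myid"
  fi
)

# On each ZooKeeper host, from the Pulsar directory:
# write_myid "$(hostname -f)" data/zookeeper \
#   zk1.us-west.example.com zk2.us-west.example.com zk3.us-west.example.com
```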

Once each server has been added to the zookeeper.conf configuration and has the appropriate myid entry, you can start ZooKeeper on all hosts (in the background, using nohup) with the pulsar-daemon CLI tool:

```bash
$ bin/pulsar-daemon start zookeeper
```

If you plan to deploy ZooKeeper and a bookie on the same node, you need to start ZooKeeper with a different stats port.

Start ZooKeeper with the pulsar-daemon CLI tool, for example:

```bash
$ PULSAR_EXTRA_OPTS="-Dstats_server_port=8001" bin/pulsar-daemon start zookeeper
```

Initializing cluster metadata

Once you've deployed ZooKeeper for your cluster, you need to write some metadata to ZooKeeper for each cluster in your instance. You only need to write this metadata once.

You can initialize this metadata using the initialize-cluster-metadata command of the pulsar CLI tool. This command can be run on any machine in your ZooKeeper cluster. Here’s an example:

```bash
$ bin/pulsar initialize-cluster-metadata \
  --cluster pulsar-cluster-1 \
  --zookeeper zk1.us-west.example.com:2181 \
  --configuration-store zk1.us-west.example.com:2181 \
  --web-service-url http://pulsar.us-west.example.com:8080 \
  --web-service-url-tls https://pulsar.us-west.example.com:8443 \
  --broker-service-url pulsar://pulsar.us-west.example.com:6650 \
  --broker-service-url-tls pulsar+ssl://pulsar.us-west.example.com:6651
```

As the example above shows, you need to specify the following:

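| Flag | Description |
|---|---|
| `--cluster` | A name for the cluster. |
| `--zookeeper` | A "local" ZooKeeper connection string for the cluster. This connection string only needs to include one machine in the ZooKeeper cluster. |
| `--configuration-store` | The configuration store connection string for the entire instance. As with the `--zookeeper` flag, this connection string only needs to include one machine in the ZooKeeper cluster. |
| `--web-service-url` | The web service URL for the cluster, plus a port. This URL should be a standard DNS name. The default port is 8080 (we don't recommend using a different port). |
| `--web-service-url-tls` | If you're using TLS, you also need to specify a TLS web service URL for the cluster. The default port is 8443 (we don't recommend using a different port). |
| `--broker-service-url` | A broker service URL enabling interaction with the brokers in the cluster. This URL should use the same DNS name as the web service URL but should use the pulsar scheme instead. The default port is 6650 (we don't recommend using a different port). |
| `--broker-service-url-tls` | If you're using TLS, you also need to specify a TLS broker service URL for the brokers in the cluster. The default port is 6651 (we don't recommend using a different port). |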
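Since the four service URLs share one DNS name and use the default ports, a script can derive them from that name. A sketch; service_urls is a hypothetical helper that assumes the default ports from the table above.

```shell
# Hypothetical helper: print the four service URL flags for one DNS name,
# using the default ports recommended in the table above.
service_urls() (
  dns=$1
  printf '%s\n' \
    "--web-service-url http://${dns}:8080" \
    "--web-service-url-tls https://${dns}:8443" \
    "--broker-service-url pulsar://${dns}:6650" \
    "--broker-service-url-tls pulsar+ssl://${dns}:6651"
)

# The substitution is left unquoted on purpose so that each flag and value becomes
# a separate argument (none of them contain spaces or glob characters):
# bin/pulsar initialize-cluster-metadata \
#   --cluster pulsar-cluster-1 \
#   --zookeeper zk1.us-west.example.com:2181 \
#   --configuration-store zk1.us-west.example.com:2181 \
#   $(service_urls pulsar.us-west.example.com)
```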

Deploying a BookKeeper cluster

BookKeeper handles all persistent data storage in Pulsar. You will need to deploy a cluster of BookKeeper bookies to use Pulsar. We recommend running a 3-bookie BookKeeper cluster.

You can configure BookKeeper bookies using the conf/bookkeeper.conf configuration file. The most important step in configuring bookies for our purposes here is ensuring that zkServers is set to the connection string for the ZooKeeper cluster. Here's an example:

```conf
zkServers=zk1.us-west.example.com:2181,zk2.us-west.example.com:2181,zk3.us-west.example.com:2181
```
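The same comma-separated connection string is needed here and again in the broker configuration, so a script might build it once from the host list. A sketch; zk_connect_string is a hypothetical helper that assumes every host listens on the ZooKeeper client port 2181 used throughout this guide.

```shell
# Hypothetical helper: join ZooKeeper hosts into "host1:2181,host2:2181,...".
zk_connect_string() (
  joined=""
  for host in "$@"; do
    # ${joined:+$joined,} adds a comma only after the first entry.
    joined="${joined:+$joined,}${host}:2181"
  done
  echo "$joined"
)

# echo "zkServers=$(zk_connect_string zk1.us-west.example.com zk2.us-west.example.com zk3.us-west.example.com)"
```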

Once you've appropriately modified the zkServers parameter, you can make any other configuration changes you need. You can find a full list of the available BookKeeper configuration parameters in the configuration reference, although we recommend consulting the BookKeeper documentation for a more in-depth guide.

Once you’ve applied the desired configuration in conf/bookkeeper.conf, you can start up a bookie on each of your BookKeeper hosts. You can start up each bookie either in the background, using nohup, or in the foreground.

To start the bookie in the background, use the pulsar-daemon CLI tool:

```bash
$ bin/pulsar-daemon start bookie
```

To start the bookie in the foreground:

```bash
$ bin/bookkeeper bookie
```

You can verify that a bookie is working properly by running the bookiesanity command of the BookKeeper shell on it:

```bash
$ bin/bookkeeper shell bookiesanity
```

This will create an ephemeral BookKeeper ledger on the local bookie, write a few entries, read them back, and finally delete the ledger.

After you have started all the bookies, you can run the simpletest command of the BookKeeper shell on any bookie node to verify that all the bookies in the cluster are up and running.

```bash
$ bin/bookkeeper shell simpletest --ensemble <num-bookies> --writeQuorum <num-bookies> --ackQuorum <num-bookies> --numEntries <num-entries>
```

This command creates a ledger of size num-bookies on the cluster, writes num-entries entries to it, and finally deletes the ledger.

Deploying Pulsar brokers

Pulsar brokers are the last thing you need to deploy in your Pulsar cluster. Brokers handle Pulsar messages and provide Pulsar’s administrative interface. We recommend running 3 brokers, one for each machine that’s already running a BookKeeper bookie.

Configuring brokers

Some parameters in the broker configuration are especially important: they ensure that each broker knows about the ZooKeeper cluster you've deployed. Make sure that the zookeeperServers and configurationStoreServers parameters are set correctly. In this case, since we only have 1 cluster and no separate configuration store, configurationStoreServers points to the same servers as zookeeperServers.

```conf
zookeeperServers=zk1.us-west.example.com:2181,zk2.us-west.example.com:2181,zk3.us-west.example.com:2181
configurationStoreServers=zk1.us-west.example.com:2181,zk2.us-west.example.com:2181,zk3.us-west.example.com:2181
```
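When you configure several brokers, editing conf/broker.conf by hand on each one is error-prone. A minimal sketch of a key=value setter, assuming the key contains no regex metacharacters and the value contains no "|" or "&" (both are special in the sed replacement); set_conf is a hypothetical helper, not a Pulsar tool.

```shell
# Hypothetical helper: set_conf <file> <key> <value>
# Replaces an existing "key=..." line, or appends "key=value" if the key is missing.
set_conf() (
  file=$1; key=$2; value=$3
  if grep -q "^${key}=" "$file"; then
    # Write to a temporary file instead of sed -i, whose syntax differs between GNU and BSD sed.
    sed "s|^${key}=.*|${key}=${value}|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
  else
    echo "${key}=${value}" >> "$file"
  fi
)

# Example, run from the Pulsar directory on each broker:
# zk=zk1.us-west.example.com:2181,zk2.us-west.example.com:2181,zk3.us-west.example.com:2181
# set_conf conf/broker.conf zookeeperServers "$zk"
# set_conf conf/broker.conf configurationStoreServers "$zk"
# set_conf conf/broker.conf clusterName pulsar-cluster-1
```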

You also need to specify the cluster name (matching the name that you provided when initializing the cluster's metadata):

```conf
clusterName=pulsar-cluster-1
```

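In addition, you need to match the broker and web service ports provided when you initialized the cluster's metadata (especially when you use ports other than the defaults):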

```conf
brokerServicePort=6650
brokerServicePortTls=6651
webServicePort=8080
webServicePortTls=8443
```

If you deploy Pulsar in a one-node cluster, update the replication settings in conf/broker.conf to 1:

```conf
# Number of bookies to use when creating a ledger
managedLedgerDefaultEnsembleSize=1

# Number of copies to store for each message
managedLedgerDefaultWriteQuorum=1

# Number of guaranteed copies (acks to wait before write is complete)
managedLedgerDefaultAckQuorum=1
```
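BookKeeper expects these replication settings to satisfy ensemble size >= write quorum >= ack quorum >= 1, and a ledger can only be created if at least ensemble-size bookies are available. A sketch of that sanity check; check_quorum is a hypothetical helper, and the bookie count you pass is whatever you actually deployed.

```shell
# Hypothetical check: check_quorum <bookies> <ensemble> <writeQuorum> <ackQuorum>
# Succeeds only if bookies >= ensemble >= writeQuorum >= ackQuorum >= 1.
check_quorum() (
  bookies=$1; ensemble=$2; write=$3; ack=$4
  [ "$ack" -ge 1 ] &&
    [ "$write" -ge "$ack" ] &&
    [ "$ensemble" -ge "$write" ] &&
    [ "$bookies" -ge "$ensemble" ]
)

# Single-node cluster: one bookie, all three settings set to 1.
# check_quorum 1 1 1 1 && echo "replication settings look consistent"
```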

Enabling Pulsar Functions (optional)

If you want to enable Pulsar Functions, follow the instructions below:

  1. Edit conf/broker.conf to enable the functions worker by setting functionsWorkerEnabled to true.

```conf
functionsWorkerEnabled=true
```

  2. Edit conf/functions_worker.yml and set pulsarFunctionsCluster to the cluster name that you provided when initializing the cluster's metadata.

```conf
pulsarFunctionsCluster: pulsar-cluster-1
```

To learn about more options for deploying the functions worker, see Deploy and manage functions worker.

Starting brokers

You can then make any other configuration changes you'd like in the conf/broker.conf file. Once you've decided on a configuration, you can start up the brokers for your Pulsar cluster. Like ZooKeeper and BookKeeper, brokers can be started either in the foreground or in the background, using nohup.

You can start a broker in the foreground using the pulsar broker command:

```bash
$ bin/pulsar broker
```

You can start a broker in the background using the pulsar-daemon CLI tool:

```bash
$ bin/pulsar-daemon start broker
```

Once you've successfully started up all the brokers you intend to use, your Pulsar cluster should be ready to go!

Connecting to the running cluster

Once your Pulsar cluster is up and running, you should be able to connect to it using Pulsar clients. One such client is the pulsar-client tool, which is included with the Pulsar binary package. The pulsar-client tool can publish messages to and consume messages from Pulsar topics, and thus provides a simple way to make sure that your cluster is running properly.

To use the pulsar-client tool, first modify the client configuration file conf/client.conf in your binary package. You need to change the values of webServiceUrl and brokerServiceUrl, replacing localhost (the default) with the DNS name that you've assigned to your broker/bookie hosts. Here's an example:

```conf
webServiceUrl=http://us-west.example.com:8080/
brokerServiceUrl=pulsar://us-west.example.com:6650/
```

Once you've done that, you can publish a message to a Pulsar topic:

```bash
$ bin/pulsar-client produce \
  persistent://public/default/test \
  -n 1 \
  -m "Hello Pulsar"
```

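The topic in this example, persistent://public/default/test, uses the public tenant and the default namespace and does not include the cluster name, so you do not need to change it if you chose a cluster name other than pulsar-cluster-1.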

This publishes a single message to the Pulsar topic. You can also subscribe to the topic in a different terminal before publishing messages:

```bash
$ bin/pulsar-client consume \
  persistent://public/default/test \
  -n 100 \
  -s "consumer-test" \
  -t "Exclusive"
```

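After the message above is successfully published to the topic, you should see the following in the consumer's standard output: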

```
----- got message -----
Hello Pulsar
```

Running Functions

If you have enabled Pulsar Functions, you can also try out Pulsar Functions now.

Create an ExclamationFunction named exclamation.

```bash
bin/pulsar-admin functions create \
  --jar examples/api-examples.jar \
  --classname org.apache.pulsar.functions.api.examples.ExclamationFunction \
  --inputs persistent://public/default/exclamation-input \
  --output persistent://public/default/exclamation-output \
  --tenant public \
  --namespace default \
  --name exclamation
```

Check whether the function runs as expected by triggering it.

```bash
bin/pulsar-admin functions trigger --name exclamation --trigger-value "hello world"
```

You should see the following output:

```
hello world!
```