Production Deployment Reference Guide

1 Introduction

  1. Linkis has been running stably on the WeBank big data production platform for more than two years. Its development and operations staff have distilled a set of Linkis production deployment guidelines so that Linkis delivers its best performance on top of stable operation, while also saving server resources and reducing usage costs. The guide covers two major areas: deployment plan selection and parameter configuration. Finally, Linkis has also been through long-term testing in the test environment; we give our stress test practice and experience in Chapter 4.

2 Deployment plan selection

  1. Linkis's stand-alone deployment is simple, but it cannot be used in a production environment, because running too many processes on the same server puts the server under too much pressure.
  2. The choice of deployment plan depends on the company's user scale, user habits, and the number of concurrent users on the cluster. Generally speaking, we choose the deployment plan based on the number of users who use Linkis at the same time and on the users' preference for execution engines.
  3. The following gives a detailed breakdown by the number of concurrent users. It is assumed that users prefer Spark the most and Hive second, and it is recommended that servers have 64G of memory or more.
  4. **On machines where an EngineManager is installed, the users' engine processes are started there, so the memory load of the machine is relatively high; machines running other types of microservices carry a relatively low load.**
  5. **We generally recommend reserving about 20G on a server where an EM is installed for the Linux system, the EM's own process and other processes. For example, on a 128G server, after setting aside 20G there is still roughly 100G of memory available to start engine processes; if each Spark Driver uses 4G of memory, the server can start up to 25 Spark engines.**

The formula for calculating the total resources used:

Total resources used by Linkis = total memory + total number of cores = number of concurrent users × (Spark Driver or Hive client memory) + number of concurrent users × (Spark Driver or Hive client cores)

For example, if 50 people are using Linkis at the same time, the Spark Driver memory is 2G, the Hive client memory is 2G, and each engine uses two cores, the total is 50 × 2G + 50 × 2 cores = 100G of memory + 100 CPU cores.
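
Written as a worked equation, this is simply the example above restated, where N is the number of concurrent users and M and C are the per-engine memory and cores:

```latex
\begin{aligned}
\text{Total memory} &= N \times M = 50 \times 2\,\text{G} = 100\,\text{G} \\
\text{Total cores}  &= N \times C = 50 \times 2 = 100
\end{aligned}
```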

Conventions before parameter configuration (must read):

1. Parameters are generally configured in linkis.properties in the conf directory of each microservice's installation directory, in the form key=value, for example wds.linkis.enginemanager.cores.max=20. The only exception is the engine microservices, whose configuration goes in linkis-engine.properties (see the sketch after this list).

2. After a parameter is changed, the microservice needs to be restarted for the change to take effect. For engine parameters, after the configuration change, kill the engine from the engine manager page and then restart the engine for the change to take effect.
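
For illustration, a minimal sketch of the key=value format described above; the two keys shown are taken from later in this guide, and the file locations assume a standard Linkis installation layout:

```properties
# conf/linkis.properties -- Entrance, EngineManager and other non-engine microservices
wds.linkis.enginemanager.cores.max=20

# conf/linkis-engine.properties -- engine microservices only
wds.linkis.engine.max.free.time=3h
```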

A reference deployment plan is provided below.

2.1 Concurrent users: 10-50

1). Recommended server configuration: 4 servers, named S1, S2, S3, S4

| Service Name | Deployment Selection | Description |
| --- | --- | --- |
| SparkEngineManager | S1 | Spark EM needs a dedicated server, because users are assumed to prefer Spark the most (if Hive is preferred, this can be adjusted) |
| SparkEntrance | S2 | |
| HiveEngineManager | S3 | |
| HiveEntrance | S2 | |
| PythonEngineManager | S3 | |
| PythonEntrance | S2 | |
| Others (Eureka, Gateway, etc.) | S4 | If this machine is under too much pressure, another server can be added to deploy these services separately |

2). Minimum server configuration: 2 servers

3). Parameter configuration

These parameters are configured in linkis.properties and linkis-engine.properties in the conf directory under the microservice installation directory. Parameter configuration is generally divided into two types of microservices: Entrance and EngineManager.

a) Entrance microservice

| Parameter name | Parameter function | Suggested parameter value |
| --- | --- | --- |
| wds.linkis.rpc.receiver.asyn.queue.size.max | Specifies the queue size of RPC messages received by the Entrance microservice | 2000 |
| wds.linkis.rpc.receiver.asyn.consumer.thread.max | Specifies the size of the Entrance microservice's RPC consumer thread pool | 100 |
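
As an illustration, a sketch of how these suggestions might look in the Entrance's conf/linkis.properties (key names are taken from the table above; adjust the values if your workload differs):

```properties
# Entrance conf/linkis.properties (10-50 concurrent users)
wds.linkis.rpc.receiver.asyn.queue.size.max=2000
wds.linkis.rpc.receiver.asyn.consumer.thread.max=100
```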

b) EngineManager microservice

Note: Linkis defines the concept of protected resources. Protected resources reserve a certain amount of capacity so that the EM never uses up its maximum resources, which protects the machine.

| Parameter name | Parameter function | Suggested parameter value |
| --- | --- | --- |
| wds.linkis.enginemanager.memory.max | Specifies the total memory of all engines started by the EM process | 40G (for a 64G server) or 100G (for a 128G server) |
| wds.linkis.enginemanager.cores.max | Specifies the total number of cores of all engines started by the EM process | 20 |
| wds.linkis.enginemanager.engine.instances.max | Specifies the total number of engines started by the EM process | 20 |
| wds.linkis.enginemanager.protected.memory | Specifies the memory reserved by the EM process for protection | 2G (meaning at most 38G (40-2) of memory can be used) |
| wds.linkis.enginemanager.protected.cores.max | Specifies the number of cores reserved by the EM process for protection | 2 (meaning at most 18 (20-2) cores can be used) |
| wds.linkis.enginemanager.protected.engine.instances | Specifies the number of engine instances reserved by the EM process for protection | 1 (meaning at most 19 (20-1) engines can be started) |
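
For reference, a sketch of how these suggestions might look in the EngineManager's conf/linkis.properties, assuming a 64G server (on a 128G server the memory value would be 100G instead):

```properties
# EngineManager conf/linkis.properties (example assuming a 64G server)
wds.linkis.enginemanager.memory.max=40G
wds.linkis.enginemanager.cores.max=20
wds.linkis.enginemanager.engine.instances.max=20
# Protected resources held back by the EM, leaving at most 38G / 18 cores / 19 engines usable
wds.linkis.enginemanager.protected.memory=2G
wds.linkis.enginemanager.protected.cores.max=2
wds.linkis.enginemanager.protected.engine.instances=1
```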

2.2 Concurrent users: 50-100

1). Recommended server configuration: 7 servers, named S1, S2…S7

| Service Name | Deployment Selection | Description |
| --- | --- | --- |
| SparkEngineManager | S1, S2 | |
| SparkEntrance | S5 | |
| HiveEngineManager | S3, S4 | |
| HiveEntrance | S5 | |
| PythonEngineManager | S4 | |
| PythonEntrance | S4 | |
| Eureka, Gateway, RM | S6 | Eureka and RM require high availability deployment |
| PublicService, RM, Datasource, Eureka | S7 | Eureka and RM require high availability deployment |

2). Minimum server configuration: 4 servers

3). Parameter configuration

a) Entrance microservice

| Parameter name | Parameter function | Suggested parameter value |
| --- | --- | --- |
| wds.linkis.rpc.receiver.asyn.queue.size.max | Specifies the queue size of RPC messages received by the Entrance microservice | 3000 |
| wds.linkis.rpc.receiver.asyn.consumer.thread.max | Specifies the size of the Entrance microservice's RPC consumer thread pool | 120 |
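
As in 2.1, a sketch of these values in the Entrance's conf/linkis.properties:

```properties
# Entrance conf/linkis.properties (50-100 concurrent users)
wds.linkis.rpc.receiver.asyn.queue.size.max=3000
wds.linkis.rpc.receiver.asyn.consumer.thread.max=120
```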

b) EngineManager microservice

| Parameter name | Parameter function | Suggested parameter value |
| --- | --- | --- |
| wds.linkis.enginemanager.memory.max | Specifies the total memory of all engines started by the EM process | 40G (for a 64G server) or 100G (for a 128G server) |
| wds.linkis.enginemanager.cores.max | Specifies the total number of cores of all engines started by the EM process | 20 |
| wds.linkis.enginemanager.engine.instances.max | Specifies the total number of engines started by the EM process | 20 |
| wds.linkis.enginemanager.protected.memory | Specifies the memory reserved by the EM process for protection | 2G (meaning at most 38G (40-2) of memory can be used) |
| wds.linkis.enginemanager.protected.cores.max | Specifies the number of cores reserved by the EM process for protection | 2 (meaning at most 18 (20-2) cores can be used) |
| wds.linkis.enginemanager.protected.engine.instances | Specifies the number of engine instances reserved by the EM process for protection | 1 (meaning at most 19 (20-1) engines can be started) |

2.3 Concurrent users: 100-300

1). Recommended server configuration: 11 servers, named S1, S2…S11

| Service Name | Deployment Selection | Description |
| --- | --- | --- |
| SparkEngineManager | S1, S2, S3, S4 | |
| SparkEntrance | S8 | |
| HiveEngineManager | S5, S6, S7 | |
| HiveEntrance | S8 | |
| PythonEngineManager | S9 | |
| PythonEntrance | S9 | |
| Eureka, Gateway, RM | S10 | Eureka and RM require high availability deployment |
| PublicService, RM, Datasource, Eureka | S11 | Eureka and RM require high availability deployment |

2). Minimum server configuration: 6 servers

3). Parameter configuration

a) Entrance microservice

| Parameter name | Parameter function | Suggested parameter value |
| --- | --- | --- |
| wds.linkis.rpc.receiver.asyn.queue.size.max | Specifies the queue size of RPC messages received by the Entrance microservice | 4000 |
| wds.linkis.rpc.receiver.asyn.consumer.thread.max | Specifies the size of the Entrance microservice's RPC consumer thread pool | 150 |
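
As in 2.1, a sketch of these values in the Entrance's conf/linkis.properties:

```properties
# Entrance conf/linkis.properties (100-300 concurrent users)
wds.linkis.rpc.receiver.asyn.queue.size.max=4000
wds.linkis.rpc.receiver.asyn.consumer.thread.max=150
```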

b) EngineManager microservice

| Parameter name | Parameter function | Suggested parameter value |
| --- | --- | --- |
| wds.linkis.enginemanager.memory.max | Specifies the total memory of all engines started by the EM process | 40G (for a 64G server) or 100G (for a 128G server) |
| wds.linkis.enginemanager.cores.max | Specifies the total number of cores of all engines started by the EM process | 20 |
| wds.linkis.enginemanager.engine.instances.max | Specifies the total number of engines started by the EM process | 20 |
| wds.linkis.enginemanager.protected.memory | Specifies the memory reserved by the EM process for protection | 2G (meaning at most 38G (40-2) of memory can be used) |
| wds.linkis.enginemanager.protected.cores.max | Specifies the number of cores reserved by the EM process for protection | 2 (meaning at most 18 (20-2) cores can be used) |
| wds.linkis.enginemanager.protected.engine.instances | Specifies the number of engine instances reserved by the EM process for protection | 1 (meaning at most 19 (20-1) engines can be started) |

2.4 Concurrent users: 300-500

1). Recommended server configuration: 15 servers, named S1, S2…S15

| Service Name | Deployment Selection | Description |
| --- | --- | --- |
| SparkEngineManager | S1, S2, S3, S4, S5, S6, S7 | |
| SparkEntrance | S12 | |
| HiveEngineManager | S8, S9, S10, S11 | |
| HiveEntrance | S12 | |
| PythonEngineManager | S13 | |
| PythonEntrance | S13 | |
| Eureka, Gateway, RM | S14 | Eureka and RM require high availability deployment |
| PublicService, RM, Datasource, Eureka | S15 | Eureka and RM require high availability deployment |

2). Minimum server configuration: 10 servers

3). Parameter configuration

a) Entrance microservice

| Parameter name | Parameter function | Suggested parameter value |
| --- | --- | --- |
| wds.linkis.rpc.receiver.asyn.queue.size.max | Specifies the queue size of RPC messages received by the Entrance microservice | 5000 |
| wds.linkis.rpc.receiver.asyn.consumer.thread.max | Specifies the size of the Entrance microservice's RPC consumer thread pool | 150 |
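
As in 2.1, a sketch of these values in the Entrance's conf/linkis.properties:

```properties
# Entrance conf/linkis.properties (300-500 concurrent users)
wds.linkis.rpc.receiver.asyn.queue.size.max=5000
wds.linkis.rpc.receiver.asyn.consumer.thread.max=150
```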

b) EngineManager microservice

| Parameter name | Parameter function | Suggested parameter value |
| --- | --- | --- |
| wds.linkis.enginemanager.memory.max | Specifies the total memory of all engines started by the EM process | 40G (for a 64G server) or 100G (for a 128G server) |
| wds.linkis.enginemanager.cores.max | Specifies the total number of cores of all engines started by the EM process | 20 |
| wds.linkis.enginemanager.engine.instances.max | Specifies the total number of engines started by the EM process | 20 |
| wds.linkis.enginemanager.protected.memory | Specifies the memory reserved by the EM process for protection | 2G (meaning at most 38G (40-2) of memory can be used) |
| wds.linkis.enginemanager.protected.cores.max | Specifies the number of cores reserved by the EM process for protection | 2 (meaning at most 18 (20-2) cores can be used) |
| wds.linkis.enginemanager.protected.engine.instances | Specifies the number of engine instances reserved by the EM process for protection | 1 (meaning at most 19 (20-1) engines can be started) |

2.5 Concurrent users: more than 500

1). Recommended server configuration: 25 servers, named S1, S2…S25

| Service Name | Deployment Selection | Description |
| --- | --- | --- |
| SparkEngineManager | S1, S2, S3, S4, S5, S6, S7, S8, S9, S10 | |
| SparkEntrance | S17 | |
| HiveEngineManager | S11, S12, S13, S14, S15, S16 | |
| HiveEntrance | S17 | |
| PythonEngineManager | S18, S19 | |
| PythonEntrance | S20 | |
| Eureka, RM | S21 | Eureka and RM require high availability deployment |
| RM, Eureka | S22 | Eureka and RM require high availability deployment |
| Eureka, PublicService | S23 | Eureka and RM require high availability deployment |
| Gateway, Datasource | S24 | |

2). Minimum server configuration: 15 servers

3). Parameter configuration

a) Entrance microservice

| Parameter name | Parameter function | Suggested parameter value |
| --- | --- | --- |
| wds.linkis.rpc.receiver.asyn.queue.size.max | Specifies the queue size of RPC messages received by the Entrance microservice | 5000 |
| wds.linkis.rpc.receiver.asyn.consumer.thread.max | Specifies the size of the Entrance microservice's RPC consumer thread pool | 200 |
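
As in 2.1, a sketch of these values in the Entrance's conf/linkis.properties:

```properties
# Entrance conf/linkis.properties (more than 500 concurrent users)
wds.linkis.rpc.receiver.asyn.queue.size.max=5000
wds.linkis.rpc.receiver.asyn.consumer.thread.max=200
```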

b) EngineManager microservice

| Parameter name | Parameter function | Suggested parameter value |
| --- | --- | --- |
| wds.linkis.enginemanager.memory.max | Specifies the total memory of all engines started by the EM process | 40G (for a 64G server) or 100G (for a 128G server) |
| wds.linkis.enginemanager.cores.max | Specifies the total number of cores of all engines started by the EM process | 20 |
| wds.linkis.enginemanager.engine.instances.max | Specifies the total number of engines started by the EM process | 20 |
| wds.linkis.enginemanager.protected.memory | Specifies the memory reserved by the EM process for protection | 2G (meaning at most 38G (40-2) of memory can be used) |
| wds.linkis.enginemanager.protected.cores.max | Specifies the number of cores reserved by the EM process for protection | 2 (meaning at most 18 (20-2) cores can be used) |
| wds.linkis.enginemanager.protected.engine.instances | Specifies the number of engine instances reserved by the EM process for protection | 1 (meaning at most 19 (20-1) engines can be started) |

3 Other general parameter configuration

In addition to the Entrance and EngineManager microservices, Linkis's other microservices also have their own parameters that can be configured.

3.1 PublicService custom configuration

The PublicService microservice carries Linkis's various auxiliary functions, including file editing and saving, and result set reading.

| Parameter name | Parameter function | Suggested parameter value |
| --- | --- | --- |
| wds.linkis.workspace.filesystem.get.timeout | Specifies the timeout for obtaining the file system | 10000 (unit is ms) |
| wds.linkis.workspace.resultset.download.maxsize | Specifies the maximum number of rows in a downloaded result set | 5000 (at most 5000 rows can be downloaded) or -1 (full download) |
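
A sketch of the corresponding entries in the PublicService's conf/linkis.properties, using the suggested values above:

```properties
# PublicService conf/linkis.properties
wds.linkis.workspace.filesystem.get.timeout=10000
# 5000 caps result-set downloads at 5000 rows; -1 would allow full downloads
wds.linkis.workspace.resultset.download.maxsize=5000
```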

3.2 Engine Microservice

Engine microservices, which include the Spark, Hive and Python engines, can be started at any time. Their configuration parameters need to be modified in linkis-engine.properties under conf in the EngineManager installation directory.

| Parameter name | Parameter function | Suggested parameter value |
| --- | --- | --- |
| wds.linkis.engine.max.free.time | Specifies how long an engine can stay idle before it is killed | 3h (an engine that has not executed a task for three hours is automatically killed) |
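
A sketch of the corresponding entry in conf/linkis-engine.properties in the EngineManager installation directory, using the suggested value above:

```properties
# EngineManager conf/linkis-engine.properties
# An engine that has been idle (no tasks) for 3 hours is killed automatically
wds.linkis.engine.max.free.time=3h
```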

4 Summary

The deployment plan for Linkis is closely tied to how it is used, and the number of users is the biggest influencing factor. To let users work comfortably while keeping cluster server costs down, operations and development staff need to experiment and listen to user feedback. If the deployed plan turns out to be inappropriate, it needs to be adjusted in a timely and appropriate manner.