Custom Variable Design

1. General

Requirements Background

  1. Users want to define common variables when writing code and have them replaced at execution time. For example, a user who runs the same SQL in a daily batch needs to specify the partition time of the previous day. Computing that value in SQL itself is cumbersome, whereas a system-provided run_date variable makes it very convenient.

Target

  1. Support variable substitution of task code
  2. Support custom variables: users can define custom variables in scripts and in the task parameters submitted to Linkis, with support for simple calculations such as + and -
  3. Provide preset system variables such as run_date, run_month, and run_today

2. Overall Design

  1. During the execution of a Linkis task, custom variables are handled in Entrance, mainly by an Entrance interceptor that runs before the task is submitted for execution. The interceptor parses the variables used and the variables defined in the code, completes the code replacement using the initial values of the custom variables passed in with the task, and produces the final executable code.

2.1 Technical Architecture

  1. The overall structure of custom variables is as follows. After the task is submitted, it goes through the variable replacement interceptor. First, all variables and expressions used in the code are parsed. They are then replaced using the system-defined and user-defined initial values of the variables. Finally, the resulting code is submitted to EngineConn for execution, so the underlying engine only ever receives code in which the variables have already been replaced.

[Figure: var_arc, overall architecture of custom variables]
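The flow described above (parse the placeholders, look up their values, rewrite the code) can be sketched as follows. This is a minimal Python sketch, not the Linkis interceptor itself: the names PLACEHOLDER, find_placeholders, and replace_variables are hypothetical, only plain ${name} references are handled, and leaving unknown names untouched is an assumption made for illustration.

```python
import re

# Matches ${...}; the text between the braces is captured.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def find_placeholders(code):
    """Return the distinct placeholder expressions used in the code, in order."""
    seen = []
    for expr in PLACEHOLDER.findall(code):
        expr = expr.strip()
        if expr not in seen:
            seen.append(expr)
    return seen


def replace_variables(code, values):
    """Replace each ${name} that has a known value.

    Unknown names are left untouched (an assumption of this sketch).
    """
    def _substitute(match):
        name = match.group(1).strip()
        if name in values:
            return str(values[name])
        return match.group(0)

    return PLACEHOLDER.sub(_substitute, code)


code = "select * from t where ds = '${run_date}' and city = '${city}'"
final_code = replace_variables(code, {"run_date": "20180129", "city": "sz"})
```

Here final_code is what would be handed to the engine, which never sees the ${...} syntax.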

3. Function Introduction

  1. The variables supported by Linkis are divided into custom variables and system built-in variables. Built-in variables are predefined by Linkis and can be used directly. Different variable types support different operations: strings support +; integers and decimals support +, -, *, and /; dates support + and -.
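The type-dependent operator rules can be illustrated with a small dispatcher. This is a hedged Python sketch under stated assumptions: apply_operator is a hypothetical helper, numbers are computed with Decimal to avoid floating-point drift, and date arithmetic is assumed to be in days (month-based variables such as run_month_begin use month units instead, as described in section 3.1).

```python
from datetime import date, timedelta
from decimal import Decimal


def apply_operator(value, op, operand):
    """Apply op to value according to the operators allowed for its type."""
    if isinstance(value, str):
        # Strings only support concatenation.
        if op != "+":
            raise ValueError("strings only support +")
        return value + str(operand)
    if isinstance(value, date):
        # Dates only support + and -; the unit is assumed to be days here.
        if op not in ("+", "-"):
            raise ValueError("dates only support + and -")
        delta = timedelta(days=int(operand))
        return value + delta if op == "+" else value - delta
    if isinstance(value, (int, float, Decimal)):
        # Numbers support all four basic operators.
        left, right = Decimal(str(value)), Decimal(str(operand))
        if op == "+":
            return left + right
        if op == "-":
            return left - right
        if op == "*":
            return left * right
        if op == "/":
            return left / right
        raise ValueError("numbers support +, -, * and /")
    raise TypeError("unsupported variable type")
```

For example, apply_operator(date(2018, 1, 29), "-", 1) moves the date back one day, while applying "-" to a string raises an error.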

3.1 Built-in variables

The currently supported built-in variables are as follows:

| Variable name | Variable type | Variable meaning | Example value |
| --- | --- | --- | --- |
| run_date | String | Data statistics time (supports user’s own setting; the default is the day before the current time). If yesterday’s data is executed today, it is yesterday’s date. Format: yyyyMMdd | 20180129 |
| run_date_std | String | Data statistics time (standard date format). If yesterday’s data is executed today, it is yesterday’s date. Format: yyyy-MM-dd | 2018-01-29 |
| run_today | String | The day after run_date (data statistics time). Format: yyyyMMdd | 20211210 |
| run_today_std | String | The day after run_date (data statistics time) (standard format). Format: yyyy-MM-dd | 2021-12-10 |
| run_mon | String | The month of the data statistics time. Format: yyyyMM | 202112 |
| run_mon_std | String | The month of the data statistics time (standard format). Format: yyyy-MM | 2021-12 |
| run_month_begin | String | The first day of the month of the data statistics time. Format: yyyyMMdd | 20180101 |
| run_month_begin_std | String | The first day of the month of the data statistics time (standard date format). Format: yyyy-MM-dd | 2018-01-01 |
| run_month_now_begin | String | The first day of the month of run_today. Format: yyyyMMdd | 20211201 |
| run_month_now_begin_std | String | The first day of the month of run_today (standard format). Format: yyyy-MM-dd | 2021-12-01 |
| run_month_end | String | The last day of the month of the data statistics time. Format: yyyyMMdd | 20180131 |
| run_month_end_std | String | The last day of the month of the data statistics time (standard date format). Format: yyyy-MM-dd | 2018-01-31 |
| run_month_now_end | String | The last day of the month of run_today. Format: yyyyMMdd | 20211231 |
| run_month_now_end_std | String | The last day of the month of run_today (standard date format). Format: yyyy-MM-dd | 2021-12-31 |
| run_quarter_begin | String | The first day of the quarter of the data statistics time. Format: yyyyMMdd | 20210401 |
| run_quarter_end | String | The last day of the quarter of the data statistics time. Format: yyyyMMdd | 20210630 |
| run_half_year_begin | String | The first day of the half year of the data statistics time. Format: yyyyMMdd | 20210101 |
| run_half_year_end | String | The last day of the half year of the data statistics time. Format: yyyyMMdd | 20210630 |
| run_year_begin | String | The first day of the year of the data statistics time. Format: yyyyMMdd | 20210101 |
| run_year_end | String | The last day of the year of the data statistics time. Format: yyyyMMdd | 20211231 |
| run_quarter_begin_std | String | The first day of the quarter of the data statistics time (standard format). Format: yyyy-MM-dd | 2021-10-01 |
| run_quarter_end_std | String | The last day of the quarter of the data statistics time (standard format). Format: yyyy-MM-dd | 2021-12-31 |
| run_half_year_begin_std | String | The first day of the half year of the data statistics time (standard format). Format: yyyy-MM-dd | 2021-07-01 |
| run_half_year_end_std | String | The last day of the half year of the data statistics time (standard format). Format: yyyy-MM-dd | 2021-12-31 |
| run_year_begin_std | String | The first day of the year of the data statistics time (standard format). Format: yyyy-MM-dd | 2021-01-01 |
| run_year_end_std | String | The last day of the year of the data statistics time (standard format). Format: yyyy-MM-dd | 2021-12-31 |

Details:

  1. run_date is the core built-in date variable and can be set by the user. If it is not specified, it defaults to the day before the current system time.
  2. Definition of the other derived built-in date variables: all other built-in date variables are calculated relative to run_date. Once run_date changes, their values change automatically. The other date variables do not support setting initial values and can only be changed by modifying run_date.
  3. Built-in variables support richer usage scenarios through expressions: ${run_date-1} is the day before run_date; ${run_month_begin-1} is the first day of the month before run_month_begin, where -1 means minus one month.
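The relationship between run_date and a few of the derived variables can be sketched as below. This is an illustrative Python sketch, not Linkis code: derive_date_variables and shift_months are hypothetical helpers, only a subset of the variables is shown, and the month shift is one possible reading of "-1 means minus one month".

```python
import calendar
from datetime import date, timedelta


def shift_months(d, months):
    """Move d by whole months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def derive_date_variables(run_date=None):
    """Derive a subset of the built-in variables from run_date."""
    if run_date is None:
        # Default: the day before the current system time.
        run_date = date.today() - timedelta(days=1)
    run_today = run_date + timedelta(days=1)
    month_begin = run_date.replace(day=1)
    last_day = calendar.monthrange(run_date.year, run_date.month)[1]
    month_end = run_date.replace(day=last_day)
    return {
        "run_date": run_date.strftime("%Y%m%d"),
        "run_date_std": run_date.strftime("%Y-%m-%d"),
        "run_today": run_today.strftime("%Y%m%d"),
        "run_today_std": run_today.strftime("%Y-%m-%d"),
        "run_mon": run_date.strftime("%Y%m"),
        "run_mon_std": run_date.strftime("%Y-%m"),
        "run_month_begin": month_begin.strftime("%Y%m%d"),
        "run_month_begin_std": month_begin.strftime("%Y-%m-%d"),
        "run_month_end": month_end.strftime("%Y%m%d"),
        "run_month_end_std": month_end.strftime("%Y-%m-%d"),
    }


variables = derive_date_variables(date(2018, 1, 29))
previous_month_begin = shift_months(date(2018, 1, 1), -1)
```

Because every value is computed from run_date, changing run_date is the only input needed to move all of them together.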

3.2 Custom variables

  1. What are custom variables? They are user variables that are defined first and then used. Currently, user-defined variables can be strings, integers, or floating-point numbers. Strings support the + operation; integers and floating-point numbers support +, -, *, and /. User-defined variables do not conflict with the set variable syntax supported by SparkSQL and HQL, but reusing the same name is not allowed. Custom variables are defined in the code, before the task code, as follows:

    For sql scripts:

        --@set f=20.1

    For python/shell scripts:

        #@set f=20.1

    Note: only one variable can be defined per line.
To use a variable, reference it directly in the code as ${varName expression}, for example ${f*2}.
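Reading the definitions and splitting a usage expression might look like the following. This is a minimal Python sketch with hypothetical names (COMMENT_PREFIX, parse_set_lines, split_expression); how Linkis handles duplicate definitions or more complex expressions is not covered here, and the sketch simply keeps the last definition it sees.

```python
import re

# Comment prefix that introduces an @set line for each script type.
COMMENT_PREFIX = {"sql": "--", "python": "#", "shell": "#"}

# name, optionally followed by one operator and an operand, e.g. "f*2".
EXPRESSION = re.compile(r"^\s*(\w+)\s*(?:([-+*/])\s*(.+?))?\s*$")


def parse_set_lines(script, run_type):
    """Collect name/value pairs from '@set name=value' lines (one per line)."""
    prefix = re.escape(COMMENT_PREFIX[run_type])
    pattern = re.compile(r"^\s*" + prefix + r"@set\s+(\w+)\s*=\s*(.+?)\s*$")
    variables = {}
    for line in script.splitlines():
        match = pattern.match(line)
        if match:
            # Values are kept as strings; typing them is left to the caller.
            variables[match.group(1)] = match.group(2)
    return variables


def split_expression(expression):
    """Split the text inside ${...} into (name, operator, operand)."""
    match = EXPRESSION.match(expression)
    if match is None:
        raise ValueError("unsupported expression: " + expression)
    return match.group(1), match.group(2), match.group(3)


sql_script = "--@set f=20.1\nselect ${f*2};"
sql_vars = parse_set_lines(sql_script, "sql")
```

A plain reference such as ${run_date} yields no operator or operand, so the caller can substitute the value directly.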

3.3 Variable scope

Custom variables in Linkis also have scope. The priority order is: variables defined in the script take precedence over variables defined in the task parameters, which in turn take precedence over the built-in run_date variable. The task parameters are defined as follows:

    ## restful
    {
        "executionContent": {"code": "select \"${f-1}\";", "runType": "sql"},
        "params": {
            "variable": {"f": "20.1"},
            "configuration": {
                "runtime": {
                    "linkis.openlookeng.url": "http://127.0.0.1:9090"
                }
            }
        },
        "source": {"scriptPath": "file:///mnt/bdp/hadoop/1.sql"},
        "labels": {
            "engineType": "spark-2.4.3",
            "userCreator": "hadoop-IDE"
        }
    }

    ## java SDK
    JobSubmitAction.builder()
        .addExecuteCode(code)
        .setStartupParams(startupMap)
        .setUser(user) // submit user
        .addExecuteUser(user) // execute user
        .setLabels(labels)
        .setVariableMap(varMap) // custom variables
        .build();
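The scope priority can be expressed as a layered lookup. The sketch below is a hedged Python illustration rather than the Linkis implementation: resolve_variables is a hypothetical helper, and it assumes each scope has already been collected into a dictionary.

```python
from collections import ChainMap


def resolve_variables(script_vars, task_vars, builtin_vars):
    """Merge the three scopes; earlier arguments take precedence.

    ChainMap searches its maps from left to right, so a name defined in the
    script hides the same name from the task parameters, which in turn hides
    the built-in value.
    """
    return dict(ChainMap(script_vars, task_vars, builtin_vars))


resolved = resolve_variables(
    {"f": "30"},                 # defined in the script with @set
    {"f": "20.1", "g": "1"},     # passed in params.variable
    {"run_date": "20180129"},    # built-in default
)
```

With these inputs, f takes the script value, g comes from the task parameters, and run_date keeps its built-in value because no higher-priority scope defines it.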