Use SQLFlow to Solve Optimization Problems

Open In PAI-DSW

This document explains how to use the SQLFlow to solve the optimization problems.

The Optimization SQL Syntax

The optimization SQL syntax in SQLFlow is as follows:

  1. SELECT ... FROM ...
  2. TO MAXIMIZE|MINIMIZE
  3. objective_expression
  4. CONSTRAINT
  5. constraint_rule_1 [GROUP BY column_1],
  6. constraint_rule_2 [GROUP BY column_2],
  7. ...
  8. constraint_rule_n [GROUP BY column_name_n]
  9. WITH
  10. variables = "result_value_name(column_1,column_2,...,column_m)",
  11. var_type = "Binary"|"Integers"|"Reals"|...
  12. [USING glpk]
  13. INTO output_database.output_table;

where:

  • SELECT ... FROM ...: any standard SQL query statement.
  • TO MAXIMIZE|MINIMIZE: whether to maximize or minimize the value of the objective_expression.
  • objective_expression : the objective expression to be maximized or minimized.
  • CONSTRAINT: indicates the constraint rules. We should separate each constraint rule using the comma. There may be GROUP BY clauses at the end of some constraint rules. For example, GROUP BY column_1 means that we would apply the constraint rule to each unique cell value of the column column_1.
  • variables: indicates the variable names and value to be optimized, where result_value_name means the variable value to be optimized, and the string values in the brackets, i.e. column_1,column_2,...,column_m are the column names of the variables to be optimized.
  • var_type: the domain of the variable value. SQLFlow supports the following var_type:
    • Binary: the variable value can be only 0 or 1.
    • Integers: the variable value can be only integers.
    • PositiveIntegers, NegativeIntegers: the variable value can be only positive or negative integers.
    • NonPositiveIntegers, NonNegativeIntegers: the variable value can be only non-positive or non-negative integers.
    • Reals: the variable value can be only real numbers.
    • PositiveReals, NegativeReals: the variable value can be positive or negative real numbers.
    • NonPositiveReals, NonNegativeReals: the variable value can be non-positive or non-negative real numbers.
  • USING glpk: indicates the solver to solve the problem. Currently, only glpk is supported. Please see here for details on the GLPK solver. The USING clause will be optional if we use the glpk solver.
  • INTO ...: indicates the output table to save the solved result.

Example 1: Single Column Case

Let us take an example to explain how to use SQLFlow to solve optimization problems. You can refer to this case for details here.

Giapetto’s Woodcarving, Inc., manufactures two types of wooden toys: soldiers and trains. A soldier sells for 27 dollar and uses 10 dollar worth of raw materials. Each soldier that is manufactured increases Giapetto’s variable labor and overhead costs by 14 dollar. A train sells for 21 dollar and uses 9 dollar worth of raw materials. Each train built increases Giapetto’s variable labor and overhead costs by 10 dollar. The manufacture of wooden soldiers and trains requires two types of skilled labor: carpentry and finishing. A soldier requires 2 hours of finishing labor and 1 hour of carpentry labor. A train requires 1 hour of finishing labor and 1 hour of carpentry labor. Each week, Giapetto can obtain all the needed raw material but only 100 finishing hours and 80 carpentry hours. At most 10000 trains and at most 40 soldiers are bought each week. Giapetto wants to maximize weekly profit (revenues-costs).

Let

  • x be the number of soldiers produced each week
  • y be the number of trains produced each week

Then the objective is:

Maximize Z = (27 - 10 - 14)x + (21 - 9 - 10)y

The constraints are:

  • 2x + 1y <= 100 (finishing constraint)
  • 1x + 1y <= 80 (carpentry constraint)
  • x <= 40, y <= 10000 (demand constraint)
  • both x,y are non-negative integers.

The table my_db.woodcarving corresponding to the example above is:

productpricematerials_costother_costfinishingcarpentrymax_num
soldier2710142140
train219101110000

We can create the data table my_db.woodcarving using the following SQL statements:

  1. %%sqlflow
  2. CREATE DATABASE IF NOT EXISTS my_db;
  3. DROP TABLE IF EXISTS my_db.woodcarving;
  4. CREATE TABLE my_db.woodcarving (
  5. product VARCHAR(255),
  6. price FLOAT,
  7. materials_cost FLOAT,
  8. other_cost FLOAT,
  9. finishing FLOAT,
  10. carpentry FLOAT,
  11. max_num FLOAT
  12. );
  13. INSERT INTO my_db.woodcarving VALUES('soldier', 27, 10, 14, 2, 1, 40);
  14. INSERT INTO my_db.woodcarving VALUES('train', 21, 9, 10, 1, 1, 10000);

The SQLFlow optimization SQL statement for this case would be:

  1. %%sqlflow
  2. SELECT * FROM my_db.woodcarving -- the input data source
  3. TO MAXIMIZE
  4. SUM((price - materials_cost - other_cost) * amount) -- the objective expression
  5. CONSTRAINT
  6. SUM(finishing * amount) <= 100, -- finishing constraint, i.e, 2*x + 1*y <= 100
  7. SUM(carpentry * amount) <= 80, -- carpentry constraint, i.e., 1*x + 1*y <= 80
  8. amount <= max_num -- demand constraint, i.e., x <= 40, y <= 10000
  9. WITH
  10. variables="amount(product)", -- amount = (x, y) is the value to be optimized, product is the column name of the variable
  11. var_type="NonNegativeIntegers" -- amount = (x, y) is inside the domain of non-negative integers
  12. USING glpk -- use the GLPK solver to solve the linear optimization problem
  13. INTO my_db.woodcarving_result_table;

Once the SQLFlow server receives the SQL statement above, it would call the GLPK solver to solve the optimization problem described in the SQL statement. After solving the problem, we would get the following logs:

  1. Solved result is:
  2. product amount
  3. 0 soldier 20
  4. 1 train 60
  5. Saved in my_db.woodcarving_result_table.
  6. Objective value is 180.0

We can also examine the solved result by the SQL statement:

  1. %%sqlflow
  2. SELECT * FROM my_db.woodcarving_result_table;
productamount
soldier20
train60

Example 2: Multiple Columns Case with GROUP BY

Suppose that there are several plants that manufacture products, and several markets that sell them (see the example described here for details). We want to minimize the cost of transportation between plants and markets.

We have three tables that look like below:

  1. Plants capacity table my_db.plants, where the column capacity indicates the maximum product number that each plant can manufacture. The product number should be integers.
plantscapacity
plantA100
plantB90
  1. Markets demand table my_db.markets, where the column demand indicates the required product number of each market.
marketsdemand
marketA130
marketB60
  1. Plants to markets distance table my_db.transportation, where the column distance is the distance to transport each plant to each market.
plantsmarketsdistance
plantAmarketA140
plantAmarketB210
plantBmarketA300
plantBmarketB90

We can create the tables above using the following SQL statements:

  1. %%sqlflow
  2. CREATE DATABASE IF NOT EXISTS my_db;
  3. DROP TABLE IF EXISTS my_db.plants;
  4. CREATE TABLE my_db.plants (
  5. plants VARCHAR(255),
  6. capacity FLOAT
  7. );
  8. INSERT INTO my_db.plants VALUES('plantA', 100), ('plantB', 90);
  9. DROP TABLE IF EXISTS my_db.markets;
  10. CREATE TABLE my_db.markets (
  11. markets VARCHAR(255),
  12. demand FLOAT
  13. );
  14. INSERT INTO my_db.markets VALUES('marketA', 130), ('marketB', 60);
  15. DROP TABLE IF EXISTS my_db.transportation;
  16. CREATE TABLE my_db.transportation (
  17. plants VARCHAR(255),
  18. markets VARCHAR(255),
  19. distance FLOAT
  20. );
  21. INSERT INTO my_db.transportation VALUES('plantA', 'marketA', 140);
  22. INSERT INTO my_db.transportation VALUES('plantA', 'marketB', 210);
  23. INSERT INTO my_db.transportation VALUES('plantB', 'marketA', 300);
  24. INSERT INTO my_db.transportation VALUES('plantB', 'marketB', 90);

When we start to solve the problem, we would like to join the tables beforehand:

  1. %%sqlflow
  2. SELECT
  3. t.plants AS plants,
  4. t.markets AS markets,
  5. t.distance AS distance,
  6. p.capacity AS capacity,
  7. m.demand AS demand
  8. FROM my_db.transportation AS t
  9. LEFT JOIN my_db.plants AS p ON t.plants = p.plants
  10. LEFT JOIN my_db.markets AS m ON t.markets = m.markets;

Then we have a “joined” table like below to start the solving process:

plantsmarketsdistancecapacitydemand
plantAmarketA140100130
plantBmarketA30090130
plantAmarketB21010060
plantBmarketB909060

Then we can use below extended SQL syntax to describe the above example:

  1. %%sqlflow
  2. SELECT
  3. t.plants AS plants,
  4. t.markets AS markets,
  5. t.distance AS distance,
  6. p.capacity AS capacity,
  7. m.demand AS demand
  8. FROM my_db.transportation AS t
  9. LEFT JOIN my_db.plants AS p ON t.plants = p.plants
  10. LEFT JOIN my_db.markets AS m ON t.markets = m.markets -- the joined SQL statement
  11. TO MINIMIZE
  12. SUM(amount * distance) -- the objective expression, minimize the total transportation distance
  13. CONSTRAINT
  14. SUM(amount) <= capacity GROUP BY plants,
  15. SUM(amount) >= demand GROUP BY markets
  16. WITH
  17. -- amount is the value to be optimized, "plants" and "markets" are the column names of the variables
  18. variables="amount(plants,markets)",
  19. var_type="NonNegativeIntegers" -- the amount value should be non-negative integers
  20. USING glpk
  21. INTO my_db.transportation_result_table;

where there are GROUP BY clauses in the constraint rules, which mean:

  • SUM(amount) <= capacity GROUP BY plant : for each plant, the sum of the amount value should not exceed the capacity of the plant.
  • SUM(amount) >= demand GROUP BY markets : for each market, the sum of the amount value should be larger than or equal to the demand of the market.

After solving the problem, we would get the following logs:

  1. Solved result is:
  2. plants markets amount
  3. 0 plantA marketA 100
  4. 1 plantB marketA 30
  5. 2 plantA marketB 0
  6. 3 plantB marketB 60
  7. Saved in my_db.transportation_result_table.
  8. Objective value is 28400.0

We can also examine the solved result by the SQL statement:

  1. %%sqlflow
  2. SELECT * FROM my_db.transportation_result_table;
plantsmarketsamount
plantAmarketA100
plantBmarketA30
plantAmarketB0
plantBmarketB60

Summary

In the above examples, we explain how to use the SQLFlow to solve the optimization problems. Currently, we only support the linear optimization problem and the GLPK solver. We would support more optimization problems and solvers in the future version.