STATIC GRAPH INTERFACE: NN.GRAPH

At present, there are two ways to run models in deep learning frameworks: Dynamic Graph and Static Graph, which are also called Eager Mode and Graph Mode in OneFlow.

There are pros and cons to both approaches, and OneFlow supports both, with Eager Mode as the default. If you are reading the tutorials of this basic topic in order, then all the code you have encountered so far is in Eager Mode.

In general, dynamic graphs are easier to use and static graphs have better performance. OneFlow offers nn.Graph so that users can build static graphs and train models with an eager-like programming style.

Eager Mode in OneFlow

OneFlow runs in Eager Mode by default.

The following script uses a third-order polynomial y = a + bx + cx^2 + dx^3 to fit the sine function y = sin(x), and finds a set of approximate fitting parameters a, b, c, and d.

This example is introduced to show how Eager Mode and Graph Mode are related in OneFlow (most of the code is reusable). Readers should be quite familiar with OneFlow's Eager Mode by now, so we do not explain it in detail here; interested readers can click on "Code" to expand the code.

Note: This sample code is adapted from the PyTorch official tutorial.

Code

import math
import numpy as np
import oneflow as flow

device = flow.device("cuda")
dtype = flow.float32

# Create Tensors to hold input and outputs.
x = flow.tensor(np.linspace(-math.pi, math.pi, 2000), device=device, dtype=dtype)
y = flow.tensor(np.sin(x), device=device, dtype=dtype)

# For this example, the output y is a linear function of (x, x^2, x^3), so
# we can consider it as a linear layer neural network. Let's prepare the
# tensor (x, x^2, x^3).
xx = flow.cat(
    [x.unsqueeze(-1).pow(1), x.unsqueeze(-1).pow(2), x.unsqueeze(-1).pow(3)], dim=1
)

# The Linear Module
model = flow.nn.Sequential(flow.nn.Linear(3, 1), flow.nn.Flatten(0, 1))
model.to(device)

# Loss Function
loss_fn = flow.nn.MSELoss(reduction="sum")
loss_fn.to(device)

# Optimizer
optimizer = flow.optim.SGD(model.parameters(), lr=1e-6)

for t in range(2000):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(xx)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.numpy())

    # Use the optimizer object to zero all of the gradients for the variables
    # it will update (which are the learnable weights of the model).
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model
    # parameters.
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its
    # parameters.
    optimizer.step()

linear_layer = model[0]

print(
    f"Result: y = {linear_layer.bias.numpy()[0]} + {linear_layer.weight[:, 0].numpy()[0]}*x + {linear_layer.weight[:, 1].numpy()[0]}*x^2 + {linear_layer.weight[:, 2].numpy()[0]}*x^3"
)

Out:

99 582.7045
...
1799 9.326502
1899 9.154123
1999 9.040091
Result: y = -0.0013652867637574673 + 0.8422811627388*x + 0.0002355352626182139*x^2 + -0.09127362817525864*x^3

Graph Mode in OneFlow

Customize a Graph

OneFlow provides the base class nn.Graph, which can be inherited to create a customized Graph class.

import oneflow as flow
import oneflow.nn as nn

class MyLinear(nn.Graph):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(flow.randn(in_features, out_features))
        self.bias = nn.Parameter(flow.randn(out_features))

    def build(self, input):
        return flow.matmul(input, self.weight) + self.bias

The simple example above contains the important steps needed to customize a Graph:

  • Inherit from nn.Graph.
  • Call super().__init__() at the beginning of the __init__ method so that OneFlow can do the necessary initialization for the Graph.
  • Define the structure and state of the neural network in the __init__ method.
  • Describe the computation process in the build method.

You can then instantiate and call the Graph:

mygraph = MyLinear(4, 3)
input = flow.randn(1, 4)
out = mygraph(input)
print(out)

Out:

tensor([[ 4.0638, -1.4453, 3.9640]], dtype=oneflow.float32)

Note that a Graph, like a Module, is callable as an object, and it is not recommended to call the build method explicitly. Defining a Graph is very similar to using a Module; in fact, a Graph can directly reuse a defined Module. For how to build a neural network, users can refer directly to the content in Build Network and use it in Graph Mode.

For example, use the model above as the network structure:

class ModelGraph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.model = model

    def build(self, x, y):
        y_pred = self.model(x)
        loss = loss_fn(y_pred, y)
        return loss

model_graph = ModelGraph()

The major difference between Module and Graph is that Graph uses a build method rather than a forward method to describe the computation process, because build can contain not only the forward computation but also things such as setting the loss and the optimizer. You will see an example of using Graph for training later.
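To make the contrast concrete, here is a minimal sketch of the same computation expressed once as a Module with forward and once as a Graph with build. The class names (ScaleModule, ScaleGraph) are hypothetical and only for illustration; the sketch assumes a working OneFlow installation.

import oneflow as flow
import oneflow.nn as nn

# Eager Mode: a Module describes the computation in forward.
class ScaleModule(nn.Module):  # hypothetical example class
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(flow.randn(4, 3))

    def forward(self, x):
        return flow.matmul(x, self.weight)

# Graph Mode: a Graph describes the computation in build,
# and can directly reuse the Module defined above.
class ScaleGraph(flow.nn.Graph):  # hypothetical example class
    def __init__(self):
        super().__init__()
        self.m = ScaleModule()

    def build(self, x):
        return self.m(x)

module = ScaleModule()
graph = ScaleGraph()
x = flow.randn(2, 4)
print(module(x).shape)  # eager execution, runs op by op
print(graph(x).shape)   # first call builds and compiles the static graph, then runs it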

Inference in Graph Mode

The following example runs inference in Graph Mode, directly using the model that we already trained in Eager Mode at the beginning of this article.

class LinearPredictGraph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.model = model

    def build(self, x):
        return self.model(x)

linear_graph = LinearPredictGraph()
y_fit = linear_graph(xx)

Draw the differences between the original function outputs and the fitting results:

import matplotlib.pyplot as plt

plt.plot(x.numpy(), y.numpy())
plt.plot(x.numpy(), y_fit.numpy())

(Figure: poly_fit — the fitted polynomial curve plotted against the original sine curve)

Training in Graph Mode

The Graph can be used for training. Click on the “Code” below to see the detailed code.

Code

import math
import numpy as np
import oneflow as flow

device = flow.device("cuda")
dtype = flow.float32

# Create Tensors to hold input and outputs.
x = flow.tensor(np.linspace(-math.pi, math.pi, 2000), device=device, dtype=dtype)
y = flow.tensor(np.sin(x), device=device, dtype=dtype)

# For this example, the output y is a linear function of (x, x^2, x^3), so
# we can consider it as a linear layer neural network. Let's prepare the
# tensor (x, x^2, x^3).
xx = flow.cat(
    [x.unsqueeze(-1).pow(1), x.unsqueeze(-1).pow(2), x.unsqueeze(-1).pow(3)], dim=1
)

# The Linear Module
model = flow.nn.Sequential(flow.nn.Linear(3, 1), flow.nn.Flatten(0, 1))
model.to(device)

# Loss Function
loss_fn = flow.nn.MSELoss(reduction="sum")
loss_fn.to(device)

# Optimizer
optimizer = flow.optim.SGD(model.parameters(), lr=1e-6)

# The Linear Train Graph
class LinearTrainGraph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.model = model
        self.loss_fn = loss_fn
        self.add_optimizer(optimizer)

    def build(self, x, y):
        y_pred = self.model(x)
        loss = self.loss_fn(y_pred, y)
        loss.backward()
        return loss

linear_graph = LinearTrainGraph()
# linear_graph.debug()

for t in range(2000):
    # Print loss.
    loss = linear_graph(xx, y)
    if t % 100 == 99:
        print(t, loss.numpy())

linear_layer = model[0]

print(
    f"Result: y = {linear_layer.bias.numpy()} + {linear_layer.weight[:, 0].numpy()} x + {linear_layer.weight[:, 1].numpy()} x^2 + {linear_layer.weight[:, 2].numpy()} x^3"
)

Compared to inference, there are only a few things that are unique to training:

# Optimizer
optimizer = flow.optim.SGD(model.parameters(), lr=1e-6)  # (1)

# The Linear Train Graph
class LinearTrainGraph(flow.nn.Graph):
    def __init__(self):
        # ...
        self.add_optimizer(optimizer)  # (2)

    def build(self, x, y):
        # ...
        loss.backward()  # (3)
        # ...
  1. Construct the optimizer object, which is the same as in Eager Mode training, introduced in Backpropagation and Optimizer.
  2. Call self.add_optimizer in the Graph's __init__ method to add the optimizer object constructed in the previous step to the Graph.
  3. Call backward in the Graph's build method to trigger back propagation.

Debugging in Graph Mode

You can call print to show information about the Graph object.

print(linear_graph)

The output differs slightly depending on whether the Graph object has been called.

If you print before the Graph object is called, the output describes the network structure. For linear_graph, printing before the first call looks like this:

(GRAPH:LinearTrainGraph_0:LinearTrainGraph): (
  (MODULE:model:Sequential()): (
    (MODULE:model.0:Linear(in_features=3, out_features=1, bias=True)): (
      (PARAMETER:model.0.weight:tensor(..., device='cuda:0', size=(1, 3), dtype=oneflow.float32,
        requires_grad=True)): ()
      (PARAMETER:model.0.bias:tensor(..., device='cuda:0', size=(1,), dtype=oneflow.float32,
        requires_grad=True)): ()
    )
    (MODULE:model.1:Flatten(start_dim=0, end_dim=1)): ()
  )
  (MODULE:loss_fn:MSELoss()): ()
)

If you print after the Graph object has been called, then in addition to the structure of the network, the input and output tensors are also printed. The output on the console looks like this:

(GRAPH:LinearTrainGraph_0:LinearTrainGraph): (
  (INPUT:_LinearTrainGraph_0-input_0:tensor(..., device='cuda:0', size=(2000, 3), dtype=oneflow.float32))
  (INPUT:_LinearTrainGraph_0-input_1:tensor(..., device='cuda:0', size=(2000,), dtype=oneflow.float32))
  (MODULE:model:Sequential()): (
    (INPUT:_model-input_0:tensor(..., device='cuda:0', is_lazy='True', size=(2000, 3),
      dtype=oneflow.float32))
    (MODULE:model.0:Linear(in_features=3, out_features=1, bias=True)): (
      (INPUT:_model.0-input_0:tensor(..., device='cuda:0', is_lazy='True', size=(2000, 3),
        dtype=oneflow.float32))
      (PARAMETER:model.0.weight:tensor(..., device='cuda:0', size=(1, 3), dtype=oneflow.float32,
        requires_grad=True)): ()
      (PARAMETER:model.0.bias:tensor(..., device='cuda:0', size=(1,), dtype=oneflow.float32,
        requires_grad=True)): ()
      (OUTPUT:_model.0-output_0:tensor(..., device='cuda:0', is_lazy='True', size=(2000, 1),
        dtype=oneflow.float32))
    )
    (MODULE:model.1:Flatten(start_dim=0, end_dim=1)): (
      (INPUT:_model.1-input_0:tensor(..., device='cuda:0', is_lazy='True', size=(2000, 1),
        dtype=oneflow.float32))
      (OUTPUT:_model.1-output_0:tensor(..., device='cuda:0', is_lazy='True', size=(2000,),
        dtype=oneflow.float32))
    )
    (OUTPUT:_model-output_0:tensor(..., device='cuda:0', is_lazy='True', size=(2000,),
      dtype=oneflow.float32))
  )
  (MODULE:loss_fn:MSELoss()): (
    (INPUT:_loss_fn-input_0:tensor(..., device='cuda:0', is_lazy='True', size=(2000,),
      dtype=oneflow.float32))
    (INPUT:_loss_fn-input_1:tensor(..., device='cuda:0', is_lazy='True', size=(2000,),
      dtype=oneflow.float32))
    (OUTPUT:_loss_fn-output_0:tensor(..., device='cuda:0', is_lazy='True', size=(), dtype=oneflow.float32))
  )
  (OUTPUT:_LinearTrainGraph_0-output_0:tensor(..., device='cuda:0', is_lazy='True', size=(), dtype=oneflow.float32))
)

In addition, calling the debug method of a Graph object turns on the Graph's debug mode.

OneFlow then prints debug information while it compiles the computation graph. If the comment on linear_graph.debug() in the example code above is removed (i.e., the call is uncommented), the output on the console looks like this:

Note that nn.Graph.debug() only print debug info on rank 0.
(GRAPH:LinearTrainGraph_0:LinearTrainGraph) start building forward graph.
(INPUT:_LinearTrainGraph_0-input_0:tensor(..., device='cuda:0', size=(20, 3), dtype=oneflow.float32))
(INPUT:_LinearTrainGraph_0-input_1:tensor(..., device='cuda:0', size=(20,), dtype=oneflow.float32))
(MODULE:model:Sequential())
(INPUT:_model-input_0:tensor(..., device='cuda:0', is_lazy='True', size=(20, 3),
  dtype=oneflow.float32))
(MODULE:model.0:Linear(in_features=3, out_features=1, bias=True))
(INPUT:_model.0-input_0:tensor(..., device='cuda:0', is_lazy='True', size=(20, 3),
  dtype=oneflow.float32))
(PARAMETER:model.0.weight:tensor(..., device='cuda:0', size=(1, 3), dtype=oneflow.float32,
  requires_grad=True))
(PARAMETER:model.0.bias:tensor(..., device='cuda:0', size=(1,), dtype=oneflow.float32,
  requires_grad=True))
(OUTPUT:_model.0-output_0:tensor(..., device='cuda:0', is_lazy='True', size=(20, 1),
  dtype=oneflow.float32))
(MODULE:model.1:Flatten(start_dim=0, end_dim=1))
(INPUT:_model.1-input_0:tensor(..., device='cuda:0', is_lazy='True', size=(20, 1),
  dtype=oneflow.float32))
(OUTPUT:_model.1-output_0:tensor(..., device='cuda:0', is_lazy='True', size=(20,), dtype=oneflow.float32))
(OUTPUT:_model-output_0:tensor(..., device='cuda:0', is_lazy='True', size=(20,), dtype=oneflow.float32))
(MODULE:loss_fn:MSELoss())
(INPUT:_loss_fn-input_0:tensor(..., device='cuda:0', is_lazy='True', size=(20,), dtype=oneflow.float32))
(INPUT:_loss_fn-input_1:tensor(..., device='cuda:0', is_lazy='True', size=(20,), dtype=oneflow.float32))
(OUTPUT:_loss_fn-output_0:tensor(..., device='cuda:0', is_lazy='True', size=(), dtype=oneflow.float32))
(OUTPUT:_LinearTrainGraph_0-output_0:tensor(..., device='cuda:0', is_lazy='True', size=(), dtype=oneflow.float32))
(GRAPH:LinearTrainGraph_0:LinearTrainGraph) end building forward graph.
(GRAPH:LinearTrainGraph_0:LinearTrainGraph) start compiling and init graph runtime.
(GRAPH:LinearTrainGraph_0:LinearTrainGraph) end compiling and init graph runtime.

It displays the names of the layers in the computation graph and input/output tensor information, including shape, device information, data type, and so on.

The advantage of using debug is that the debug information is composed and printed as the graph is built, which makes it easy to locate the problem if any error occurs during the graph-building process.
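As a quick reference, here is a minimal sketch of turning on debug mode; it assumes the LinearTrainGraph class, xx, and y defined in the training example above.

linear_graph = LinearTrainGraph()
linear_graph.debug()        # turn on debug mode; debug info is printed on rank 0 only

# The first call triggers graph building and compilation,
# so this is when the debug information is printed.
loss = linear_graph(xx, y)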

In addition to the methods described above, features such as getting the gradients of parameters during training and accessing the learning rate are also under development and will be available soon.

Further Reading: Dynamic Graph vs. Static Graph

User-defined neural networks are transformed by deep learning frameworks into computation graphs, like the example in Autograd:

import oneflow as flow

def loss(y_pred, y):
    return flow.sum(1 / 2 * (y_pred - y) ** 2)

x = flow.ones(1, 5)  # input
w = flow.randn(5, 3, requires_grad=True)
b = flow.randn(1, 3, requires_grad=True)
z = flow.matmul(x, w) + b

y = flow.zeros(1, 3)  # label
l = loss(z, y)

The corresponding computation graph is:

(Figure: the computation graph corresponding to the code above)

Dynamic Graph

The characteristic of a dynamic graph is that it is defined by run: the graph is built node by node as the code executes.

The code above runs like this (note: the figure below merges simple statements):

(Figure: the dynamic graph built step by step as the code runs)

Because a dynamic graph is defined by run, it is very flexible and easy to debug: you can modify the graph structure at any time and get results immediately. However, because the framework never has the complete graph information (the graph can change at any time and can never be considered finished), it cannot perform full global optimization, so its performance is relatively poor.

Static Graph

Unlike a dynamic graph, a static graph defines the complete computation graph up front: it requires the user to declare all compute nodes before the framework starts running. This can be understood as the framework acting as a compiler between the user's code and the computation graph that is ultimately run.

(Figure: static graph — user code is converted into a complete computation graph before execution)

In the case of OneFlow, the user's code is first converted into a full computation graph, which is then run by the OneFlow Runtime module.

Static graphs, which obtain the complete network first and then compile and run it, can be optimized in ways that dynamic graphs cannot, so they have an advantage in performance. A compiled computation graph is also easier to deploy across platforms.

However, in a static graph the actual computation is no longer directly tied to the user's code, so debugging a static graph is less convenient.

The two approaches can be summarized as follows:

                    Dynamic Graph                              Static Graph
Computation Mode    Eager Mode                                 Graph Mode
Pros                The code is flexible and easy to debug.    Good performance, easy to optimize and deploy.
Cons                Poor performance and portability.          Not easy to debug.

The Eager Mode in OneFlow is aligned with PyTorch, which allows users familiar with PyTorch to get started easily with little extra effort.

The Graph Mode in OneFlow is based on the object-oriented programming style, which allows developers familiar with the eager programming style to benefit from static graphs with minimal code changes.

Building neural network in OneFlow Eager Mode: Build Network

PyTorch version of polynomial fitting example: PyTorch: nn
