Graph Generators

Gelly provides a collection of scalable graph generators. Each generator is

  • parallelizable, in order to create large datasets
  • scale-free, generating the same graph regardless of parallelism
  • thrifty, using as few operators as possibleGraph generators are configured using the builder pattern. The parallelism of generatoroperators can be set explicitly by calling setParallelism(parallelism). Lowering theparallelism will reduce the allocation of memory and network buffers.

Graph-specific configuration must be called first, then configuration common to allgenerators, and lastly the call to generate(). The following example configures agrid graph with two dimensions, configures the parallelism, and generates the graph.

  1. ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
  2. boolean wrapEndpoints = false;
  3. int parallelism = 4;
  4. Graph<LongValue, NullValue, NullValue> graph = new GridGraph(env)
  5. .addDimension(2, wrapEndpoints)
  6. .addDimension(4, wrapEndpoints)
  7. .setParallelism(parallelism)
  8. .generate();
  1. import org.apache.flink.api.scala._
  2. import org.apache.flink.graph.generator.GridGraph
  3. val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
  4. wrapEndpoints = false
  5. val parallelism = 4
  6. val graph = new GridGraph(env.getJavaEnv).addDimension(2, wrapEndpoints).addDimension(4, wrapEndpoints).setParallelism(parallelism).generate()

Circulant Graph

A circulant graph is anoriented graph configuredwith one or more contiguous ranges of offsets. Edges connect integer vertex IDswhose difference equals a configured offset. The circulant graph with no offsetsis the empty graph and the graph with the maximum range is thecomplete graph.

  1. ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
  2. long vertexCount = 5;
  3. Graph<LongValue, NullValue, NullValue> graph = new CirculantGraph(env, vertexCount)
  4. .addRange(1, 2)
  5. .generate();
  1. import org.apache.flink.api.scala._
  2. import org.apache.flink.graph.generator.CirculantGraph
  3. val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
  4. val vertexCount = 5
  5. val graph = new CirculantGraph(env.getJavaEnv, vertexCount).addRange(1, 2).generate()

Complete Graph

An undirected graph connecting every distinct pair of vertices.

  1. ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
  2. long vertexCount = 5;
  3. Graph<LongValue, NullValue, NullValue> graph = new CompleteGraph(env, vertexCount)
  4. .generate();
  1. import org.apache.flink.api.scala._
  2. import org.apache.flink.graph.generator.CompleteGraph
  3. val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
  4. val vertexCount = 5
  5. val graph = new CompleteGraph(env.getJavaEnv, vertexCount).generate()

Graph Generators - 图1

Cycle Graph

An undirected graph where the set of edges form a single cycle by connectingeach vertex to two adjacent vertices in a chained loop.

  1. ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
  2. long vertexCount = 5;
  3. Graph<LongValue, NullValue, NullValue> graph = new CycleGraph(env, vertexCount)
  4. .generate();
  1. import org.apache.flink.api.scala._
  2. import org.apache.flink.graph.generator.CycleGraph
  3. val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
  4. val vertexCount = 5
  5. val graph = new CycleGraph(env.getJavaEnv, vertexCount).generate()

Graph Generators - 图2

Echo Graph

An echo graph is acirculant graph with n vertices defined by the width of asingle range of offsets centered at n/2. A vertex is connected to ‘far’vertices, which connect to ‘near’ vertices, which connect to ‘far’ vertices, ….

  1. ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
  2. long vertexCount = 5;
  3. long vertexDegree = 2;
  4. Graph<LongValue, NullValue, NullValue> graph = new EchoGraph(env, vertexCount, vertexDegree)
  5. .generate();
  1. import org.apache.flink.api.scala._
  2. import org.apache.flink.graph.generator.EchoGraph
  3. val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
  4. val vertexCount = 5
  5. val vertexDegree = 2
  6. val graph = new EchoGraph(env.getJavaEnv, vertexCount, vertexDegree).generate()

Empty Graph

A graph containing no edges.

  1. ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
  2. long vertexCount = 5;
  3. Graph<LongValue, NullValue, NullValue> graph = new EmptyGraph(env, vertexCount)
  4. .generate();
  1. import org.apache.flink.api.scala._
  2. import org.apache.flink.graph.generator.EmptyGraph
  3. val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
  4. val vertexCount = 5
  5. val graph = new EmptyGraph(env.getJavaEnv, vertexCount).generate()

Graph Generators - 图3

Grid Graph

An undirected graph connecting vertices in a regular tiling in one or more dimensions.Each dimension is configured separately. When the dimension size is at least three theendpoints are optionally connected by setting wrapEndpoints. Changing the followingexample to addDimension(4, true) would connect 0 to 3 and 4 to 7.

  1. ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
  2. boolean wrapEndpoints = false;
  3. Graph<LongValue, NullValue, NullValue> graph = new GridGraph(env)
  4. .addDimension(2, wrapEndpoints)
  5. .addDimension(4, wrapEndpoints)
  6. .generate();
  1. import org.apache.flink.api.scala._
  2. import org.apache.flink.graph.generator.GridGraph
  3. val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
  4. val wrapEndpoints = false
  5. val graph = new GridGraph(env.getJavaEnv).addDimension(2, wrapEndpoints).addDimension(4, wrapEndpoints).generate()

Graph Generators - 图4

Hypercube Graph

An undirected graph where edges form an n-dimensional hypercube. Each vertexin a hypercube connects to one other vertex in each dimension.

  1. ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
  2. long dimensions = 3;
  3. Graph<LongValue, NullValue, NullValue> graph = new HypercubeGraph(env, dimensions)
  4. .generate();
  1. import org.apache.flink.api.scala._
  2. import org.apache.flink.graph.generator.HypercubeGraph
  3. val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
  4. val dimensions = 3
  5. val graph = new HypercubeGraph(env.getJavaEnv, dimensions).generate()

Graph Generators - 图5

Path Graph

An undirected graph where the set of edges form a single path by connectingtwo endpoint vertices with degree 1 and all midpoint vertices with degree2. A path graph can be formed by removing a single edge from a cycle graph.

  1. ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
  2. long vertexCount = 5
  3. Graph<LongValue, NullValue, NullValue> graph = new PathGraph(env, vertexCount)
  4. .generate();
  1. import org.apache.flink.api.scala._
  2. import org.apache.flink.graph.generator.PathGraph
  3. val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
  4. val vertexCount = 5
  5. val graph = new PathGraph(env.getJavaEnv, vertexCount).generate()

Graph Generators - 图6

RMat Graph

A directed power-law multigraph generated using theRecursive Matrix (R-Mat) model.

RMat is a stochastic generator configured with a source of randomness implementing theRandomGenerableFactory interface. Provided implementations are JDKRandomGeneratorFactoryand MersenneTwisterFactory. These generate an initial sequence of random values which arethen used as seeds for generating the edges.

  1. ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
  2. RandomGenerableFactory<JDKRandomGenerator> rnd = new JDKRandomGeneratorFactory();
  3. int vertexCount = 1 << scale;
  4. int edgeCount = edgeFactor * vertexCount;
  5. Graph<LongValue, NullValue, NullValue> graph = new RMatGraph<>(env, rnd, vertexCount, edgeCount)
  6. .generate();
  1. import org.apache.flink.api.scala._
  2. import org.apache.flink.graph.generator.RMatGraph
  3. val env = ExecutionEnvironment.getExecutionEnvironment
  4. val vertexCount = 1 << scale
  5. val edgeCount = edgeFactor * vertexCount
  6. val graph = new RMatGraph(env.getJavaEnv, rnd, vertexCount, edgeCount).generate()

The default RMat constants can be overridden as shown in the following example.The constants define the interdependence of bits from each generated edge’s sourceand target labels. The RMat noise can be enabled and progressively perturbs theconstants while generating each edge.

The RMat generator can be configured to produce a simple graph by removing self-loopsand duplicate edges. Symmetrization is performed either by a “clip-and-flip” throwing awaythe half matrix above the diagonal or a full “flip” preserving and mirroring all edges.

  1. ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
  2. RandomGenerableFactory<JDKRandomGenerator> rnd = new JDKRandomGeneratorFactory();
  3. int vertexCount = 1 << scale;
  4. int edgeCount = edgeFactor * vertexCount;
  5. boolean clipAndFlip = false;
  6. Graph<LongValue, NullValue, NullValue> graph = new RMatGraph<>(env, rnd, vertexCount, edgeCount)
  7. .setConstants(0.57f, 0.19f, 0.19f)
  8. .setNoise(true, 0.10f)
  9. .generate();
  1. import org.apache.flink.api.scala._
  2. import org.apache.flink.graph.generator.RMatGraph
  3. val env = ExecutionEnvironment.getExecutionEnvironment
  4. val vertexCount = 1 << scale
  5. val edgeCount = edgeFactor * vertexCount
  6. clipAndFlip = false
  7. val graph = new RMatGraph(env.getJavaEnv, rnd, vertexCount, edgeCount).setConstants(0.57f, 0.19f, 0.19f).setNoise(true, 0.10f).generate()

Singleton Edge Graph

An undirected graph containing isolated two-paths where every vertex has degree1.

  1. ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
  2. long vertexPairCount = 4
  3. // note: configured with the number of vertex pairs
  4. Graph<LongValue, NullValue, NullValue> graph = new SingletonEdgeGraph(env, vertexPairCount)
  5. .generate();
  1. import org.apache.flink.api.scala._
  2. import org.apache.flink.graph.generator.SingletonEdgeGraph
  3. val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
  4. val vertexPairCount = 4
  5. // note: configured with the number of vertex pairs
  6. val graph = new SingletonEdgeGraph(env.getJavaEnv, vertexPairCount).generate()

Graph Generators - 图7

Star Graph

An undirected graph containing a single central vertex connected to all other leaf vertices.

  1. ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
  2. long vertexCount = 6;
  3. Graph<LongValue, NullValue, NullValue> graph = new StarGraph(env, vertexCount)
  4. .generate();
  1. import org.apache.flink.api.scala._
  2. import org.apache.flink.graph.generator.StarGraph
  3. val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
  4. val vertexCount = 6
  5. val graph = new StarGraph(env.getJavaEnv, vertexCount).generate()

Graph Generators - 图8