GraalVM demos: Performance Examples for Java

The GraalVM compiler achieves excellent performance for modern workloads such as Scala code or code that uses the Java Streams API. The examples below demonstrate this.

Prerequisites

Running the examples

Let us use a simple example based on the Streams API to demonstrate the performance of the GraalVM compiler. This example counts the number of uppercase characters in a body of text. To simulate a large load, the same sentence is processed close to 10 million times (9,999,999, because the loop counter starts at 1):

  1. Save the following code snippet to a file named CountUppercase.java:


    // COMPILE-CMD: javac {file}
    // RUN-CMD: java -Diterations=2 {file} In 2017 I would like to run ALL languages in one VM.
    // RUN-CMD: java -Diterations=2 -XX:-UseJVMCICompiler {file} In 2017 I would like to run ALL languages in one VM.
    // BEGIN-SNIPPET
    public class CountUppercase {
        static final int ITERATIONS = Math.max(Integer.getInteger("iterations", 1), 1);

        public static void main(String[] args) {
            String sentence = String.join(" ", args);
            for (int iter = 0; iter < ITERATIONS; iter++) {
                if (ITERATIONS != 1) System.out.println("-- iteration " + (iter + 1) + " --");
                long total = 0, start = System.currentTimeMillis(), last = start;
                for (int i = 1; i < 10_000_000; i++) {
                    total += sentence.chars().filter(Character::isUpperCase).count();
                    if (i % 1_000_000 == 0) {
                        long now = System.currentTimeMillis();
                        System.out.printf("%d (%d ms)%n", i / 1_000_000, now - last);
                        last = now;
                    }
                }
                System.out.printf("total: %d (%d ms)%n", total, System.currentTimeMillis() - start);
            }
        }
    }
    // END-SNIPPET

  2. Compile it and run as follows:

    $ javac CountUppercase.java
    $ java CountUppercase In 2019 I would like to run ALL languages in one VM.
    1 (389 ms)
    2 (235 ms)
    3 (216 ms)
    4 (77 ms)
    5 (81 ms)
    6 (79 ms)
    7 (85 ms)
    8 (80 ms)
    9 (78 ms)
    total: 69999993 (1408 ms)

The warmup time depends on numerous factors, such as the source code or how many cores a machine has. If the performance profile of CountUppercase on your machine does not match the above, run it for more iterations by adding -Diterations=N just after java for some N greater than 1.
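
The following is a minimal sketch of how a -D option such as -Diterations=N reaches the program. It mirrors the ITERATIONS expression from CountUppercase; the class name IterationsProperty and the method readIterations are hypothetical names used only for this illustration.

```java
// Sketch: reading the "iterations" system property the same way CountUppercase does.
// IterationsProperty and readIterations are illustrative names, not part of the demo.
class IterationsProperty {

    // Integer.getInteger returns the default (1) when the property is unset or
    // not a valid integer; Math.max then clamps the result to at least 1.
    static int readIterations() {
        return Math.max(Integer.getInteger("iterations", 1), 1);
    }

    public static void main(String[] args) {
        System.out.println("iterations = " + readIterations());
    }
}
```

Because system properties are JVM options, -Diterations=N has to appear before the class name on the command line; anything after the class name is passed to main as a program argument and becomes part of the sentence instead.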

  3. Add the -Dgraal.PrintCompilation=true option to see statistics for the compilations:

    $ java -Dgraal.PrintCompilation=true CountUppercase In 2019 I would like to run ALL languages in one VM.

This option prints a line after each compilation that shows the method compiled, time taken, bytecodes processed (including inlined methods), size of machine code produced, and amount of memory allocated during compilation.

  4. Use the -XX:-UseJVMCICompiler option to disable the GraalVM compiler and use the native top tier compiler in the VM to compare performance, as follows:

    $ java -XX:-UseJVMCICompiler CountUppercase In 2019 I would like to run ALL languages in one VM.
    1 (602 ms)
    2 (443 ms)
    3 (429 ms)
    4 (423 ms)
    5 (418 ms)
    6 (432 ms)
    7 (454 ms)
    8 (415 ms)
    9 (407 ms)
    total: 69999993 (4443 ms)

The preceding example demonstrates the benefits of partial escape analysis (PEA) and advanced inlining, which combine to significantly reduce heap allocation. The results were obtained using GraalVM Enterprise Edition.
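
To make the effect of inlining and escape analysis more concrete, the sketch below places the stream pipeline from CountUppercase next to a hand-written loop that computes the same count without creating any stream or lambda objects. The loop is only an illustration of the kind of code these optimizations aim to approach, not actual compiler output; the class name UppercaseLoop and both method names are hypothetical.

```java
// Sketch: the stream pipeline versus an allocation-free loop with the same result.
// UppercaseLoop, countWithStream and countWithLoop are illustrative names.
class UppercaseLoop {

    // The pipeline used in CountUppercase; each call creates stream objects
    // unless the compiler can prove they do not escape and remove them.
    static long countWithStream(String sentence) {
        return sentence.chars().filter(Character::isUpperCase).count();
    }

    // Hand-written equivalent that works directly on the characters.
    static long countWithLoop(String sentence) {
        long count = 0;
        for (int i = 0; i < sentence.length(); i++) {
            if (Character.isUpperCase(sentence.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String sentence = "In 2019 I would like to run ALL languages in one VM.";
        System.out.println("stream: " + countWithStream(sentence));
        System.out.println("loop:   " + countWithLoop(sentence));
    }
}
```

Both methods return 7 for the sample sentence, which matches the total of 69999993 reported above (7 uppercase characters times 9,999,999 passes).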

The GraalVM Community Edition still has good performance compared to the native top tier compiler as shown below. You can simulate the Community Edition on the Enterprise Edition by adding the option -Dgraal.CompilerConfiguration=community.

Sunflow is an open source rendering engine. The following example is a simplified version of code at the core of the Sunflow engine. It performs calculations to blend various values for a point of light in a rendered scene.

  1. Save the following code snippet to a file named Blender.java:


    // COMPILE-CMD: javac {file}
    // RUN-CMD: java {file}
    // RUN-CMD: java -XX:-UseJVMCICompiler {file}
    // BEGIN-SNIPPET
    public class Blender {

        private static class Color {
            double r, g, b;

            private Color(double r, double g, double b) {
                this.r = r;
                this.g = g;
                this.b = b;
            }

            public static Color black() {
                return new Color(0, 0, 0);
            }

            public void add(Color other) {
                r += other.r;
                g += other.g;
                b += other.b;
            }

            public void add(double nr, double ng, double nb) {
                r += nr;
                g += ng;
                b += nb;
            }

            public void multiply(double factor) {
                r *= factor;
                g *= factor;
                b *= factor;
            }
        }

        private static final Color[][][] colors = new Color[100][100][100];

        public static void main(String[] args) {
            for (int j = 0; j < 10; j++) {
                long t = System.nanoTime();
                for (int i = 0; i < 100; i++) {
                    initialize(new Color(j / 20, 0, 1));
                }
                long d = System.nanoTime() - t;
                System.out.println(d / 1_000_000 + " ms");
            }
        }

        private static void initialize(Color id) {
            for (int x = 0; x < colors.length; x++) {
                Color[][] plane = colors[x];
                for (int y = 0; y < plane.length; y++) {
                    Color[] row = plane[y];
                    for (int z = 0; z < row.length; z++) {
                        Color color = new Color(x, y, z);
                        color.add(id);
                        if ((color.r + color.g + color.b) % 42 == 0) {
                            // PEA only allocates a color object here.
                            row[z] = color;
                        } else {
                            // In this branch the color object is not allocated at all.
                        }
                    }
                }
            }
        }
    }
    // END-SNIPPET

  2. Compile it and run as follows:

    $ javac Blender.java
    $ java Blender
    2477 ms
    910 ms
    857 ms
    815 ms
    813 ms
    821 ms
    819 ms
    832 ms
    819 ms
    839 ms

To see how the example behaves with GraalVM Community Edition (CE), add the following configuration flag:

    $ java -Dgraal.CompilerConfiguration=community Blender
    1127 ms
    902 ms
    888 ms
    858 ms
    820 ms
    860 ms
    855 ms
    864 ms
    899 ms
    899 ms

  3. Again, use the -XX:-UseJVMCICompiler option to disable the GraalVM compiler and run with the standard HotSpot JIT compiler:

    $ java -XX:-UseJVMCICompiler Blender
    2214 ms
    1666 ms
    1667 ms
    1438 ms
    1436 ms
    1458 ms
    1452 ms
    1528 ms
    1557 ms
    1474 ms

The improvement compared to not using the GraalVM compiler comes from the partial escape analysis moving the allocation of color in initialize down to the point where it is stored into colors (i.e., the point at which it escapes).
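
The effect can be pictured as a source-level rewrite of the inner loop of initialize. The sketch below is a hand-written illustration under that assumption, not the code the compiler actually emits. The class AllocationSinking and the methods eager and sunk are hypothetical names, and both methods return the number of stored objects only so that the two variants can be compared.

```java
// Sketch: what moving an allocation to its escape point looks like when done by hand.
// AllocationSinking, eager and sunk are illustrative names, not part of Blender.
class AllocationSinking {

    static final class Color {
        double r, g, b;

        Color(double r, double g, double b) {
            this.r = r;
            this.g = g;
            this.b = b;
        }
    }

    // As written in Blender: one Color is allocated per element, even though
    // most of them are dropped immediately.
    static int eager(Color[] row, int x, int y, Color id) {
        int stored = 0;
        for (int z = 0; z < row.length; z++) {
            Color color = new Color(x, y, z);
            color.r += id.r;
            color.g += id.g;
            color.b += id.b;
            if ((color.r + color.g + color.b) % 42 == 0) {
                row[z] = color; // the object escapes here
                stored++;
            }
        }
        return stored;
    }

    // Hand-applied version: the fields live in local variables and the
    // object is created only on the path where it is stored into the array.
    static int sunk(Color[] row, int x, int y, Color id) {
        int stored = 0;
        for (int z = 0; z < row.length; z++) {
            double r = x + id.r;
            double g = y + id.g;
            double b = z + id.b;
            if ((r + g + b) % 42 == 0) {
                row[z] = new Color(r, g, b); // allocation at the escape point
                stored++;
            }
        }
        return stored;
    }

    public static void main(String[] args) {
        Color id = new Color(0, 0, 1);
        System.out.println("eager stored: " + eager(new Color[100], 0, 0, id));
        System.out.println("sunk stored:  " + sunk(new Color[100], 0, 0, id));
    }
}
```

For a row of 100 elements with x = 0, y = 0, and id = (0, 0, 1), both variants store 2 objects (at z = 41 and z = 83), but eager allocates 100 objects while sunk allocates only the 2 that are stored.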