t-SNE’s Data Visualization

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a data-visualization tool created by Laurens van der Maaten at Delft University of Technology.
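The "t" in the name refers to the Student t-distribution, which t-SNE uses to score how similar two points are in the low-dimensional map. The following is a minimal sketch of that kernel in plain Java. It is not Deeplearning4j's implementation, and the class and method names (StudentTKernelSketch, lowDimSimilarities) are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Deeplearning4j: illustrates the Student-t
// kernel used for pairwise similarities between points in the low-dimensional map.
class StudentTKernelSketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            double d = a[k] - b[k];
            sum += d * d;
        }
        return sum;
    }

    // q[i][j] = (1 + ||y_i - y_j||^2)^-1, normalized over all pairs with i != j.
    // The diagonal is left at zero.
    static double[][] lowDimSimilarities(double[][] y) {
        int n = y.length;
        double[][] q = new double[n][n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i == j) {
                    continue;
                }
                q[i][j] = 1.0 / (1.0 + squaredDistance(y[i], y[j]));
                total += q[i][j];
            }
        }
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j) {
                    q[i][j] /= total;
                }
            }
        }
        return q;
    }

    public static void main(String[] args) {
        // Made-up 2-D map positions: two close points and one far away.
        double[][] map = {{0.0, 0.0}, {0.1, 0.0}, {5.0, 5.0}};
        System.out.println(Arrays.deepToString(lowDimSimilarities(map)));
    }
}
```

Nearby points receive a much larger share of the total similarity than distant ones, and the heavy tails of the t-distribution keep distant points from being crushed together in the map.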

While it can be used for any data, t-SNE (pronounced "tee-snee") is only really meaningful with labeled data, since the labels clarify how the input is clustering. Below is the kind of graphic you can generate in DL4J by running t-SNE on MNIST data.

[Image: t-SNE plot of MNIST data]

Look closely and you can see each numeral printed alongside its dot, clustered near others like it.

Here’s how t-SNE appears in Deeplearning4j code.

    // Import locations match older DL4J releases and may differ in your version.
    import org.datavec.api.util.ClassPathResource;
    import org.deeplearning4j.berkeley.Pair;
    import org.deeplearning4j.models.embeddings.inmemory.InMemoryLookupTable;
    import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
    import org.deeplearning4j.models.word2vec.wordstore.VocabCache;
    import org.deeplearning4j.plot.BarnesHutTsne;
    import org.nd4j.linalg.api.buffer.DataBuffer;
    import org.nd4j.linalg.api.buffer.util.DataTypeUtil;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;

    public class TSNEStandardExample {

        private static final Logger log = LoggerFactory.getLogger(TSNEStandardExample.class);

        public static void main(String[] args) throws Exception {
            // STEP 1: Initialization
            int iterations = 100;
            // Use double precision for the n-dimensional arrays
            DataTypeUtil.setDTypeForContext(DataBuffer.Type.DOUBLE);
            // cacheList is a dynamic array of strings used to hold all words
            List<String> cacheList = new ArrayList<>();

            // STEP 2: Turn text input into a list of words
            log.info("Load & Vectorize data....");
            File wordFile = new ClassPathResource("words.txt").getFile(); // open the file
            // Get the data of all unique word vectors
            Pair<InMemoryLookupTable, VocabCache> vectors = WordVectorSerializer.loadTxt(wordFile);
            VocabCache cache = vectors.getSecond();
            // Separate the weights of the unique words into their own matrix
            INDArray weights = vectors.getFirst().getSyn0();
            // Separate the word strings into their own list
            for (int i = 0; i < cache.numWords(); i++) {
                cacheList.add(cache.wordAtIndex(i));
            }

            // STEP 3: Build a Barnes-Hut t-SNE model to use later
            log.info("Build model....");
            BarnesHutTsne tsne = new BarnesHutTsne.Builder()
                    .setMaxIter(iterations).theta(0.5)
                    .normalize(false)
                    .learningRate(500)
                    .useAdaGrad(false)
                    // .usePca(false)
                    .build();

            // STEP 4: Compute the t-SNE values and save them to a file
            log.info("Store TSNE Coordinates for Plotting....");
            String outputFile = "target/archive-tmp/tsne-standard-coords.csv";
            (new File(outputFile)).getParentFile().mkdirs();
            // Use the word-vector weights as the input matrix, reduce to two dimensions,
            // use the word strings as labels, and write the result to outputFile
            tsne.plot(weights, 2, cacheList, outputFile);
        }
    }

Here is an image of the tsne-standard-coords.csv file plotted using gnuplot.

[Image: t-SNE data plot]
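If you would rather plot the coordinates with a tool other than gnuplot, the file first has to be read back in. Below is a rough sketch of doing that in Java. It assumes each line of tsne-standard-coords.csv has the form x,y,label; that layout is an assumption, so check your own file before relying on it. The class names (TsneCoordsReaderSketch, LabeledPoint) are hypothetical and not part of Deeplearning4j.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reader, not part of Deeplearning4j.
// ASSUMPTION: each non-empty line looks like "x,y,label".
class TsneCoordsReaderSketch {

    // One embedded point together with the word it represents.
    static final class LabeledPoint {
        final double x;
        final double y;
        final String label;

        LabeledPoint(double x, double y, String label) {
            this.x = x;
            this.y = y;
            this.label = label;
        }
    }

    // Parses lines of the assumed "x,y,label" form, skipping blank lines.
    static List<LabeledPoint> parse(List<String> lines) {
        List<LabeledPoint> points = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            // Limit of 3 keeps any commas inside the label intact.
            String[] parts = line.split(",", 3);
            if (parts.length < 3) {
                throw new IllegalArgumentException("Expected x,y,label but got: " + line);
            }
            points.add(new LabeledPoint(
                    Double.parseDouble(parts[0].trim()),
                    Double.parseDouble(parts[1].trim()),
                    parts[2].trim()));
        }
        return points;
    }

    public static void main(String[] args) {
        // Made-up sample lines; for the real file you could pass
        // java.nio.file.Files.readAllLines(java.nio.file.Paths.get(outputFile)) instead.
        List<String> sample = Arrays.asList("1.5,-2.0,cat", "0.25,3.0,dog");
        for (LabeledPoint p : parse(sample)) {
            System.out.println(p.label + " -> (" + p.x + ", " + p.y + ")");
        }
    }
}
```

Keeping the parsing separate from the file reading makes it easy to adjust the split logic if your file turns out to use a different column order or separator.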