In a previous post I described the TSNE algorithm and how it can be used to map high dimensional data sets onto a 2 or 3 dimensional graph. There are many implementations of the TSNE algorithm and it is important to be aware of the issues associated with each of them before using them. In this post I will discuss some of these issues and how you can avoid them.

TSNE is a computationally and memory expensive algorithm. Depending on which language it is implemented in and how it is optimized, the algorithm can either map only 10,000 observations or 100,000+ observations.

In R, the TSNE algorithm is available in two packages: TSNE and RTSNE. TSNE is a R implementation of the algorithm and can take an incredibly long time to map large data sets. RTSNE is a wrapper around the C++ implementation of the algorithm and can handle larger data sets with low computation time. The graph below compares the various implementations and shows that RTSNE is 35 times faster than the TSNE implementation.

In python, the TSNE implementation is available in the sci-kit learn library package. In the graph below, the python sci-kit learn implementation is shown to be competitive with RTSNE. However, the python implementation tends to have memory issues with data sets that contain 10,000+ observations. Because of these issues the R implementation should be preferred when performing TSNE.

If you want to do a quick exploration of the data, I would recommend doing a stratified sub sampling of 10,000 to 20,000 observations which can generate relatively good projections with low calculation time. If the data set is high dimensional, doing principal component analysis is recommended because otherwise the curse of dimensionality can be an issue. TSNE makes the assumption of local linearity which might not hold in high dimensions where the manifold may be varying and PCA can help alleviate this issue by reducing the dimensionality of the data. In RTSNE you can set the parameter PCA = True, and the algorithm will automatically perform PCA before performing TSNE.

Another issue you might encounter is getting different graphs on multiple runs of TSNE. The Kullback Leibler divergence function that is being minimized using gradient descent is a non-convex function with multiple local minimums and depending on the initial values of $y_{i}$ the algorithm picks, the optimization algorithm leads to different local minimums resulting in different graphs. I suggest using set.seed() in R so that that graphs are reproducible.

Summary: Use RTSNE in R. Use a pythom wrapper for the C++ implementation if you want to run it in python.