shows the word-based similarity of Reddit subs.


Technical Stuff:

The data source is hosted on Google BigQuery and includes all Reddit post descriptions for the month of May 2016.
I created an app in Go (Golang) that uses tf-idf to reduce the influence of common but unimportant words. The 300 most common title words of each sub are identified, ranked and normalised. The selected sub and the 59 subs most similar to it are used to generate a 60x60 similarity matrix. That matrix is then displayed using DependencyWheel. Subs that have fewer than 20 posts, or an average post upvote rating below 1.5, are excluded from the results.