Please read the PDF document first.
During implementation I realised that case sensitivity matters (for example, whether "The" and "the" count as the same word). The results should also be ordered by count and then by word. I attempted a parallel implementation, but it took 6 seconds, much longer than the sequential version. It remains in the repo as a spike.
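The counting and ordering rules above can be sketched roughly as follows. This is a minimal illustration, not the repo's implementation. It assumes that words are compared case-insensitively (by lowercasing), that a word is a run of ASCII letters, digits or apostrophes, and that "ordered by count and then by word" means highest count first with ties broken alphabetically. The function name `count_words` and the regex are hypothetical.

```python
import re
from collections import Counter

# Hypothetical tokenizer (an assumption): a word is a run of ASCII letters,
# digits or apostrophes, matched after the text has been lowercased.
WORD_RE = re.compile(r"[a-z0-9']+")


def count_words(text: str) -> list[tuple[str, int]]:
    """Count words case-insensitively, sorted by count (desc), then word (asc)."""
    # Lowercase first so "The" and "the" are counted as the same word.
    words = WORD_RE.findall(text.lower())
    counts = Counter(words)
    # Negating the count lets one ascending sort put the highest counts first,
    # while equal counts fall back to alphabetical order of the word.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = "The cat and the dog. A dog, a cat, the end."
    for word, count in count_words(sample):
        print(f"{word} {count}")
```

Using a single composite sort key keeps both ordering rules in one place. `Counter` instances can also be added together with `+`, which is one way per-chunk results could be merged if the counting were split up.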