A plain-text corpus of 7948 articles from BMC journals, each of which mentions the word stem 'phylogen*' somewhere in its full text. All content is made available under CC BY (http://creativecommons.org/licenses/by/3.0/) and remains the copyright of the original article authors.
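
The selection criterion above can be illustrated with a minimal sketch. It is not the procedure actually used to build the corpus, which is not described here. It assumes that 'phylogen*' means any word beginning with "phylogen", matched case-insensitively; the function names `mentions_stem` and `matching_words` are hypothetical.

```python
import re
from typing import List

# Assumption: the wildcard 'phylogen*' stands for any word that starts with
# "phylogen" (phylogeny, phylogenetic, phylogenomics, ...), ignoring case.
STEM_PATTERN = re.compile(r"\bphylogen\w*", re.IGNORECASE)


def mentions_stem(full_text: str) -> bool:
    """Return True if the text contains at least one word starting with 'phylogen'."""
    return STEM_PATTERN.search(full_text) is not None


def matching_words(full_text: str) -> List[str]:
    """Return the distinct matching words, lowercased and sorted."""
    return sorted({m.group(0).lower() for m in STEM_PATTERN.finditer(full_text)})


if __name__ == "__main__":
    sample = "Phylogeny and phylogenetic trees; phylogenies."
    print(mentions_stem(sample))
    print(matching_words(sample))
```

Applied to each article's full text, a check like `mentions_stem` would keep an article if it returns True and skip it otherwise.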