I looked at your source code and read the paper, and I have a question: did you initialize the model from the original BERT rather than the medical BERT?