Chainer implementation of “BERT: Pre-training | Dark Hacker News