Over the last years, models in NLP have become much more powerful, driven by advances in transfer learning. A consequence of this drastic increase in performance is that existing benchmarks have been left behind. Recent models "have outpaced the benchmarks to test for them" (AI Index Report 2021), quickly reaching
Domain adaptation methods typically seek to identify features that are shared between the domains or learn representations that are general enough to be useful for both domains. This post discusses a complementary approach to domain adaptation that selects data that is useful for training the model.