|Vassilina Nikoulina, Maxat Tezekbayev, Nuradil Kozhakhmet, Madina Babazhonova, Matthias Gallé, Zhenisbek Assylbekov|
|Published on arXiv.org, 2 March 2021|
There is an ongoing debate in the NLP community about whether modern language models contain linguistic knowledge, recovered through so-called probes. In this paper, we study whether high probing scores are a necessary condition for good performance of modern language models, which we call the rediscovery hypothesis. First, we show that language models that are significantly compressed but perform well on their pre-training objectives retain good scores when probed for linguistic structures. This result leads to the second contribution of this paper: an information-theoretic framework that relates pre-training objectives to linguistic information, and that provides a metric to measure the strength of this dependence. We reinforce our analytical results with experiments on both synthetic and real tasks.
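A probe, in this context, is typically a small classifier trained on frozen language-model representations to predict a linguistic property; its accuracy is taken as the probing score. The sketch below illustrates this general idea only; the model name, the toy data, and the coarse part-of-speech labels are illustrative placeholders, not the setup used in the paper.

```python
# Minimal probing sketch: train a linear classifier on frozen LM representations
# to predict a linguistic label for a chosen word in each sentence.
# "bert-base-uncased" and the toy examples below are placeholder assumptions.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

# Toy supervision: (sentence, word index, coarse POS label).
examples = [
    ("the cat sleeps", 1, "NOUN"),
    ("the dog barks", 1, "NOUN"),
    ("cats sleep quietly", 1, "VERB"),
    ("dogs bark loudly", 1, "VERB"),
]

features, labels = [], []
for sentence, word_idx, label in examples:
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]   # (seq_len, hidden_dim)
    # Locate the first sub-token of the target word (crude, but enough for a sketch).
    token_idx = enc.word_ids().index(word_idx)
    features.append(hidden[token_idx].numpy())
    labels.append(label)

# The probe itself: a simple linear classifier; the LM weights are never updated.
probe = LogisticRegression(max_iter=1000).fit(features, labels)
print("probe training accuracy:", probe.score(features, labels))
```

A real probing study would use a held-out test split and an established annotated corpus; the point here is only that the language model is kept frozen and all task-specific capacity lives in the small classifier on top.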