Hady Elsahar, Marc Dymetman, Muhammad Khalifa | 2021
For text generation tasks, large pretrained language models (PLMs)—like GPT-2, GPT-3, T5 and BART—are now dominant, producing text that can sometimes be confused with human writing (1, 2). However, these models are mirrors of their training data, which means that they inherit any biases that exist within the datasets they learn from.
Indeed, social bias against certain minority groups has been observed in these models (3, 4). More specifically, when prompted with text about a specific group, the sentiment of the models’ output appears to be biased against demographics such as “women” and “black”, as opposed to “men” and “white”. Other forms of bias include offensive language and general toxicity, both of which are profuse in the non-curated, web-crawled training data.
Now that you see the problem with using large language models for generation, you may wonder why we don’t just mitigate this bias using existing methods (like controlled natural language generation, NLG). Well, the main problem with existing methods is that they operate on individual samples, whereas bias is a collective problem associated with the distribution of a certain feature (e.g. sentiment) across a collection of samples. Existing controlled NLG methods are therefore too blinkered to be useful for this task and only allow desired sentiment to be imposed in a blanket way, i.e. on all model outputs. Clearly, this would just introduce another kind of bias instead of solving the main problem.
Enter our proposed approach (5), GDC, which we have developed with the aim of overcoming these issues. GDC is a framework that subsumes the existing approaches for controlled NLG. In other words, it works on individual samples (we refer to this as pointwise control) but additionally allows for what we call distributional control (i.e. enabling the sentiment across a collection of samples to be tuned).
Formally, for a given binary feature (or set of features), GDC enables the mean value over samples drawn from the model to be tuned to a value of our choosing. This gives flexibility as, if we set the mean to 1.0, we are imposing the constraint over each individual sample (which is equivalent to the pointwise case). By setting the mean to a smaller value, we obtain distributional control.
Let’s motivate our approach with an experiment. Say, for example, you have fine-tuned GPT-2 to generate biographies. You might begin by sampling biographies from your model and observing its outputs. Alas, you notice that your model is mostly generating biographies about men, with only 7% about women! The reason for this is that your model saw mostly male biographies during training and, as a result, has become biased towards this kind of output.
For GDC, balancing male and female biographies is a distributional constraint. More formally, writing $a$ for the original GPT-2 distribution, our GPT-2 model satisfies:

$$\mathbb{E}_{x \sim a}[\phi_{\text{female}}(x)] = 0.07,$$
where $\phi_{\text{female}}(x) = 1$ iff $x$ is a female biography. $\mathbb{E}_{x \sim a}[\phi_{\text{female}}(x)]$ is the expected value of this feature over generations from GPT-2, observed to be equal to 0.07, i.e. GPT-2 generates biographies about women only 7% of the time.
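An expectation like this can be estimated by simple Monte Carlo over model samples. A minimal sketch, using a toy stand-in sampler instead of GPT-2 and a hypothetical `phi` feature function (in practice the feature would come from a classifier applied to real model outputs):

```python
import random

# Toy stand-in for GPT-2: a sampler whose outputs carry a binary feature
# phi(x) = 1 iff the "biography" is about a woman. Illustrative only --
# the assumed 7% rate mirrors the bias observed in the post.
random.seed(0)

def sample_biography():
    return {"female": random.random() < 0.07}

def phi(x):
    return 1.0 if x["female"] else 0.0

# Monte Carlo estimate of the feature's expectation under the model.
n = 100_000
estimate = sum(phi(sample_biography()) for _ in range(n)) / n
print(round(estimate, 2))  # close to 0.07
```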
So, to obtain a debiased model, we define constraints over the target distribution $p$ in terms of expectations of those features. In particular, we aim to generate biographies about women 50% of the time, i.e.:

$$\mathbb{E}_{x \sim p}[\phi_{\text{female}}(x)] = 0.5.$$
As you learn more about GDC, perhaps you become a bit greedier in your requirements and decide that you'd like the model outputs to focus exclusively on scientists. That is, you want $p$ to also satisfy:

$$\mathbb{E}_{x \sim p}[\phi_{\text{scientist}}(x)] = 1.0,$$
where $\phi_{\text{scientist}}(x) = 1$ iff $x$ is a biography of a scientist.
Taken together, the two equations describe our desired constraints for this problem. The scientist constraint, with an expectation of 1.0, must hold for every single sample (pointwise), while the female-biography constraint need only hold on average (distributional). We refer to this mix of pointwise and distributional constraints as a hybrid specification.
We can add as many constraints as we want (although there are some practical limitations on that, details of which can be found in our discussion with the reviewers of our paper). Consider the 'manifold' $\mathcal{C}$ of all distributions satisfying these linear constraints. It can be proven (6) that there is a unique 'projection' $p$ of an arbitrary distribution $a$ onto this manifold which satisfies two important properties. First, $p$ has a minimum KL-divergence from GPT-2 among all distributions satisfying the constraints. This means that deviation from our original pretrained model ($a$ = GPT-2) is limited, so we aren't wasting any pretrained knowledge and, consequently, we're unlikely to encounter the degeneration and disfluency problems (7) that can be caused by steering away from the original model (8). A second property is that $p$ can be represented in an exponential family form, $P(x) = a(x)\exp(\lambda \cdot \phi(x))$, with $p(x) = P(x)/Z$. This implies that our representation, $P$, is an energy-based model (EBM) or, in other words, an unnormalized probability distribution with an explicit parametric form.
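On a toy discrete space, this projection can be computed explicitly. The sketch below (our own illustration, not the paper's code) solves for the exponential-family parameter $\lambda$ that makes a reweighted base distribution satisfy a single moment constraint of 0.5, starting from a base expectation of 0.07 as in the biography example:

```python
import math

# Toy projection onto { p : E_p[phi] = 0.5 }: the projection of a base
# distribution a has the form p(x) proportional to a(x) * exp(lambda * phi(x)).
# We use a 4-point space (not text) purely to make the math concrete.
a = {"x1": 0.05, "x2": 0.02, "x3": 0.43, "x4": 0.50}   # base distribution
phi = {"x1": 1, "x2": 1, "x3": 0, "x4": 0}             # binary feature; E_a[phi] = 0.07
target = 0.5

def moment(lam):
    """E_p[phi] under p(x) ~ a(x) * exp(lam * phi(x))."""
    w = {x: a[x] * math.exp(lam * phi[x]) for x in a}
    z = sum(w.values())
    return sum(w[x] * phi[x] for x in a) / z

# E_p[phi] is monotonically increasing in lambda, so bisection finds
# the unique solution.
lo, hi = -50.0, 50.0
for _ in range(200):
    mid = (lo + hi) / 2
    if moment(mid) < target:
        lo = mid
    else:
        hi = mid
lam = (lo + hi) / 2
print(round(moment(lam), 3))  # 0.5
```

For a single binary feature the solution also has the closed form $\lambda = \ln\frac{1-q}{q}$ with $q = \mathbb{E}_a[\phi]$; the bisection generalizes to multiple constraints.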
So far, our path started from a set of desired constraints and has led us to an EBM. Sampling text sequences from this EBM, $P$, would give us access to the sought-after projection distribution $p$. However, there is a problem: the EBM is not locally normalized. In other words, we can't sample from it directly in a token-by-token fashion in the way that we can with autoregressive models. We could use sampling techniques to work around that (e.g. Markov chain Monte Carlo methods), but assessing the convergence of such techniques is not straightforward, and sampling is often slow.
Instead, we develop a KL-adaptive Distributional Policy Gradient (KL-DPG) algorithm (5), an adaptation of the Distributional Policy Gradients algorithm (9), that circumvents these issues. DPG aims to minimize the cross-entropy between a target distribution (in our case, the projection $p$) and a trained autoregressive policy $\pi_\theta$, with access only to the unnormalized energy function, $P(x)$. The KL prefix relates to the fact that an estimate of the KL-divergence between the trained policy and the target distribution is used both to guide and to assess convergence towards $p$. In addition, KL-adaptive DPG is particularly effective with rare constraints (5).
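The core DPG idea can be seen on a tiny categorical space. The following is our own toy sketch, not the paper's implementation: it trains a normalized policy to match a target known only through an unnormalized score $P(x)$, using importance-weighted log-likelihood gradients (the expected update is exactly $P - \pi$ when $P$ sums to one):

```python
import math
import random

# Toy DPG: fit softmax logits so that pi_theta matches a target given
# only as unnormalized scores P(x). Samples come from the current policy;
# each is reweighted by P(x) / pi(x).
random.seed(0)
X = [0, 1, 2]
P = [0.1, 0.6, 0.3]      # target scores (here they happen to sum to 1)
theta = [0.0, 0.0, 0.0]  # policy logits

def softmax(t):
    m = max(t)
    e = [math.exp(v - m) for v in t]
    z = sum(e)
    return [v / z for v in e]

lr = 0.05
for step in range(5000):
    pi = softmax(theta)
    x = random.choices(X, weights=pi)[0]  # sample from current policy
    w = P[x] / pi[x]                      # importance weight P(x)/pi(x)
    # gradient of log pi(x) w.r.t. the logits is one_hot(x) - pi
    for i in X:
        grad = (1.0 if i == x else 0.0) - pi[i]
        theta[i] += lr * w * grad

pi = softmax(theta)
print([round(v, 2) for v in pi])  # approaches the normalized target
```

The fixed point is $\pi = P / \sum_x P(x)$, so the same update works when $P$ is unnormalized; the paper's setting replaces the categorical policy with an autoregressive one over text.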
This post is not the place to detail all our experiments (which are many!). Our goal here is to instead give a high-level overview that shows the promise of our approach.
From the discussion so far, it's clear that we can deal with three possible combinations of constraints: pointwise only, distributional only, and hybrid. By running a set of experiments using GDC with each of these, we show that GDC (trained with KL-DPG) outperforms strong baselines in terms of deviation from the original pretrained language model (PLM), text quality, and diversity.
For the pointwise case, we experiment with three different control settings: a single-word constraint (which imposes the presence of a specific word in the output); a word-list constraint (which requires the presence of at least one word from a list, useful for topic control); and a classifier-based constraint (which relies on a signal from a classifier, such as a sentiment classifier). From our results, shown in Figure 2(a–d), two observations can be made. First, GDC maintains a small KL-divergence from the original PLM, $a$. Second, GDC outperforms all other baselines in terms of both corpus-level diversity (measured with the Self-BLEU metric) and sequence-level diversity (measured with the Distinct-1 metric), as shown in Figure 2. Another crucial convergence metric is the KL divergence from the desired projection $p$ (see Figure 3), which shows how close our fine-tuned model is to the optimal desired distribution (i.e. whether our objective is being met). GDC achieves faster and steadier convergence towards $p$, outperforming the other baselines.
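As an aside, the Distinct-1 diversity metric is straightforward to compute: the ratio of unique unigrams to total generated tokens. A minimal sketch, assuming whitespace tokenization (the exact tokenization used in our experiments may differ):

```python
# Distinct-1: unique unigrams / total tokens over a set of generations.
# Higher values indicate more diverse output.
def distinct_1(texts):
    tokens = [tok for t in texts for tok in t.split()]
    return len(set(tokens)) / len(tokens) if tokens else 0.0

samples = ["the cat sat", "the dog sat", "a bird flew"]
print(round(distinct_1(samples), 2))  # 7 unique tokens / 9 total = 0.78
```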
Remember our discussion about female biographies? We can now use the same setting to impose distributional constraints, or combinations of distributional and pointwise (i.e. hybrid) constraints, to reduce the bias we saw earlier. We start with a single distributional constraint to increase the representation of female biographies. Figure 2 shows that using GDC increases the proportion of female biographies (from 7.4% to 36.7%).
How about a collection of distributional constraints? In the second distributional experiment, we aim to generate a pre-specified proportion of art, science, business, and sports biographies (40%, 40%, 10% and 10%, respectively). Figure 2 shows how GDC approaches the desired imposed constraints regardless of the feature expectation (i.e. whether it needs to be increased or decreased), showing the flexibility of our framework in satisfying distributional constraints.
Now for the final and most important setting, where we impose pointwise and distributional constraints at the same time to create a hybrid constraint. Here, the distributional constraint is still tuned to generate biographies of which half are about women, but this time the pointwise constraints specify that all generated biographies (male or female) should correspond with a certain profession (or topic). The results in Figure 3 show that GDC is flexible enough to almost satisfy both types of constraints.
To investigate whether it's possible to fully satisfy constraints while maintaining a minimal divergence from the original PLM, we conduct a fully supervised experiment in which we fine-tune GPT-2 on samples containing a specific word (5). Our goal is to verify whether fine-tuning in this simple supervised setting can achieve 100% constraint satisfaction without overfitting (which would imply a large divergence from the PLM). Unfortunately, we're unable to reach higher constraint satisfaction without overfitting.
Our approach is a significant step towards controlling language models under a unified framework. However, there is still some work to do before we're able to fully satisfy the desired constraints in pointwise or distributional settings. Reinforcement-learning-based baselines show better constraint satisfaction than our approach, but they suffer from degeneration and low diversity.
Although our supervision experiment doesn't resolve this issue, it at least sheds light on the possibility that the GPT-2 architecture has difficulty satisfying some constraints through fine-tuning (such as containing a given word somewhere in its output). Another way to look at this is that constraint satisfaction and distance from the original PLM may represent two competing objectives, where improving one negatively affects the other. This suggests that we still face some trade-off between linguistic quality and constraint satisfaction. Addressing this limitation is a topic for further research.