Towards efficient and effective unlearning of large language models for recommendation

Hangyu WANG, Jianghao LIN, Bo CHEN, Yang YANG, Ruiming TANG, Weinan ZHANG, Yong YU

Front. Comput. Sci. ›› 2025, Vol. 19 ›› Issue (3) : 193327.

DOI: 10.1007/s11704-024-40044-2
Artificial Intelligence
LETTER


Cite this article: Hangyu WANG, Jianghao LIN, Bo CHEN, Yang YANG, Ruiming TANG, Weinan ZHANG, Yong YU. Towards efficient and effective unlearning of large language models for recommendation. Front. Comput. Sci., 2025, 19(3): 193327. https://doi.org/10.1007/s11704-024-40044-2

1 Introduction

Large Language Models (LLMs) possess massive parameters and are trained on vast datasets, demonstrating exceptional proficiency in various tasks. The remarkable advancements of LLMs have also inspired the exploration of LLMs as recommenders (LLMRec), whose effectiveness stems from the extensive open-world knowledge and reasoning ability of LLMs [1]. LLMRec acquires its recommendation ability through instruction tuning on user interaction data. In many cases, however, it is also crucial for LLMRec to forget specific user data, which is referred to as recommendation unlearning [2], as shown in Fig.1.
Fig.1 The process of recommendation unlearning


The necessity of recommendation unlearning mainly arises from two aspects. 1) Privacy: according to privacy legislation, recommenders are obligated to erase sensitive data upon user request in order to protect user privacy. 2) Utility: noisy or polluted data can severely degrade recommendation performance; once such data are identified, recommenders need to forget them to regain utility [3]. However, recommendation unlearning in the era of LLMs poses a new efficiency challenge to existing approaches, which all require updating all parameters of the model, an expensive and time-consuming process given the billions of parameters in LLMs [4,5]. Some studies [6] explore unlearning in LLMs, but they use a gradient-ascent-based approach to unlearn knowledge, which breaks the classification boundary and hurts model utility on normal data.
To this end, we propose E2URec, an Efficient and Effective Unlearning method for LLMRec. Our main contributions are summarized as follows. 1) We study the unlearning problem for LLMRec; our proposed E2URec outperforms existing approaches in terms of both efficiency and effectiveness. 2) For efficiency, we add a lightweight low-rank adaptation (LoRA) module to the original LLM; during unlearning, only the LoRA parameters are updated, while the parameters of the LLM remain frozen. 3) For effectiveness, we design a novel forgetting teacher and a remembering teacher to guide the unlearned model, so that it forgets the data and maintains recommendation performance, respectively.

2 LLMs as recommenders

LLMRec aims to utilize an LLM to predict whether an item will be clicked by a user. We denote the recommendation dataset as $\mathcal{D}=\{(x_i, y_i)\}_{i=1}^{N}$ with $N$ samples, where $x_i$ denotes the features of the $i$-th sample and $y_i$ is the label. The features $x_i$ are converted into a textual sentence $x_i^{\text{text}}$ via hard prompt templates. Similarly, the label $y_i \in \{1, 0\}$ (click or not) is converted into the corresponding answer word $y_i^{\text{text}} \in \{\text{"Yes"}, \text{"No"}\}$.
The causal language modeling objective is used to optimize the LLM on dataset $\mathcal{D}$, by minimizing the negative log-likelihood of generating $y_i^{\text{text}}$ conditioned on the input $x_i^{\text{text}}$:

$$\mathcal{L}_{\mathrm{pred}}(\mathcal{D}) = -\sum_{(x_i^{\text{text}},\, y_i^{\text{text}}) \in \mathcal{D}} \sum_{t=1}^{|y_i^{\text{text}}|} \log P\big(y_{i,t}^{\text{text}} \,\big|\, x_i^{\text{text}},\, y_{i,<t}^{\text{text}}\big), \tag{1}$$

where $y_{i,t}^{\text{text}}$ is the $t$-th token of $y_i^{\text{text}}$, and $y_{i,<t}^{\text{text}}$ denotes the tokens before $y_{i,t}^{\text{text}}$.
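As an illustration, here is a minimal NumPy sketch of this objective on toy next-token logits; the vocabulary size, the two-token answer, and the function names are hypothetical, not the paper's implementation:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def causal_lm_nll(logits, target_ids):
    """Negative log-likelihood of the answer tokens, summed over positions.

    logits: (T, V) next-token logits, one row per answer position.
    target_ids: (T,) gold token ids of the answer y_i^text.
    """
    probs = softmax(logits)
    picked = probs[np.arange(len(target_ids)), target_ids]
    return -np.sum(np.log(picked))

# Toy example: vocabulary of 4 tokens, a two-token answer [1, 3]
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 4))
loss = causal_lm_nll(logits, np.array([1, 3]))
```

Raising the logits of the target tokens lowers the loss, which is exactly what instruction tuning on $\mathcal{D}$ drives the model toward.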

3 Methodologies

Formally, suppose that on the recommendation dataset $\mathcal{D}$ we have trained an LLMRec model, called the original model. Subsequently, a deletion request is received: the data to be removed is denoted as the forgotten data $\mathcal{D}_f$, and the remaining data is the retained data $\mathcal{D}_r = \mathcal{D} \setminus \mathcal{D}_f$. The goal of unlearning is then to learn an unlearned model $M_u$ that forgets the information in $\mathcal{D}_f$ without hurting recommendation performance. The unlearned model is initialized with the parameters of the original model.

3.1 Parameter-efficient unlearning

Considering the billions of parameters of the LLM, updating all of them for forgetting is resource-intensive. Inspired by recent advances in parameter-efficient finetuning, we propose to insert lightweight LoRA modules into the LLM, as shown in Fig.2(b). The LoRA modules add pairs of rank-decomposition weight matrices to the original parameters of the LLM while introducing only a few parameters. In this way, we model the unlearned model as $M_u(\cdot;\phi,\theta)$, where $\phi$ denotes the parameters of the LLM and $\theta$ the LoRA parameters. During the unlearning process, we only update $\theta$, while $\phi$ remains frozen. This greatly reduces the required computing resources and time.
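A minimal NumPy sketch of such a LoRA-augmented linear layer follows; the rank, scaling, class name, and initialization details are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

class LoRALinear:
    """Frozen weight W (phi) plus a trainable low-rank update scale * B @ A (theta).

    B is zero-initialized, so the layer starts out identical to the frozen model.
    """
    def __init__(self, W, r=4, alpha=8, seed=0):
        d_out, d_in = W.shape
        rng = np.random.default_rng(seed)
        self.W = W                                        # frozen LLM weight
        self.A = rng.normal(scale=0.01, size=(r, d_in))   # trainable
        self.B = np.zeros((d_out, r))                     # trainable, zero init
        self.scale = alpha / r

    def forward(self, x):
        # Effective weight is W + scale * B @ A; only A and B are updated
        return x @ (self.W + self.scale * self.B @ self.A).T

    def n_trainable(self):
        return self.A.size + self.B.size

W = np.random.default_rng(1).normal(size=(64, 64))
layer = LoRALinear(W, r=4)
x = np.ones((1, 64))
```

At rank 4 this layer trains 512 parameters against 4096 frozen ones, mirroring how E2URec touches only a small fraction of the total parameter count.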
Fig.2 The framework of our proposed E2URec


3.2 Unlearning with forgetting/remembering teachers

We aim to achieve unlearning by using two teachers, as depicted in Fig.2(b). To remove knowledge, we update the unlearned model Mu to produce distributions similar to the forgetting teacher Mf on the forgotten data Df. Simultaneously, to preserve recommendation performance, we update the unlearned model Mu to produce distributions similar to the remembering teacher Mr on the retained data Dr. The whole process can be formulated as:
$$\min_{\theta}\ \mathrm{KL}\big(M_f(\mathcal{D}_f)\,\big\|\,M_u(\mathcal{D}_f;\theta)\big), \qquad \min_{\theta}\ \mathrm{KL}\big(M_r(\mathcal{D}_r)\,\big\|\,M_u(\mathcal{D}_r;\theta)\big),$$

where $\mathrm{KL}(\cdot\,\|\,\cdot)$ is the KL divergence between the output probability distributions of the teacher and the unlearned model.
The forgetting teacher should never have seen the forgotten data. The retrained model, i.e., the model trained from scratch without observing $\mathcal{D}_f$, would be a suitable forgetting teacher, but retraining is inefficient and not viable in practice. We therefore design an approximate forgetting teacher.
As shown in Fig.2(a), we first finetune an augmented model on the forgotten data $\mathcal{D}_f$. The augmented model, with additional training on $\mathcal{D}_f$, outputs logits that are more relevant to $\mathcal{D}_f$. Therefore, the difference between the logits of the augmented model and those of the original model represents the information related to $\mathcal{D}_f$. We denote the logits of the augmented model and the original model as $v_{\mathrm{aug}}$ and $v$ respectively, so the difference is $v_{\mathrm{aug}} - v$. Subtracting this difference from the logits of the original model yields logits $v_f$ that exclude the information about $\mathcal{D}_f$:
$$v_f = v - \alpha\,\mathrm{ReLU}(v_{\mathrm{aug}} - v),$$

where $\alpha$ is a positive hyper-parameter. The output probability distribution of the forgetting teacher is then the normalized $v_f$, i.e., $\mathrm{Softmax}(v_f)$.
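This construction can be sketched in a few lines of NumPy; the toy logit values are invented for illustration:

```python
import numpy as np

def forgetting_teacher_probs(v, v_aug, alpha=2.0):
    """Implements v_f = v - alpha * ReLU(v_aug - v), then normalizes via softmax.

    v: original-model logits; v_aug: logits of the model finetuned further on D_f.
    Only logits the augmented model pushed *up* (i.e., D_f-related) get suppressed.
    """
    v_f = v - alpha * np.maximum(v_aug - v, 0.0)
    e = np.exp(v_f - v_f.max())
    return e / e.sum()

v = np.array([2.0, 0.5, -1.0])
v_aug = np.array([3.0, 0.5, -1.0])   # augmented model grew more confident on token 0
p_f = forgetting_teacher_probs(v, v_aug, alpha=2.0)
```

Here only the first logit is reduced (from 2.0 to 0.0), so the teacher assigns it a lower probability than the original model would, while untouched tokens are unaffected.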
So far, we have acquired the forgetting teacher’s outputs. The forgetting loss can then be formulated as:

$$\mathcal{L}_{\mathrm{FGT}} = \sum_{x_f \in \mathcal{D}_f} \mathrm{KL}\big(M_f(x_f)\,\big\|\,M_u(x_f;\theta)\big).$$
Simply forgetting would hurt the model’s recommendation performance. To retain the original recommendation ability, we encourage the unlearned model to “stay close” to the remembering teacher on the retained data. We choose the original model as the remembering teacher $M_r$, because it has the best recommendation performance. Besides, to further strengthen the knowledge related to the recommendation task, we also add the prediction loss from Eq. (1). Formally, the remembering loss is:

$$\mathcal{L}_{\mathrm{REM}} = \mathcal{L}_{\mathrm{pred}}(\mathcal{D}_r;\theta) + \sum_{x_r \in \mathcal{D}_r} \mathrm{KL}\big(M_r(x_r)\,\big\|\,M_u(x_r;\theta)\big).$$
Finally, the loss of E2URec is the weighted sum of the forgetting loss and the remembering loss, controlled by the hyper-parameter $\beta$:

$$\mathcal{L} = \beta\,\mathcal{L}_{\mathrm{FGT}} + (1-\beta)\,\mathcal{L}_{\mathrm{REM}}.$$
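Putting the pieces together, a minimal sketch of the combined objective on toy distributions (the distributions, the scalar prediction loss, and the function names are hypothetical placeholders for per-sample model outputs):

```python
import numpy as np

def kl_div(p, q, eps=1e-12):
    # KL(p || q) between two discrete probability distributions
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def e2urec_loss(p_forget_teacher, p_unlearned_f,
                p_remember_teacher, p_unlearned_r,
                pred_loss, beta=0.6):
    """L = beta * L_FGT + (1 - beta) * L_REM for a single forgotten/retained pair."""
    l_fgt = kl_div(p_forget_teacher, p_unlearned_f)                 # forgetting loss
    l_rem = pred_loss + kl_div(p_remember_teacher, p_unlearned_r)   # remembering loss
    return beta * l_fgt + (1.0 - beta) * l_rem

p = np.array([0.7, 0.3])   # a teacher's "Yes"/"No" distribution
q = np.array([0.5, 0.5])   # the unlearned model's current distribution
loss = e2urec_loss(p, q, p, p, pred_loss=0.4, beta=0.6)
```

When the unlearned model matches both teachers exactly, both KL terms vanish and only the weighted prediction loss remains, which makes the roles of $\beta$ and the two teachers easy to see.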

4 Experiments

We conduct experiments on two public recommendation datasets: MovieLens-1M (ML-1M) and GoodReads (GD). Both datasets are split into training, validation and testing sets with a ratio of 6:2:2 according to the global timestamp. In the experiments, 20% of randomly chosen users request the removal of their training data. We use T5-base [1] as the LLM backbone and set $\alpha=2$ and $\beta=0.6$. Our method only needs to update 0.7% of the total parameters. The code is available.
We compare our E2URec with the state-of-the-art methods. Original: the original model without unlearning. Retrain: the model retrained from scratch without the forgotten data, included as a gold standard. SISA [2]: Sharded, Isolated, Sliced and Aggregated training. RecEraser [3]: improves SISA with collaborative sharding and aggregation. NegKL: uses a KL loss to finetune the original model on both the retained and forgotten data, negating the KL loss for the latter. NegGrad [4]: uses the prediction loss to finetune the original model on both the retained and forgotten data, negating the gradient for the latter. Bad-T [5]: uses the prediction loss to finetune the original model on both the retained and forgotten data, randomly assigning arbitrary labels to the latter.
We use the following metrics for analysis. 1) AUC, ACC and LogLoss (LL) on the test set measure the recommendation performance of the unlearned model. 2) JS-Divergence (JSD) and L2-norm between the outputs of the unlearned and retrained models on the forgotten data measure the effectiveness of unlearning: the smaller these metrics, the better the unlearning. 3) Unlearning Time and the number of Trainable Parameters (#Params) measure the efficiency of the unlearning method.
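The JSD metric can be sketched as follows; this is a standard Jensen-Shannon divergence in natural log, with toy distributions standing in for the two models' outputs:

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two output distributions.

    Symmetric in p and q, and bounded by ln(2) for the natural-log version.
    """
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * (np.log(a + eps) - np.log(b + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.9, 0.1])   # e.g., unlearned model's output on a forgotten sample
q = np.array([0.8, 0.2])   # e.g., retrained model's output on the same sample
jsd = js_divergence(p, q)
```

A JSD near zero means the unlearned model behaves on the forgotten data as if it had never been trained on it, which is exactly the property the tables below quantify.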

4.1 Results analysis

We list the comparison results in Tab.1 and Tab.2. From the results on the two datasets, we observe that: 1) E2URec better maintains recommendation performance, achieving better AUC, ACC and LogLoss than all baselines. This is because E2URec minimizes the KL distance between the forgetting teacher and the unlearned model to remove knowledge, instead of reversing gradients as in previous methods, thereby preserving model performance. 2) The prediction distributions of our unlearned model on the forgotten data align closely with those of the retrained model, evidenced by the smallest JSD and L2-norm. This indicates that E2URec achieves the best unlearning effect, thanks to our forgetting teacher design, which only requires modifying the model’s output to mimic the retrained model. 3) E2URec attains superior unlearning efficiency, with the lowest time cost and #Params, since it only updates the lightweight LoRA parameters instead of all model parameters.
Tab.1 All metrics comparison results (in %) on ML-1M. The best results (except for original and retrain) are in bold
           Effectiveness                          Efficiency
Method     AUC    ACC    LL     JSD    L2-norm   Time (s)   #Params
Original   77.44  70.60  56.78  –      –         9048       2.2×10^8
Retrain    76.85  69.98  57.35  –      –         5279       2.2×10^8
SISA       75.35  68.52  58.89  2.05   9.82      3042       8.9×10^8
RecEraser  75.59  68.84  58.86  2.03   9.64      4009       8.9×10^8
NegKL      75.65  69.19  59.34  3.67   12.47     1805       2.2×10^8
NegGrad    75.97  69.31  59.20  4.64   14.13     1940       2.2×10^8
Bad-T      75.61  69.41  58.83  4.35   14.95     1684       2.2×10^8
E2URec     76.34  69.76  57.75  1.91   9.51      941        1.7×10^6
Tab.2 All metrics comparison results (in %) on GD. The best results (except for original and retrain) are in bold
           Effectiveness                          Efficiency
Method     AUC    ACC    LL     JSD    L2-norm   Time (s)   #Params
Original   73.52  70.67  55.46  –      –         9152       2.2×10^8
Retrain    73.39  70.53  55.56  –      –         5448       2.2×10^8
SISA       72.19  70.07  56.76  2.04   8.85      3008       8.9×10^8
RecEraser  72.29  69.93  56.57  1.66   7.71      3208       8.9×10^8
NegKL      72.88  70.15  56.38  2.02   10.01     1866       2.2×10^8
NegGrad    72.85  70.26  57.21  2.56   10.44     1608       2.2×10^8
Bad-T      72.75  70.13  61.43  8.02   19.09     1753       2.2×10^8
E2URec     73.41  70.42  55.48  0.90   6.54      800        1.7×10^6
We also conduct an ablation study to explore the contribution of each loss. In Tab.3, “w/o L_FGT” and “w/o L_REM” denote removing $\mathcal{L}_{\mathrm{FGT}}$ and $\mathcal{L}_{\mathrm{REM}}$, respectively. We observe that removing $\mathcal{L}_{\mathrm{FGT}}$ increases JSD significantly, indicating that $\mathcal{L}_{\mathrm{FGT}}$ is the main factor in forgetting the data. Removing $\mathcal{L}_{\mathrm{REM}}$ results in a notable drop in AUC, suggesting that $\mathcal{L}_{\mathrm{REM}}$ is essential to maintaining recommendation performance.
Tab.3 Ablation results (in %). The best results are in bold
            MovieLens-1M     GoodReads
Variants    AUC    JSD      AUC    JSD
E2URec      76.34  1.91     73.41  0.90
w/o L_FGT   76.25  2.62     73.40  1.25
w/o L_REM   75.75  2.27     72.98  0.99

5 Conclusion

In this letter, we propose E2URec, an efficient and effective unlearning method for LLMRec. Our method enables LLMRec to efficiently forget specific data by updating only the lightweight LoRA modules. Besides, to enhance effectiveness, our method develops two teacher models that instruct the unlearned model to forget information without harming recommendation performance. Extensive experiments show that E2URec outperforms state-of-the-art baselines on two real-world datasets.

References

[1] Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu P J. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 2020, 21(1): 140
[2] Bourtoule L, Chandrasekaran V, Choquette-Choo C A, Jia H, Travers A, Zhang B, Lie D, Papernot N. Machine unlearning. In: Proceedings of 2021 IEEE Symposium on Security and Privacy. 2021, 141−159
[3] Chen C, Sun F, Zhang M, Ding B. Recommendation unlearning. In: Proceedings of the ACM Web Conference 2022. 2022, 2768−2777
[4] Golatkar A, Achille A, Soatto S. Eternal sunshine of the spotless net: selective forgetting in deep networks. In: Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, 9301−9309
[5] Chundawat V S, Tarun A K, Mandal M, Kankanhalli M. Can bad teaching induce forgetting? Unlearning in deep networks using an incompetent teacher. In: Proceedings of the 37th AAAI Conference on Artificial Intelligence. 2023, 7210−7217
[6] Chen J A, Yang D Y. Unlearn what you want to forget: efficient unlearning for LLMs. In: Proceedings of 2023 Conference on Empirical Methods in Natural Language Processing. 2023, 12041−12052

Acknowledgements

The SJTU team was supported by the National Natural Science Foundation of China (Grant No. 62177033). The work was sponsored by the Huawei Innovation Research Program. We also thank MindSpore, a new deep learning computing framework, for partial support of this work.

Competing interests

The authors declare that they have no competing interests or financial conflicts to disclose.

RIGHTS & PERMISSIONS

© 2025 Higher Education Press