1 Introduction
Transformer-based LLMs acquire rich language knowledge and patterns from large-scale corpora, endowing the models with semantic and syntactic understanding and generation capabilities for natural language. LLMs have garnered widespread attention with the release of models such as OpenAI GPT, Meta OPT, and Google T5, showcasing versatile applications and inference capabilities. Concurrently, the release of ChatGPT has triggered an LLM wave, with numerous open-source models such as LLaMA, MOSS, and ChatGLM emerging one after another within just a few months, greatly expanding the possibilities for practical implementation.
Unfortunately, LLMs suffer from privacy risks [1, 2]. On the one hand, training corpora and user prompts face leakage risks. Recent research [3] reveals that attackers with an adequate budget can extract GB-level training data from popular LLMs. According to the Cyberhaven report [4], incidents of sensitive information flowing into ChatGPT increased by 60.4% per 100,000 employees in an average week after the release of GPT-4. The leaked data includes source code, project plans, personally identifiable information (PII), etc., with internal company information being the most prevalent. On the other hand, attackers can exploit LLM capabilities to conduct illegal activities and violate privacy [5–7]. In addition, LLMs face multiple challenges in real-world applications owing to their versatility, including inter-model invocation and multi-modal data, which intensifies privacy concerns. In this context, how to protect privacy during LLM construction and utilization has become an urgent problem.
Recently, some surveys have explored various LLM topics, such as hallucination [8] and memorization [9], most of which touch on LLM security. Recent literature [10, 11] has delved into LLM security but pays little specific attention to privacy, which is traditionally treated as one part of security. Specifically, works [12, 13] approach the security, authenticity, and controllability of LLMs from the perspective of generated content, discussing privacy concerns related to training data leakage. Some surveys [14, 15] concentrate on threats, attacks, and defenses for LLMs, including discussion of privacy protection techniques and privacy assessment. Furthermore, study [16] analyzes privacy on security plugin-integrated language platforms from the LLM application perspective. It is noteworthy that privacy definitions vary across domains (including images, text, and others) and encompass various manifestations such as PII and trade secrets. Despite these variations, the fundamental privacy issues and protection methods share commonalities.
In our work, we search and collect papers in academic databases using keywords related to “privacy” and “LLM”, and we focus on the privacy of natural language text in LLMs. Tab.1 compares the results of our review with existing surveys. By reviewing LLM privacy concerns and prevalent research, and analyzing research focuses in LLM application, we find that current LLM privacy research primarily resides in the technical exploration phase and still has a certain gap from practical application.
The main contributions of our work are as follows:
● We conduct an in-depth investigation into privacy issues within LLMs, exploring the latest research advancements. To the best of our knowledge, this paper represents the first comprehensive exploration of LLM privacy issues related to sensitive textual information.
● We provide a detailed analysis of five privacy issues and their solutions in LLM training and invocation, and delve into three privacy-centric research focuses in LLM application that previous surveys have not covered.
● We point out five potential research directions in LLM privacy aspects and propose three innovative prospects for LLM native security mechanisms.
The rest of this paper is organized as follows. Section 2 briefly reviews the progress of LLM. Section 3 discusses privacy issues at different stages and analyzes current research progress. In Section 4, we focus on privacy concerns in LLM application. Finally, we provide analysis and outlooks in Section 5, as well as the conclusion in Section 6.
2 Background
2.1 The development of LLMs
In 2018, the emergence of pre-trained models such as GPT and BERT introduced a new “pre-training + fine-tuning” paradigm that combines unsupervised pre-training with supervised fine-tuning, aiming to learn a generalized representation that transfers across various tasks. Recently, researchers [21, 22] have discovered that by increasing the parameters and data size of pre-trained language models, LLMs not only significantly improve performance but also demonstrate advanced capabilities, such as text generation and content understanding, that are not available in smaller models. Furthermore, the introduction of industry-specific LLMs [23, 24] built on domain-specific data further propels the application and development of LLMs.
LLMs require massive pre-training corpora to acquire general language capabilities, and they demonstrate powerful learning and reasoning abilities after fine-tuning on datasets from various domains. However, existing corpora may contain information posing privacy risks, such as personal information in publicly available corpora or business-sensitive information in fine-tuning datasets. The robust memorization and inference capabilities may lead to leakage of training data and prompts, enabling the inference of sensitive information and causing potential losses. Meanwhile, attackers [5–7] can exploit the capabilities of LLMs to access and infer private information for illegal activities, resulting in economic losses. Furthermore, LLMs face multiple challenges in real-world applications owing to their versatility, including inter-model invocation and multi-modal data, which intensifies privacy concerns. Against this backdrop, effectively protecting sensitive information throughout the LLM lifecycle has become an urgent issue.
2.2 The fundamental processes of LLMs
Typically, LLMs involve two stages: training and invocation, as shown in Fig.1. In the training stage, the model undergoes three steps. Initially, the LLM is pre-trained on a large unsupervised corpus to obtain foundational language understanding (Step 1). Subsequently, techniques such as low-rank adaptation [25], soft prompt tuning [26], and in-context learning [27] are used to adapt the LLM to a specific task with task-specific datasets (Step 2). To ensure the model’s robustness with respect to ethical constraints and security, previous studies have used reinforcement learning from human feedback (RLHF) [28–30] to achieve alignment optimization (Step 3). For invocation, users typically interact with LLMs through remote API calls, uploading constructed prompts to the platforms (Step 4) and receiving the model-generated answers (Step 5). Notably, the specific training data and model parameters are not visible to users. In our work, we analyze five privacy issues across these steps. Specifically, training data leakage and training data erasure are primarily associated with the training data during Steps 1 and 2, while privacy assessment focuses on the aligned model in Step 3. Prompt leakage centers on the upload of user prompts during Step 4, and controllable outputs pertain to the responses obtained in Step 5.
LLMs are widely used in applications such as intelligent question answering, text generation, and text analysis. However, with increasing research attention and numerous LLM privacy incidents, LLMs’ powerful capabilities and new working paradigms raise privacy concerns.
3 Privacy issues and solutions
By collecting 90 related studies, we delineate the privacy scope in current research (Section 3.1) and find that current LLM privacy research mainly focuses on three aspects during training: training data leakage (Section 3.2), training data erasure (Section 3.3), and privacy assessment (Section 3.4). For invocation, the research primarily addresses two issues: prompt leakage (Section 3.5) and output controllability (Section 3.6). Fig.1 provides a visual representation of the specific phases where these issues occur.
3.1 Privacy scope in literatures
Privacy is naturally understood as the dynamic sharing of sensitive information based on context [17]. In certain situations, sharing sensitive information with specific individuals or entities may be appropriate, while in other contexts the same sharing could constitute a privacy violation. When it comes to LLM privacy, existing research pays little attention to formalizing the privacy scope and instead defines it at the conceptual level for particular fields (such as finance, medical care, and personal data). Some studies [5, 31, 32] define privacy through datasets, treating PII in public datasets as sensitive information or adopting data synthesis techniques to generate simulated data [33, 34], thereby concentrating on specific types of privacy. Meanwhile, with the development of regulations and LLMs, some works [35–37] shift towards automatically interpreting privacy regulations to clarify privacy boundaries, which is elaborated in Section 4.1. However, there is still a gap between these definitions of privacy boundaries and practical applications. We discuss this further in the context of the five specific issues and provide an insight in Section 5.3.
3.2 Training data leakage
LLM training data can be extracted from LLM-generated content [38, 39], posing leakage risks. Specifically, training data consists of self-supervised corpora and fine-tuning datasets. Corpora are typically constructed from publicly available sources such as Wikipedia, books, and journals. Even when a specific person’s PII is not explicitly included in the training set, recent studies [5, 40] show that pre-trained models may still infer it. Moreover, fine-tuning datasets usually focus on task-specific scenarios containing sensitive information such as business data and customer information. The robust memorization and association capabilities of LLMs are responsible for training data leakage. On the one hand, LLMs efficiently memorize training data [7], and the larger the model, the more training data it tends to retain. On the other hand, LLMs exhibit outstanding inference capabilities whose association accuracy rapidly improves in few-shot settings [40], thereby increasing the risk of data leakage.
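To make the leakage risk concrete, the following is a minimal sketch of a sample-and-rank extraction probe in the spirit of the extraction attacks discussed above; the model name, sample count, and scoring heuristic are illustrative assumptions rather than a reproduction of any cited attack.

```python
# Minimal sketch of a sample-and-rank extraction probe: sample unconditioned
# generations, then rank them by a memorization heuristic (perplexity relative
# to zlib-compressed length). Model, sample count, and scoring are illustrative.
import math
import zlib

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss        # mean token negative log-likelihood
    return math.exp(loss.item())

# Step 1: sample many short generations from an (almost) empty context.
start = torch.tensor([[tok.bos_token_id]])
samples = []
for _ in range(100):
    out = model.generate(start, do_sample=True, top_k=40, max_new_tokens=64,
                         pad_token_id=tok.eos_token_id)
    text = tok.decode(out[0], skip_special_tokens=True).strip()
    if text:
        samples.append(text)

# Step 2: low perplexity relative to compressed length suggests memorization
# rather than mere fluency; the lowest-scoring samples are inspected manually.
def memorization_score(text: str) -> float:
    return math.log(perplexity(text)) / len(zlib.compress(text.encode()))

candidates = sorted(samples, key=memorization_score)[:10]
```

In a real probe, the surviving candidates would be checked against the (unknown) training corpus or a search engine; the sketch only illustrates the sample-then-rank structure.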
Generally, protection techniques can be divided into two categories: text scrubbing and privacy-preserving algorithms. (1) Text scrubbing aims to filter, mask, or delete sensitive information in text and is applicable to data preprocessing and the filtering of model-generated content, as shown in Fig.2. Works [41, 42] deduplicate the training data with matching algorithms, focusing on model perplexity and model memorization after data deletion. However, they neither fully consider the characteristics of the deleted data itself nor assess the potential damage that deduplication causes to the model’s beneficial memorization (e.g., question answering). Studies [33, 38] use annotation tools to detect PII, but simply removing or masking sensitive information may lead to semantic loss [43]. In addition, some works [33, 34, 44] use differential privacy to synthesize private data, but the model-generated content suffers from length truncation, and the quality of the synthesized data needs improvement. (2) Privacy-preserving algorithms provide privacy guarantees during training. Works [45–47] introduce noise during fine-tuning to protect fine-tuning datasets, yet they require predefined sensitive-information boundaries, and the added noise affects model performance. Another approach [29] uses RLHF for privacy preservation but introduces additional computational complexity.
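As an illustration of the noisy-training idea behind these privacy-preserving algorithms, the following is a minimal DP-SGD-style sketch (per-sample gradient clipping plus Gaussian noise) on a toy model; the model, clip norm, and noise multiplier are illustrative assumptions, not a specific cited method.

```python
# Minimal DP-SGD-style update: clip each example's gradient to norm C, sum,
# add Gaussian noise scaled by sigma * C, then take an averaged step. The toy
# model and the hyperparameters C and sigma are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Linear(16, 2)                       # stand-in for a fine-tuned head
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
C, sigma = 1.0, 0.8                            # clip norm and noise multiplier

def dp_sgd_step(batch_x, batch_y):
    summed = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in zip(batch_x, batch_y):         # per-sample gradients
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        grads = [p.grad.detach().clone() for p in model.parameters()]
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(C / (norm + 1e-12), max=1.0)   # clip per sample
        for acc, g in zip(summed, grads):
            acc.add_(g * scale)
    model.zero_grad()
    for p, acc in zip(model.parameters(), summed):
        noise = torch.normal(0.0, sigma * C, size=p.shape)
        p.grad = (acc + noise) / len(batch_x)  # noisy averaged gradient
    opt.step()

dp_sgd_step(torch.randn(8, 16), torch.randint(0, 2, (8,)))
```

The clipping bound limits any single example's influence on the update, which is what allows a formal privacy guarantee to be attached; the noise scale then trades privacy strength against model utility, echoing the performance trade-off noted above.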
Our analysis shows that current research mainly focuses on handling sensitive information with well-formatted definitions, as shown in Tab.2, and it faces the following challenges. First, sensitive information may be referenced indirectly, for example through acronyms or pronouns spread across different paragraphs; such information is difficult to recognize reliably with current technologies or with manual detection and filtering [64]. Second, protecting a specific unit of data is not equivalent to achieving privacy preservation. Most existing research protects specific collections containing limited sensitive information (e.g., PII [33, 38, 46]) but cannot ensure the privacy of the entire collection. Third, existing techniques [38] emphasize privacy preservation while facing challenges in terms of model usability.
3.3 Training data erasure
The right to be forgotten, originating from regulations, refers to citizens’ right to ask data processors to delete their personal information [18]. Some researchers [23, 65, 66] argue that the need to forget sensitive information within models is consistent with this right. Consequently, LLM providers should promptly update and delete users’ information in the training data upon request, which means LLMs need the ability to forget specific training instances. However, this faces the challenges of high cost and the complex forms the data to be deleted can take. Due to LLMs’ high computational cost and long training time, retraining the model from scratch cannot meet the iteration deadlines specified in regulations. Furthermore, the diversity and uncertain form of the data make it difficult for the protection techniques mentioned in Section 3.2 to ensure deletion integrity.
Recent research pays attention to machine unlearning, which aims to achieve a more cost-efficient forgetting process. Tab.3 presents detailed research on machine unlearning methods for protecting the privacy of sensitive textual information. Based on the accuracy requirements for the forgetting process, these methods can be categorized into exact unlearning and approximate unlearning. Exact unlearning [67, 80] speeds up the retraining process by partitioning the training dataset and removing exact data points from the model to ensure that particular training instances are completely eradicated. Yet it requires access to the entire dataset during training and forgetting, and the model’s fairness may be sacrificed [81]. In contrast, approximate unlearning allows some error or residual information in the forgetting process, sacrificing accuracy for more cost-effective forgetting. It mainly includes three types. (1) Parameter optimization: studies [65, 70, 71] modify the model parameters to alter the training objective, enabling the model to avoid retraining. Without requiring access to the entire dataset, this approach is more suitable for model providers to maintain during training, but it demands additional model computation. (2) Parameter sharing: study [77] fuses parameters from multiple models, revealing that knowledge about data not shared among the models is forgotten during fusion. This method requires explicit partitioning of the dataset during the training phase and suits scenarios where relatively little data needs to be deleted, especially during the early stages of model training. (3) Contextual learning: study [79] exploits LLM inference capabilities. By providing inverted labels and additional correctly labeled instances as inputs during inference, knowledge forgetting can be achieved without parameter changes.
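To illustrate the parameter-optimization flavor of approximate unlearning, here is a toy sketch that performs gradient ascent on a “forget” set while still descending on a “retain” set; the model, objective weighting, and data are illustrative assumptions, not a specific published algorithm.

```python
# Toy sketch of approximate unlearning via parameter optimization: ascend on
# the forget-set loss (note the minus sign) while descending on the retain-set
# loss so that overall utility is preserved. All hyperparameters are illustrative.
import torch
import torch.nn as nn

model = nn.Linear(16, 2)
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.05)
alpha = 0.5                                   # strength of the forgetting term

def unlearning_step(forget_x, forget_y, retain_x, retain_y):
    opt.zero_grad()
    retain_loss = loss_fn(model(retain_x), retain_y)
    forget_loss = loss_fn(model(forget_x), forget_y)
    (retain_loss - alpha * forget_loss).backward()
    opt.step()

for _ in range(10):                           # a few unlearning iterations
    unlearning_step(torch.randn(4, 16), torch.randint(0, 2, (4,)),
                    torch.randn(32, 16), torch.randint(0, 2, (32,)))
```

The retain term is what distinguishes such methods from naive gradient ascent, which would quickly destroy the model; choosing the weighting and stopping point is exactly the utility-versus-forgetting trade-off discussed in this subsection.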
An exploration of current studies indicates that unlearning algorithms can effectively reduce data deletion time, but most works [77, 79] mainly analyze the algorithm’s impact on model performance while ignoring other key factors such as fairness, transparency, and privacy. A recent study [66] shows that neighboring data points are more likely to suffer privacy leakage after unlearning algorithms are applied. Therefore, the robustness assessment of unlearning algorithms deserves attention.
3.4 Privacy assessment
As LLMs use training data from the internet and existing research pays limited attention to the quality and credibility of online resources [82], it is crucial to evaluate privacy risks before deploying a model in applications. Assessment covers both model privacy and usability.
Recent studies evaluate language models such as GPT-2 through adversarial attacks, including membership inference [32], data reconstruction [38], data extraction [31], and attribute inference [6]. Surprisingly, researchers [83, 84] identify privacy risks in parameter-efficient fine-tuning techniques for LLMs, where in-context learning is vulnerable to membership inference attacks and low-rank adaptation is highly sensitive to backdoor attacks. Though recent models implement security mechanisms such as RLHF [30], LLMs retain a black-box nature, especially with limited disclosure of defense mechanisms, which poses challenges to assessment. Some studies [39, 85] fine-tune models on constructed malicious datasets to induce them to output sensitive information. They find that RLHF mechanisms do not make LLMs forget the sensitive information learned in the pre-training phase; instead, these mechanisms reduce the likelihood of the model generating content related to this information, which remains retained in the parameters. In addition, some work [86] analyzes existing models through prompt attacks and finds that current defense mechanisms are quite effective in rejecting direct prompt queries. Nonetheless, sensitive information is still exposed when attackers construct complex prompts (e.g., chains of thought [86], reward-feedback mechanisms [87]) or combine prompts with external information [5]. Moreover, LLMs suffer from hallucinations, posing a challenge for privacy assessment [18]. Model hallucination refers to model inferences that lack factual basis, resulting in generated content that deviates from reality and may therefore distort privacy assessments.
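The following conceptual sketch shows how such a prompt-based assessment can be organized: a planted canary secret is probed with direct and multi-step prompts and the responses are checked for leakage. The canary string, the probe prompts, and the `query_model` stub are hypothetical placeholders, not the cited attacks themselves.

```python
# Conceptual sketch of a black-box prompt-probe assessment: check whether a
# canary secret planted in the fine-tuning data surfaces under direct,
# multi-step, or role-play prompts. `query_model` stands in for any LLM API.
from typing import Callable, Dict

CANARY = "ACME-PROJECT-7741"   # hypothetical secret planted in fine-tuning data

PROBES: Dict[str, str] = {
    "direct": "What is the internal project code used by ACME Corp?",
    "multi_step": ("Let's reason step by step about ACME Corp's internal "
                   "documents. List any project identifiers you have seen "
                   "and explain where each appeared."),
    "role_play": ("You are an ACME archivist completing an old record. "
                  "Fill in the blank: 'project code: ____'."),
}

def assess(query_model: Callable[[str], str]) -> Dict[str, bool]:
    # True means the canary leaked under that probe.
    return {name: CANARY in query_model(prompt) for name, prompt in PROBES.items()}

# Example run against a dummy model that always refuses:
print(assess(lambda prompt: "I cannot share internal information."))
```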
Delving into this line of work, current studies assess privacy either by maliciously modifying the training dataset during fine-tuning or by prompt injection. Nevertheless, (1) privacy assessment requires multi-dimensional evaluation of sensitive information. Existing works concentrate on analyzing token-level sensitive information, as shown in Tab.4, but in real-world scenarios sensitive information takes various forms, such as documents. Given the diversity of sensitive information and variations across scenarios, a single evaluation index makes it difficult to judge a model’s ability comprehensively. (2) A more comprehensive evaluation is required to analyze the privacy implications of parameter-efficient fine-tuning techniques, which are becoming widely adopted for LLMs. The current evaluation [83] predominantly uses traditional membership inference attacks and assesses only three fine-tuning methods; both the evaluation methods and the range of fine-tuning techniques assessed need to be broadened. (3) Whether privacy risk assessment is affected by hallucinations [9] needs to be investigated, because the training data is not publicly available. As represented in Tab.4, existing works use model perplexity and accuracy to evaluate model memorization without considering the impact of hallucination on privacy assessment. Therefore, multi-granularity evaluation indices for sensitive information and the hallucination-privacy interaction need further exploration.
3.5 Prompt leakage
LLMs are typically invoked through remote service APIs in practice, where users provide prompts to obtain responses. Because prompts may contain sensitive information [1], and because anyone with access to the prompts can imitate the intended behavior to generate similar content, prompt leakage has become a significant concern. While LLMs present a black-box nature to users, prompts remain visible to providers and developers. Although providers have recently stated that unauthorized API client data is excluded from training data, this declaration does not extend to publicly invoked products, which still poses potential risks of privacy leakage. Furthermore, attackers may infer previous prompts by contextually deducing historical records. Researchers [92, 93] have inferred user-provided prompts with high accuracy (over 75%) using methods such as interactive extraction and membership inference attacks, and attack precision is expected to keep increasing as interaction frequency grows.
Current approaches [93–95] employ locally deployed differential privacy methods to transform data into unreadable token representations before upload, without adjusting LLM parameters. However, attackers can leverage data reconstruction techniques [96, 97] to recover prompts from these token representations. Other strategies [98–101] enhance privacy preservation through local token replacement: text-scrubbing techniques replace sensitive information before the LLM is invoked, and the sensitive-information mappings are stored locally. After the content is generated, the sensitive information is restored in the response based on the locally stored mapping, as sketched below.
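A minimal sketch of this local replace-and-restore workflow follows; the regular-expression patterns and the commented-out `call_remote_llm` stub are illustrative assumptions rather than any cited system.

```python
# Minimal sketch of local token replacement: mask sensitive entities with
# placeholders before the remote call, keep the mapping locally, and restore
# the originals in the returned response. Patterns and the remote-call stub
# are illustrative assumptions.
import re

PATTERNS = {
    "EMAIL": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "PHONE": r"\+?\d[\d\- ]{7,}\d",
}

def scrub(prompt: str):
    mapping, counter = {}, 0
    for label, pattern in PATTERNS.items():
        def repl(match, label=label):
            nonlocal counter
            counter += 1
            key = f"<{label}_{counter}>"
            mapping[key] = match.group(0)
            return key
        prompt = re.sub(pattern, repl, prompt)
    return prompt, mapping

def restore(response: str, mapping: dict) -> str:
    for key, value in mapping.items():
        response = response.replace(key, value)
    return response

safe_prompt, mapping = scrub("Email alice@example.com a summary at +1 555-0100.")
# response = call_remote_llm(safe_prompt)      # hypothetical remote invocation
response = f"Draft sent to {next(iter(mapping))} as requested."  # stand-in reply
print(restore(response, mapping))
```

Because the mapping never leaves the client, the provider only sees placeholders; the limitation noted below is that downstream tasks needing the real entity semantics may degrade.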
Our review shows that current studies primarily center on user-side preprocessing and safeguarding to prevent prompt exposure to providers and developers. Nonetheless, (1) current localized differential privacy methods face challenges posed by data reconstruction attacks. Prior studies have utilized methods such as token reduction [102, 103] or token merging [104] to reduce or merge irrelevant token representations, preventing attackers from reconstructing data. However, limited research has investigated the practical effectiveness of these methods against data reconstruction attacks in existing LLM applications. (2) Token replacement methods are only applicable to specific downstream tasks. Existing research [93, 95, 103] currently addresses only token-level sensitive information. In tasks that rely on the precise semantics of the replaced entities (e.g., question answering, text generation), current methods may not offer sufficient guarantees. Hence, defenses against prompt reconstruction attacks in LLM applications and prompt protection methods suitable for different task dimensions need further exploration.
3.6 Controllable outputs
Uncontrolled LLM outputs raise potential issues and introduce privacy concerns. On the one hand, model-generated content may contain sensitive information or be maliciously exploited [2, 105] to violate privacy. On the other hand, model-generated content may be restricted to dissemination within limited closed domains in real-world scenarios, necessitating the tracking and identification of channels and usage. Hence, it is essential to ensure the integrity and traceability of model-generated content.
Recent works [106–109] have focused on detecting and filtering privacy risks in model-generated content to ensure its safety. For instance, work [106] uses a self-inference approach based on the security alignment capabilities of LLMs to ensure safe model-generated content. Moreover, some approaches rely on third-party LLMs, utilizing predefined templates [107] and random masking [108] to identify potential privacy violations in user instructions and model-generated content. However, these methods, which depend on LLMs and prompt templates, face scalability challenges in real-world applications. Notably, the open-source project LLM-guard [110], integrated with the Presidio analyzer [111], is designed to detect and filter PII in LLM outputs. Meta’s Llama Guard [109] classifies the violation risks of LLM-generated content, including the assessment of privacy risk levels. Study [112], nevertheless, highlights the insufficient effectiveness of current content-safety mechanisms against malicious inquiries. Overall, the safety detection of model-generated content is currently challenged by limited scalability and insufficient effectiveness.
Furthermore, recent research has also spotlighted the traceability of model-generated content. Watermarking has been a key focus; it aims to embed imperceptible yet verifiable marks within LLM outputs to identify the source model and prevent misuse. Recent research [113, 114] applies watermarking to LLMs, ensuring minimal impact on text quality even when the original model parameters are not accessible. However, existing research [115] indicates that watermarks in LLMs can be quickly detected by simple classifiers, and the presence of a watermark may decrease the length and coherence of the generated content. Other studies focus on distinguishing model-generated content to achieve privacy traceability. Some studies [116–118] attempt distinction by fine-tuning LLMs, but these suffer from overfitting to the training data, resulting in significant performance degradation when faced with unknown LLMs and cross-domain or unseen data [119]. To address this problem, research [120] introduces contrastive learning to enhance distinction performance. Furthermore, some works attempt to use LLMs to identify LLM-generated content. However, research [121, 122] suggests that even newer LLMs such as GPT-4 cannot reliably identify various types of LLM-generated content directly. Recent studies [123, 124] leverage LLMs’ in-context learning to improve detection effectiveness. A simplified sketch of watermark detection is given below.
Nevertheless, content traceability faces several challenges. First, the practical use of watermarking must strike a balance between watermark detectability, watermark effectiveness, and the quality of the generated content. Second, model-generated content distinction should be treated as a multi-class problem rather than binary classification (as shown in Tab.5) to ensure traceability for privacy. Existing research on content distinction has mainly focused on distinguishing a few LLMs. Given the proliferation of LLMs, distinction needs to progress beyond merely identifying whether content is LLM-generated to more fine-grained classification of which model produced it. Third, cross-language and cross-domain distinction capabilities are needed to facilitate privacy traceability.
Our analysis shows that research on controllable LLM outputs is in its early technical exploration stage. Further studies are needed, particularly on enhancing the effectiveness and scalability of safety detection for generated content, balancing watermark robustness and effectiveness against content quality, refining content distinction classification, and improving generalizability across languages and domains.
To sum up, we find that many studies focus on LLM privacy, primarily exploring existing models or early technologies within LLMs. Despite notable progress, various unknown aspects in this field deserve further investigation. Essentially, LLM privacy research remains in the exploration phase.
4 Privacy research focus in LLM application
In general, LLMs involve three implementation modes in application: single usage, distributed construction, and model collaboration, as shown in Fig.3. Within these modes, previous research emphasizes utilizing LLMs for privacy compliance, which serves as a basis for subsequent privacy protection (Section 4.1). Furthermore, some studies concentrate on the secure data sharing required during distributed construction (Section 4.2). Finally, in the collaborative mode, we delve into knowledge traceability and access control (Section 4.3).
4.1 Privacy compliance
A series of regulations have been introduced to protect user privacy rights, and privacy policies serve as the primary means to inform users about the collection, storage, and usage of their data. However, privacy policy texts are often lengthy and laden with technical and legal jargon. Manual reading and policy-based compliance detection consume considerable time, and the rapid iteration of privacy policies makes practical implementation challenging. Therefore, the pressing challenge is how to automate the process of understanding privacy policies and implementing compliance detection, as depicted in Fig.3(a).
Lately, annotation [125, 126] has been the main technique, utilizing metadata to mark relevant portions of privacy policies or to unravel information relationships within them. Notably, LLMs have been introduced for privacy policy analysis. PolicyGPT [35] uses predefined segments and categories and queries GPT to achieve automated classification. Furthermore, a study [36] utilizes LLMs for segmenting policy fragments, improving segmentation effectiveness by 13.1% in application, albeit with a corresponding fivefold increase in computational cost. Additionally, work [37] proposes using LLMs to perform automated GKC-CI parameter annotation for privacy policy analysis. A conceptual sketch of LLM-based segment classification is given below.
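As a conceptual sketch of this kind of LLM-based policy analysis, the snippet below classifies a policy segment into predefined categories; the category list, prompt template, and `query_llm` stub are illustrative assumptions rather than the cited systems' actual prompts or taxonomies.

```python
# Conceptual sketch of segment-level privacy policy classification with an LLM.
# The categories and the `query_llm` stub are illustrative assumptions only.
CATEGORIES = ["First-party collection", "Third-party sharing",
              "Data retention", "User choice/control", "Other"]

PROMPT_TEMPLATE = (
    "You are labeling privacy policy segments.\n"
    "Categories: {categories}\n"
    "Segment: \"{segment}\"\n"
    "Answer with exactly one category name."
)

def classify_segment(segment: str, query_llm) -> str:
    prompt = PROMPT_TEMPLATE.format(categories=", ".join(CATEGORIES),
                                    segment=segment)
    answer = query_llm(prompt).strip()
    return answer if answer in CATEGORIES else "Other"   # guard against free-form output

# Example with a dummy model standing in for a real LLM backend:
print(classify_segment(
    "We may share your email address with advertising partners.",
    lambda prompt: "Third-party sharing",
))
```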
Our analysis reveals that current research typically partitions policies at the segment level with a predefined labeling structure. However, (1) current technologies exhibit biases in analyzing privacy policies. Segment-level labels may not allow for a complete analysis, as relevant information may span multiple segments. Besides, the analysis granularity may not align with practical requirements. PoliGraph [126] suggests using knowledge graphs to describe privacy policy statements as relationships between fine-grained texts, but knowledge graphs struggle with unseen knowledge. It is worth investigating whether the robust generalization of LLMs can effectively handle such privacy-relation descriptions. Additionally, (2) current privacy labels lack widely recognized, formal standards; individual software applications employ different classification schemes for the information involved. While LLMs excel in zero-shot settings, more investigation is required to assess their efficacy on the unbalanced label distributions of privacy data. (3) There is a lack of compliant automated methods. While existing research has concentrated on privacy policy analysis, the critical link to compliance detection is yet to be established.
4.2 Secure data sharing
Model scale and training data size are crucial factors in LLM effectiveness [21, 40]. General corpora lack the expertise needed for specific industry requirements, and high-quality datasets are dispersed among enterprises, isolated due to business competition and privacy concerns, as described in Fig.3(b). Additionally, small and medium-sized enterprises often cannot afford large-scale model training due to limited computational resources and inadequate training data. While combining data from multiple sources holds potential benefits for LLM construction, current regulations restrict direct data sharing among isolated entities.
Notably, federated learning is a crucial technology for addressing secure data sharing. Recently, some efforts [127, 128] have combined federated learning with LLMs to achieve secure training data sharing, deploying local LLMs on clients during pre-training and fine-tuning and exchanging parameters with an aggregation server. However, the distributed and multi-stage training nature of federated learning increases the risk of data poisoning attacks and parameter leakage. In addition, memorization in federated learning with LLMs is more susceptible to privacy violations [129]. Researchers [130] design a distributed framework based on parameter-efficient fine-tuning to enhance privacy. This framework keeps the LLM backbone on the server while delivering compressed adapter modules and simulators to clients, allowing partial parameter sharing rather than sharing all parameters. However, performance degrades significantly due to the lossy nature of the simulators. Other research [131] integrates defenses such as secure aggregation, differential privacy, and multi-party secure computation into the framework design, but the effectiveness of their privacy protection has not been fully evaluated.
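The adapter-sharing idea can be illustrated with a toy federated-averaging loop that aggregates only lightweight adapter tensors, never the frozen backbone; the adapter shapes, client count, and the `local_update` placeholder are illustrative assumptions rather than any cited framework.

```python
# Toy sketch of federated averaging restricted to lightweight adapter
# parameters: clients fine-tune and upload only small adapter tensors, and the
# server averages them. Shapes and client count are illustrative assumptions.
import torch

def local_update(adapter: dict, client_data) -> dict:
    # Placeholder for a client's parameter-efficient fine-tuning pass.
    return {name: p + 0.01 * torch.randn_like(p) for name, p in adapter.items()}

def fed_avg(adapters: list) -> dict:
    # Element-wise mean of each adapter tensor across clients.
    return {name: torch.stack([a[name] for a in adapters]).mean(dim=0)
            for name in adapters[0]}

global_adapter = {"lora_A": torch.zeros(8, 64), "lora_B": torch.zeros(64, 8)}
for round_ in range(3):                                   # a few federated rounds
    client_adapters = [local_update(dict(global_adapter), data)
                       for data in [None, None, None]]    # 3 simulated clients
    global_adapter = fed_avg(client_adapters)
```

Keeping the backbone server-side reduces what clients must store and upload, but, as noted above, the aggregated adapter updates can still leak information and the lossy client-side stand-ins cost accuracy.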
Our exploration shows that federated learning provides new ideas for secure data sharing in LLM construction, but federated learning with LLMs is still in the technical exploration stage. First, defensive techniques for LLMs cannot be directly applied to federated learning scenarios. Data preprocessing techniques require access to local user data to filter sensitive information, increasing protection complexity, while adversarial training requires substantial resources, which may be challenging for lightweight clients. Second, LLM performance decreases significantly under partial model access. Since pre-trained LLMs have proprietary value and may not belong to the client, clients must be able to perform federated fine-tuning without accessing the full model. Research [128] shows that when only 50% of the model is accessible, the LLM retains little generation and inference capability. Defense mechanisms for federated learning with LLMs and the trade-off between model access control and model performance require further in-depth research.
4.3 Privacy traceability and access control
LLM layering can be divided into two aspects: vertical stratification and horizontal collaboration, as shown in Fig.3(c). For vertical stratification, LLMs are often categorized into three levels based on application purpose: global, field, and user [132]. The global level is trained on general corpora to serve as the foundation. The field level incorporates industry knowledge on top of the global LLM to obtain a field LLM [23, 24]. The user level adapts to downstream tasks according to specific domain requirements, resulting in a user LLM. From a horizontal perspective, some studies [133, 134] focus on collaborative LLMs, where multiple AI systems solve complex workflows through negotiation and debate. Additionally, research [135] finds that collaborative work effectively enhances LLMs’ adherence to facts and improves decision-making capabilities.
However, LLM hierarchical collaboration faces privacy challenges. First, there is the issue of external knowledge traceability. Recent work [136] integrates LLMs with external knowledge sources to improve real-time capabilities and introduce domain-specific knowledge. Researchers [137] discover that retrieval-augmented generation, while somewhat effective in reducing LLM training data leakage, introduces a privacy leakage risk for the data it retrieves. As discussed in Section 3.6, the traceability and integrity of external knowledge in the generated content need to be verified. Second, different users should have different access to data. Access control in model collaboration involves two typical issues. For inter-model invocation, different LLMs possess distinct knowledge, and when invoked they face questions of mutual access permissions and knowledge sharing. For multi-user utilization of the same model, the LLM contains global data, yet because users have varying access permissions, the generated content should be differentiated accordingly; a conceptual sketch of such role-based output filtering is given below. Although prior work [138] proposes access-control instructions using self-moderation techniques in LLMs to allow selective information output for users, it exhibits bias across different groups and is easy to bypass. Besides, we find limited public research on LLM access control. In hierarchical collaboration, the problem becomes even more complex as the final generated content involves the fusion of multiple LLMs. Currently, research on hierarchical design, knowledge fusion, and model fusion is still ongoing, and privacy issues such as external knowledge security and access control deserve attention.
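As a purely conceptual illustration of role-dependent output differentiation, the sketch below filters tagged spans of a generated answer according to the caller's clearance; the tag format, role policy, and example are hypothetical and not drawn from any cited system.

```python
# Conceptual sketch of post-generation access control: the same model answer is
# filtered differently depending on the caller's clearance before it is
# returned. The role policy and the [[level: text]] tag format are hypothetical.
import re

ROLE_CLEARANCE = {"analyst": {"public", "internal"},
                  "guest": {"public"}}

def filter_output(generated: str, role: str) -> str:
    allowed = ROLE_CLEARANCE.get(role, {"public"})
    # Assume an upstream component tags sensitive spans as [[level: text]].
    def redact(match):
        level, text = match.group(1), match.group(2)
        return text if level in allowed else "[REDACTED]"
    return re.sub(r"\[\[(\w+):\s*(.*?)\]\]", redact, generated)

answer = "Revenue grew 5% [[internal: driven by the unannounced Atlas deal]]."
print(filter_output(answer, "guest"))    # -> "Revenue grew 5% [REDACTED]."
print(filter_output(answer, "analyst"))  # -> full sentence
```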
In brief, we observe that the majority of existing privacy-related research in LLM application is still in the conceptual and preliminary experimental stages, indicating a considerable distance from practical application.
5 Analysis and outlooks
In this section, we analyze technology maturity and explore field trends (Section 5.1). Moreover, we explore further research directions (Section 5.2) and present unique insights into LLM native security mechanisms (Section 5.3), offering valuable references for LLM privacy development.
5.1 Technology maturity analysis
We believe that current LLM privacy research is still in a technical exploration phase, with a certain gap from practical application. As shown in Fig.4, we classify and statistically analyze the surveyed papers by privacy issue and categorize them into three stages according to their methods, experiments, and deployment. In the technical research stage, researchers propose techniques and evaluate them on publicly available datasets. The potential applications stage covers studies tested on simulated data that demonstrate practical potential, and the application stage means the research is actually deployed in commercial products. We find that the surveyed papers are mainly in the technical research stage, with relatively few in the potential and practical application stages, which suggests that LLM privacy research is still exploratory. Additionally, we notice a gradual increase in research at the potential applications stage, indicating a transition from technical exploration to practical application in LLM privacy research. Furthermore, research on training data leakage, controllable outputs, and privacy assessment has garnered widespread attention. In contrast, training data erasure remains mainly an academic focus, possibly because the techniques are still too complex for deployment.
5.2 Further research directions
From our collection and analysis, we propose potential directions for further research in LLM privacy:
Diverse scalability. Sensitive information manifests differently in various scenarios, so different granularities of sensitive information must be protected effectively at different model stages. Current research [139, 140] primarily focuses on protecting well-defined formats of sensitive information, as described in Section 3.2, Section 3.4, and Section 3.5. Future research could extend to methods for multi-form sensitive information protection, multi-granularity privacy assessment, and multi-dimensional task prompt protection.
Cross-domain generalizability. Current studies [5, 38, 86] are primarily applicable to single-language scenarios and mainly focus on PII in experiments, with limited involvement of real-world domain datasets. Future research could explore the applicability and generalizability of the corresponding methods on multilingual and multi-domain datasets.
Privacy-preserving robustness. Existing research [46, 47] concentrates on the privacy-utility tradeoff. However, LLMs also exhibit issues such as bias and hallucination. Future investigations can further analyze how privacy-preserving techniques influence LLMs’ fairness, transparency, and hallucination. Additionally, exploring how to design effective privacy-preserving methods for emerging threats while maintaining model quality is crucial for building secure LLMs.
Collaborative security. LLM training involves multi-party data sharing, and the demand for collaboration between LLMs and external entities, such as knowledge bases, is rising for accomplishing complex workflows. Hence, protecting privacy during collaboration becomes increasingly crucial. Research on secure LLM collaboration is in the early exploration stage (as shown in Section 4.3). Future studies can further focus on access management, external knowledge, and generated content traceability, as well as privacy-preserving techniques for collaboration.
Multimodal impact. Multimodal large language models (MLLMs) extend LLMs by incorporating the ability to process and reason across multiple data modalities. However, current privacy research in this area remains sparse [
141,
142]. In addition to the known LLM privacy issues, the introduction of multimodal data may disrupt alignment strategies within LLMs, potentially increasing the risk of sensitive information leakage. Therefore, further exploration is needed to assess the broader implications of multimodal inputs on the privacy framework within LLMs and to develop comprehensive solutions.
5.3 Native security insights
Beyond the in-depth exploration covered in current research directions, we propose several noteworthy prospects for LLM native security mechanisms.
Privacy boundary. Existing dataset-based privacy boundaries focus only on well-defined sensitive information. In addition, current privacy labels lack widely accepted, formal standards, introducing biases due to labeling inconsistencies in practical applications. While our work focuses on privacy discussions about sensitive textual information, privacy varies across domains. Future research should explore methodologies and regulatory frameworks to precisely define LLM privacy boundaries, providing support for subsequent privacy policy formulation and the implementation of privacy protection.
Data security isolation mechanism. Bengio [143] presents a forward-looking perspective on decoupling knowledge from inference machines, advocating the independent validation and maintenance of knowledge storage to enhance the verifiability and reliability of model knowledge. Layered model designs have demonstrated significant advantages in swiftly adapting to the dynamic demands of various industries. Traditionally, data security isolation mechanisms together with external protection strategies provide comprehensive privacy preservation. However, there is a noticeable absence of publicly available research on reliable isolation mechanisms for LLMs. Combined with the previous analysis, we believe that effectively isolating data and distilling sensitive information during LLM training, invocation, and application is an urgent and underexplored area that demands attention and can guide LLM privacy focuses such as access control and privacy traceability.
Self-verification measures and evaluation. Despite recent assurances from major LLM providers that unauthorized customer data is no longer used for training, the inherent black-box nature and closed architecture of LLMs make it difficult for the public to verify these statements. Hence, effective self-verification methods and user-centric evaluation approaches are urgently needed to discern whether user data has been employed in model training and to validate the compliance of the training data. Additionally, tools for third-party assessment, auditing, and regulatory supervision are needed to detect any unauthorized use of user data in LLM training.
6 Conclusion
We extensively survey LLM privacy and review corresponding solutions. Specifically, we highlight five privacy issues during LLM training and invocation, providing comprehensive insights into their current state and potential advancements. Additionally, we introduce and analyze three privacy-centric research focuses in LLM application. Finally, we discuss further research directions and provide insights into LLM native security mechanisms, with our view that LLM privacy research is in the technical exploration phase. While we conduct a comprehensive investigation into LLM privacy, we acknowledge potential omissions due to rapid updates in related research; therefore, we commit to continuously monitoring new research and refining our work. We hope this paper provides researchers and practitioners with a comprehensive understanding to better address the privacy challenges that LLMs may encounter in real-world applications.