Preserving User Privacy in Retrieval Augmented Generation: A Novel Approach Using Local Placeholder Tagging

Nguyen, Xuan Thang; Nguyen, Thanh Vinh; Nguyen, Thuy Duong; Hoang, Tran Huy Son; Nguyen, Gia Bao; Nguyen, Thị Ngoc Thao

Please use this identifier to cite or link to this item: https://elib.vku.udn.vn/handle/123456789/6197

Full metadata record

DC Field	Value	Language
dc.contributor.author	Nguyen, Xuan Thang	-
dc.contributor.author	Nguyen, Thanh Vinh	-
dc.contributor.author	Nguyen, Thuy Duong	-
dc.contributor.author	Hoang, Tran Huy Son	-
dc.contributor.author	Nguyen, Gia Bao	-
dc.contributor.author	Nguyen, Thị Ngoc Thao	-
dc.date.accessioned	2026-01-19T09:37:31Z	-
dc.date.available	2026-01-19T09:37:31Z	-
dc.date.issued	2026-01	-
dc.identifier.isbn	978-3-032-00971-5 (p)	-
dc.identifier.isbn	978-3-032-00972-2 (e)	-
dc.identifier.uri	https://doi.org/10.1007/978-3-032-00972-2_38	-
dc.identifier.uri	https://elib.vku.udn.vn/handle/123456789/6197	-
dc.description	Lecture Notes in Networks and Systems (LNNS, volume 1581); The 14th Conference on Information Technology and Its Applications (CITA 2025); pp. 519-531	vi_VN
dc.description.abstract	Retrieval Augmented Generation (RAG) is a popular approach that enhances the accuracy of Large Language Models (LLMs) by leveraging a knowledge base, and it is rapidly becoming an integral tool across various applications. However, as the use of RAG continues to expand, so do the challenges associated with its deployment, particularly in terms of data privacy. As part of the RAG pipeline, the user query and all retrieved documents are sent as a prompt to the LLM provider, exposing them to privacy hazards such as data leaks or unauthorized access. This study presents RLPT, a framework designed to enhance user privacy in RAG by identifying and removing sensitive information from user inputs before they are sent to the LLM. RLPT uses a local LLM to rapidly identify sensitive information in the user input and then replaces it with distinctive placeholders. These placeholders mark and hide the actual sensitive data, ensuring that the LLM does not receive the original sensitive information during prompt processing. The framework is evaluated on a dataset of 4,000 synthesized context documents. The results indicate that it accurately detects and filters private and sensitive information, achieving an accuracy of 88.7%.	vi_VN
dc.language.iso	en	vi_VN
dc.publisher	Springer Nature	vi_VN
dc.subject	Retrieval-augmented generation	vi_VN
dc.subject	Large language model	vi_VN
dc.subject	Privacy protections	vi_VN
dc.subject	Data anonymization	vi_VN
dc.title	Preserving User Privacy in Retrieval Augmented Generation: A Novel Approach Using Local Placeholder Tagging	vi_VN
dc.type	Working Paper	vi_VN
Appears in Collections:	CITA 2025 (International)
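The abstract describes the core RLPT step: sensitive spans in the user input are detected locally and replaced with distinctive placeholders before the prompt is sent to the LLM provider. The following is only a minimal sketch of that placeholder-substitution idea under stated assumptions, not the authors' implementation. The paper's local-LLM detector is swapped for a hypothetical regex-based function `detect_sensitive`, the `[LABEL_n]` placeholder format is an assumption, and the `unmask` step that restores original values in the remote LLM's answer is an assumed extension that the abstract does not state.

```python
import re
from typing import Dict, List, Tuple

# HYPOTHETICAL stand-in detector: RLPT uses a local LLM for detection;
# these two regexes only illustrate where that detector plugs in.
PATTERNS: Dict[str, str] = {
    "EMAIL": r"[\w.+-]+@[\w-]+\.[\w.-]+",
    "PHONE": r"\+?\d[\d\s-]{7,}\d",
}


def detect_sensitive(text: str) -> List[Tuple[str, str]]:
    """Return (label, span) pairs found in text, ordered by position."""
    found = []
    for label, pattern in PATTERNS.items():
        for m in re.finditer(pattern, text):
            found.append((m.start(), label, m.group()))
    found.sort()
    return [(label, span) for _, label, span in found]


def mask(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each distinct sensitive span with a numbered placeholder like [EMAIL_1].

    Returns the masked text and a placeholder -> original mapping that
    stays on the local machine.
    """
    mapping: Dict[str, str] = {}
    counters: Dict[str, int] = {}
    for label, span in detect_sensitive(text):
        if span in mapping.values():
            continue  # value already tagged; every occurrence was replaced
        counters[label] = counters.get(label, 0) + 1
        placeholder = f"[{label}_{counters[label]}]"
        mapping[placeholder] = span
        text = text.replace(span, placeholder)
    return text, mapping


def unmask(text: str, mapping: Dict[str, str]) -> str:
    """ASSUMED step: put original values back into the remote LLM's answer."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


if __name__ == "__main__":
    query = "Email alice@example.com or call +84 905 123 456 about the contract."
    masked, mapping = mask(query)
    print(masked)  # -> Email [EMAIL_1] or call [PHONE_1] about the contract.
    # Pretend this string came back from the remote LLM provider.
    answer = "I will contact [EMAIL_1] today."
    print(unmask(answer, mapping))  # -> I will contact alice@example.com today.
```

Because each distinct value gets its own numbered placeholder, the remote model can still refer to entities consistently without ever seeing them. A regex stand-in like this will miss names and other context-dependent data, which is the gap a local LLM detector is meant to close.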
