There is a newly identified risk that creators of open-source LLMs can extract fine-tuning data from downstream models through backdoor training, even with black-box access.

This paper talks about a hidden danger where people who make open-source language models can secretly steal the special data you use to fine-tune their models, even if you only use the model from the outside and never share your data directly.

Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!

Summary

What's the problem?

What's the solution?

Why it matters?

Abstract