As organizations are harnessing the power of GenAI, we introduced Eddy, our AI assistant in October 2023, way back even before many organizations were exploring what is GenAI.
As a Knowledge base platform, we focused on developing an AI assistant to help our customers provide tailored responses to their customers. This has revolutionized how customers interact with documentation sites. Customers are now able to self-serve better utilizing Eddy’s accurate response. The accuracy of Eddy’s response is augmented by citations of source articles.
Given the nature of GenAI which is prone to hallucinations, we have limited the scope of Eddy’s knowledge source to our customer’s knowledge base. We have also fine-tuned our system prompts to generate “I do not know” responses if Eddy is not confident enough to answer the customer’s questions. This has reduced the hallucinations greatly and increased reliability. Even though we addressed hallucinations well, we will look into how we security-hardened Eddy’s capabilities to protect against prompt hacking attacks.
Large Language Model Security
Security of the Large Language Model (LLM) must be addressed when deploying any GenAI capabilities. Since the GenAI is based on a probabilistic model, it does not provide a consistent response. Moreover, the input and command are included in the same prompt, making them susceptible to hacking. The data can be disguised as command thus GenAI may be prone to generate misleading information or lying. This may cause huge reputational damage if GenAI’s responses violate the company’s legal terms. Thus, it is very important to address LLM vulnerabilities while building GenAI capabilities.
Type of Vulnerability
Prompt Hacking
This is a technique whereby carefully crafted prompts can deceive your GenAI tool into performing unintended actions. This includes generating misleading information, saying outrageous things, generating inappropriate content, and answering queries out of context. We have adopted red teaming best practices to mitigate the risks of prompt hacking. Our defense against prompt hacking includes
- Using state-of-the-art latest LLM models from leading LLM vendors
- Strengthening our system prompts so that prompt hacking is mitigated
- Providing guard rails for GenAI behavior
Prompt Leaking
Prompt leaking is a form of prompt hacking in which the GenAI tool reveals its system prompt. This is crucial for Eddy so that Eddy never reveals our system prompt, as it is the intellectual property of Document360. Also, exposing system prompts enables hackers to gain insight into how Eddy works and is prone to further hacking. We have utilized state-of-the-art LLM to prevent prompt leaking. We have robust instructions inside our system prompt that protect against LLM loopholes.
Malicious Intent
Identifying potentially harmful customer queries is very important. This helps us not to feed any malicious intent content to Eddy AI. We utilize the moderation endpoint from OpenAI to check whether a customer’s query or images are potentially harmful. If the moderation API flags any customer query to be harmful, then take some corrective action including
- Displaying a message to the customer saying “We cannot generate a response given the nature and intent of the question”
- Flag this query and internally store it for auditing and compliance reasons by GDPR and privacy policies
Also, Read: 7 Free AI Prompt Templates to Experiment With
Red Teaming
At Document360, we adopt a red teaming approach to mitigate risk arising from GenAI capabilities. This includes following principles of responsible Artificial Intelligence (AI) such as fairness, reliability & safety, privacy & security, inclusiveness, transparency, and accountability. We undertake periodic exercises to test LLM to determine gaps in safety measures. During this exercise, we try to identify any security vulnerabilities in our implementation. If it exists, then we take corrective action as part of the mitigation strategy. Moreover, we are part of red teaming communities across the globe so we are proactively looking out for zero-day exploits and emerging threats.
Closing Remarks
It is indispensable to test GenAI tools before deployment for security vulnerabilities. As GenAI practices and technology are still evolving, it is critical to keep your best practices updated to protect against new threats. The LLMs are inherently prone to hallucination and guardrails are very important to prevent LLMs from providing misleading information to our customers. Eddy AI is widely used amongst our customers so we need to maintain trust with our customers. We have deployed state-of-the-art LLM and have robust system messages to strengthen our value offering for Eddy AI.
An intuitive AI-powered knowledge base software to easily add your content and integrate it with any application. Give Document360 a try!
GET STARTED