We’ve talked about the many ways large language models (LLMs) and artificial intelligence (AI) are impacting business efficiency, data and analytics, and even FinOps. But we’ve yet to talk about arguably one of the most important areas of concern: security.
When GPT first launched, it initiated a shock wave. For the first time, AI became a buzzword for the masses—not just the tech teams and investors in Silicon Valley.
Transformer architecture: This neural network architecture is good at modeling language and applying context to it. Hardware improved enough over the years to support larger and larger models built on it. More importantly, the models kept getting better as they were trained on more data at larger scale. This combination made GPT “smart” enough to interact with people in natural language.
Instruction following: GPT can follow instructions given to it. Because businesses can program it using natural language and embed LLMs into their products, businesses started co-innovating and developing new products on top of LLMs rapidly.
Conversational AI: ChatGPT’s interface is as familiar as your cell phone’s text messaging thread. This ease of use is largely responsible for its mass adoption. And it’s free.
During the early days of GPT, Amit Prakash, ThoughtSpot’s Co-founder, said “I compare this to discovering a new element on the periodic table in the sense that it will enable a lot of new things that were not possible before, but only by carefully studying its properties and mixing it with other elements.” Since then, we’ve already witnessed a lot of change.
New source code is written and integrated at unprecedented speeds
More AI apps are being built, therefore apps are getting smarter
Users are keener to try new AI features because they can give instructions in plain English or another language of their choice
Massive uptick in data generation—from LLMs and app logs that see increased activity
That data is then being used to improve the AI apps or underlying LLMs
The cycle described above introduces a number of security risks for businesses—whether you’re integrating LLMs into your product, or using a product that has already integrated an LLM into your workflow. The purpose of this article is to highlight those security risks and discuss strategies for risk mitigation.
1. My confidential data is exposed via LLM providers like OpenAI, Microsoft Azure, Google Cloud Platform, etc.
This is an obviously important security risk if you’re building products with LLMs. This exposure could take many forms including:
Your data is stored, exposing you to risk of data breach
They train models on your private or proprietary company data
For the last example, consider what would happen if the model trained on data that includes personal information, customer information, or PII. Or, perhaps they trained the model on private company data that is now offered as a service to your competitors, giving away proprietary information that your competitors can then use to build a competitive edge against your company.
2. My confidential data is exposed via AI business apps like Notion and Gmail, or developer tools like GitHub
While you may not think of these trusted workplace apps as posing significant risk, their use of AI may create new exposures. For example, when an app uses an LLM provider to power new AI features in its product, it could open your business up to any of the risks listed above.
Additionally, the app vendor may fine-tune models on your data and then make those models available as a service. In that case, your competitors may benefit from them, or your proprietary data, such as metric definitions, may leak.
3. My restricted data is exposed via LLM-based apps to unauthorized users, because LLMs are probabilistic and don’t always follow instructions
Your organization likely has multiple functions and levels of staff. The HR or people function has access to employee personal data that the rest of the organization does not. Your manager has access to your compensation package which your peers may not. This information is available in documents, stored in certain apps, or exchanged in the workplace email or messaging tool.
LLMs are being leveraged by apps to bring more efficiency to the workplace: for example, to power an internal Q&A tool, to author creative marketing content, to personalize email responses for sales, and to summarize meetings for product managers. In doing so, your organization is providing some level of access to the app that enables these AI-powered workflow enhancements.
LLMs don’t have a notion of access rights for each token of data they are trained on. Some apps enforce access on a best-effort basis by instructing the LLM to respond to the user’s request using only the subset of data that the user is authorized to read. But LLMs are known to disobey instructions or hallucinate. Other apps try to post-process the response, recognize restricted data, and verify that the user is authorized to see it. However, this approach is only reliable and secure when the response is well structured or restricted to a small sample space.
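The two approaches above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Document` type, the role names, and the idea that the app asks the LLM to return the IDs of the documents it used. It is not how any particular product enforces access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_roles: frozenset  # hypothetical: roles permitted to read this document


def authorized_context(docs, user_roles):
    """Keep only documents the user may read, before any prompt is built."""
    roles = set(user_roles)
    return [d for d in docs if d.allowed_roles & roles]


def verify_citations(cited_ids, docs, user_roles):
    """Post-check: accept a response only if every cited document is readable."""
    readable = {d.doc_id for d in authorized_context(docs, user_roles)}
    return all(doc_id in readable for doc_id in cited_ids)


# Illustrative data only.
DOCS = [
    Document("handbook", "Vacation policy text", frozenset({"employee", "hr"})),
    Document("salaries", "Compensation bands text", frozenset({"hr"})),
]

context = authorized_context(DOCS, ["employee"])
print([d.doc_id for d in context])                          # ['handbook']
print(verify_citations(["salaries"], DOCS, ["employee"]))   # False
```

Filtering before the prompt is built means restricted text never reaches the model for that user. The post-check works here only because the response is assumed to carry structured document IDs, which matches the caveat that post-processing is reliable only for well-structured responses.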
LLMs currently don’t have the ability to associate properties like license terms with the tokens they predict. So, when a model completes the line of code or function you were midway through writing, it doesn’t know which libraries the code snippets come from. Therefore, the predicted code may come from, or depend on, a library that does not allow commercial use or has security vulnerabilities.
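One partial safeguard is to check what a suggested snippet imports before accepting it. The sketch below is only an illustration under stated assumptions: `APPROVED_LICENSES` is a hypothetical allowlist your team would maintain, and `leftpadx` is a made-up package name. It inspects import statements only, so it cannot detect code copied verbatim from a restrictively licensed source.

```python
import ast

# Hypothetical allowlist: top-level package name -> license your team has reviewed.
APPROVED_LICENSES = {"numpy": "BSD-3-Clause", "requests": "Apache-2.0"}


def imported_packages(source):
    """Return the top-level package names imported by a code snippet."""
    packages = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            packages.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            packages.add(node.module.split(".")[0])
    return packages


def unreviewed_packages(source):
    """Packages the snippet imports that are not on the approved list."""
    return sorted(imported_packages(source) - set(APPROVED_LICENSES))


# A made-up AI suggestion that pulls in one reviewed and one unreviewed package.
suggestion = "import requests\nfrom leftpadx import pad\n"
print(unreviewed_packages(suggestion))  # ['leftpadx']
```

A check like this could run in code review or CI, flagging unreviewed dependencies for a human to examine rather than blocking them outright.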
While the risks of using LLMs in business do exist, that doesn’t mean you should avoid them. Businesses that refuse to embrace the opportunities that AI offers will be left behind. Instead, there are ways you can and should mitigate this risk for your business.
Perhaps the most important step you can take to protect your business is understanding your vendor license. This includes the ways in which the vendor will manage, store, and use your company data. By implementing robust data access controls, you can avoid undue risk introduced through LLMs.
To get started, be sure you obtain and thoroughly read your vendor license agreement. Pay special attention to sections or clauses related to data usage, storage, and privacy. Here are some questions to ask yourself during the review process:
What type of data will be shared with the vendor? (Please be specific and provide data field names if you can, e.g., name, email address, IP address, etc.)
Will we be using the output from this vendor/service as input into our systems or other services/applications for further processing?
What are our expected availability requirements (e.g., SLA) from this vendor/service?
If this vendor/service will be integrated/connected to our systems, what level of access will it have to the systems involved (Read, Write, Modify, Admin)?
If this vendor/service will be integrated/connected to our systems, how will we be connecting our systems to this vendor/service? (e.g., dedicated VPN/private network connection to TS systems, TS systems making API calls over the public network, direct user login to an internet-accessible portal, etc.)
Does this vendor make use of subcontractors to deliver in-scope services to my company?
Are vendor and in-scoped services based outside of the USA?
Is this a mature vendor? (e.g., multiple years in business with large/enterprise customers and some industry-recognized certifications)
Does this vendor have mature processes in place?
Is this vendor subject to any regulatory oversight? (e.g., SOX, HIPAA, FINRA/SEC, etc.)
During the last 5 years, has this vendor had any security/privacy incidents where customer data was exposed to unauthorized persons?
If you are unclear about any of these questions during your review, reach out to the vendor for support and clarification. It’s important that you have a clear understanding of how your data will be utilized in order to ensure its security.
You likely already have a data security incident response plan in place. If you don’t, it’s important that you develop one. If you do, it’s vital that you revisit your plan due to the security risks introduced from recent LLM developments.
To create an incident response plan, you need to assemble a team of key personnel who will be responsible for incident response. Your team should include IT staff, security professionals, senior management, public relations professionals, and legal staff to ensure your plan is compliant with data protection laws and reporting regulations.
A comprehensive incident response plan should include but is not limited to:
Classification framework for incidents based on severity and impact
Detection via monitoring tools and best practices
Reporting system and procedures with identified roles
Response procedures with defined roles and responsibilities
Communication plan for internal and external stakeholders
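As a rough illustration of the first item, a classification framework can be as simple as a function that maps severity and impact to a priority. The levels, the multiplication rule, the thresholds, and the `P1` to `P3` labels below are all assumptions chosen for the sketch; your own plan should define them to fit your business.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def classify_incident(severity, impact):
    """Map severity and impact to a priority label using their product."""
    score = severity * impact  # ranges from 1 (low/low) to 9 (high/high)
    if score >= 6:
        return "P1"  # e.g. page the incident response team immediately
    if score >= 3:
        return "P2"  # e.g. handle within the business day
    return "P3"      # e.g. log and review in the next regular meeting


# Hypothetical example: restricted data surfaced by an LLM app to many users.
print(classify_incident(Level.HIGH, Level.HIGH))  # P1
print(classify_incident(Level.LOW, Level.LOW))    # P3
```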
Once you have your incident response plan in place, you should evaluate your classification and detection frameworks given the specific security vulnerabilities introduced by LLMs. Analyze your business vendors to understand which ones are utilizing LLMs, and revisit your license agreements to ensure there are no newly introduced vulnerabilities.
Once you’ve completed the due diligence of assessing vendor agreements for security vulnerabilities and creating an incident response plan, it’s important to educate your employees on security best practices. During this step, try to strike a balance of security awareness, importance, and procedures.
Here’s what we mean by that: In the age of LLMs, it’s vital that you facilitate and guide adoption of AI tools instead of slowing it. The goal is not to scare employees, but rather to empower them so that your business can harness increased efficiency while maintaining a culture of security awareness and compliance.
Here are a few ideas to get you started:
Assess your team’s current knowledge of LLM usage and knowledge of risks
Develop a training curriculum including theoretical and practical applications
Offer hands-on labs, training, and exercises specific to your vendor’s LLM usage and security best practices
Monitor risks and mitigation, and share real-life examples with your team
Promote a culture of reporting through rewards and recognition
Regardless of how much prework, planning, and training you offer, future risks will continue to rise as LLMs, and technology in general, become more integrated into our workflows and our lives. That’s why it’s important to continue monitoring and evaluating risks as you onboard new workplace tools, see updates in your current stack, or even start embedding LLMs into your own product.
The pace of innovation is at an all-time high, partially due to the cycle introduced by LLMs themselves. To ensure your security keeps pace, consider implementing some of these solutions:
Utilize advanced threat detection systems that notify you about breaches
Perform regular vulnerability assessments across your tech stack and vendors
Engage in industry forums and communities to see what peers are uncovering
Share security insights, learnings, and best practices across your organization
When we launched ThoughtSpot Sage, we applied the power of LLMs to develop a number of product enhancements including AI-generated answers, AI-powered search suggestions, and AI-assisted data modeling.
While quickly making these product enhancements available to our customers was a goal, our top priority remained safeguarding our customers’ data security. That’s why we built optionality, visibility, and transparency into every step of our product. Additionally, the product architecture is designed to be resilient against newer, LLM-focused attacks such as prompt injection and prompt leaks.
Visit our blog to find a full account of the steps we took to ensure data security with LLMs on ThoughtSpot Sage.