A lot of people jump right in and start coding a Python notebook that makes some calls to a model endpoint. They typically pass in a simple prompt and a little bit of data, fire off a request, get back some cool response, and immediately get excited about what they just built. Next, they run off to show their boss or team and say “hey, look how cool this is and what we can do”. The excitement builds and the company realizes the potential of the use case.
So management then asks, “How do we get this to production and will it be secure?”. Because, by the way, the data the data scientist passed in is actually corporate top secret.
Oh boy, so that little notebook now needs a whole platform built for it and it has to be ultra secure, so where do we start?
Whether you choose to build it or buy it, security should be the #1 thing on your mind. There are so many levels that security applies to in the system. And it’s not only the system itself: the users that access the GenAI system have to be monitored and secured too. If a user gets compromised, then the attacker has full access to any GenAI agents that user has access to! This is where the E5 licenses from Microsoft, and all the security they enable, actually start to make a lot of sense!
You are probably thinking, great, another thing to secure. And you’d be right, as the last thing you need is another attack vector that people can take advantage of to cause privacy breaches, legal exposures and lawsuits, or corporate blackmail events.
UIs (User)
First thing, you are going to need a UI or an API. Let’s assume the application will need a chat UI. The UI will need to be secured with some kind of IdP (authentication), likely Microsoft Entra, Amazon IAM, Okta/Auth0, etc. Since most frameworks these days ship with authentication provider support, this shouldn’t be a big task, but you will need to be able to pass that user credential to the rest of the layers, as they will need to know what the user can actually see (authorization).
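As a concrete illustration, here is a minimal sketch of validating an incoming bearer token and attaching the user’s identity to the request before it flows to the lower layers. It assumes an Entra-issued JWT, the PyJWT library, and FastAPI; the tenant and audience values are placeholders.

```python
# Minimal sketch: validate an Entra-issued JWT and expose the user's identity
# to downstream layers. Assumes PyJWT and FastAPI; tenant/audience are placeholders.
import jwt
from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

TENANT_ID = "<your-tenant-id>"          # placeholder
AUDIENCE = "api://your-genai-platform"  # placeholder app registration
JWKS_URL = f"https://login.microsoftonline.com/{TENANT_ID}/discovery/v2.0/keys"

app = FastAPI()
bearer = HTTPBearer()
jwks_client = jwt.PyJWKClient(JWKS_URL)

def current_user(creds: HTTPAuthorizationCredentials = Depends(bearer)) -> dict:
    """Validate the bearer token and return its claims (user id, groups, etc.)."""
    try:
        signing_key = jwks_client.get_signing_key_from_jwt(creds.credentials)
        return jwt.decode(
            creds.credentials,
            signing_key.key,
            algorithms=["RS256"],
            audience=AUDIENCE,
        )
    except jwt.PyJWTError as exc:
        raise HTTPException(status_code=401, detail=str(exc))

@app.post("/chat")
def chat(message: str, user: dict = Depends(current_user)):
    # Pass the validated claims down to the agent/orchestration layers so
    # authorization decisions can be made closer to the data.
    return {"user": user.get("oid"), "echo": message}
```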
Whatever system you decide to use (remember Buy It/Build It?) should be able to support external IdP integration; if it doesn’t, then I’d say it’s a no-go. After all, you may just want to allow external partners to use the system and charge them for it! We’ll revisit this in the Return on Investment post later!
And lastly, if you are going to roll this platform out to all your employees, you will likely need to consider the applicable accessibility laws and ensure that whatever you are using meets the needs of your users.
UIs (Management)
How will you add new agents? Will you let end users create them? When you create an agent, what will the agent be able to use in terms of tools, models, and features? Ultimately, this boils down to what the “user” can see in terms of tools, models, and features. How will this work?
APIs
If the system doesn’t need a UI, and all you need is to integrate it with your current applications via an API, then you can skip the user UI requirement and go straight to the Core and Orchestration layers. If you don’t need to keep track of things like chat history and token burn, then you can probably skip those layers too and go directly to the orchestration/workflow layer (similar to hosting your agent/tools/workflow in promptflow and Azure Machine Learning). But you’ll still need to be able to authenticate and authorize the user/application.
Agents
Agents are at the top of the food chain. They provide the template for what will occur when a user/app makes a request. They define the type of workflow/orchestration, the tools, the prompt, etc.
Because an agent has a configuration of tools, which in turn have access to various data sources, it’s very important that you set the authorization at the agent level properly. One of the most common mistakes people make (especially in Microsoft Copilot) is indexing a bunch of data that is corporate top secret and then exposing it via an agent without any permissions on it. That allows anyone that can log in to the UI/API to gain access to that data.
Having the ability to lock down your agents based on the tools and data they have access to is a vital feature for any GenAI platform.
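To make that concrete, here is a minimal sketch of an agent definition carrying its own allow-list of IdP groups, checked before anything runs. The schema and group names are hypothetical illustrations, not any particular product’s API.

```python
# Minimal sketch: agent-level authorization checked before any tool executes.
# The agent schema and group names are hypothetical illustrations.
from dataclasses import dataclass, field

@dataclass
class AgentDefinition:
    name: str
    prompt: str
    tools: list[str]
    allowed_groups: set[str] = field(default_factory=set)  # IdP group ids/names

    def authorize(self, user_groups: set[str]) -> None:
        """Deny by default: the user must be in at least one allowed group."""
        if not self.allowed_groups & user_groups:
            raise PermissionError(f"User is not authorized to use agent '{self.name}'")

# Example: a "top secret" knowledge agent locked to a single group.
finance_agent = AgentDefinition(
    name="finance-kb",
    prompt="Answer questions using the finance index only.",
    tools=["vector_search"],
    allowed_groups={"grp-finance-analysts"},
)

finance_agent.authorize(user_groups={"grp-finance-analysts"})  # passes
# finance_agent.authorize(user_groups={"grp-everyone"})        # raises PermissionError
```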
Tools
Most people build an agent, give it a prompt, point it at a model…and that’s it. Simple. However, those of us that have been around a while know that’s not how it works anymore.
You need much more advanced agent types than just a simple knowledge management agent (an agent that points to a vector database).
People want AI solutions that *do things*.
- They want analytical agents that take data from a database (using model-generated dynamic SQL), combine it with data from other source(s), and add it all to a complex prompt that is then fed into the model to get some advanced completion.
- They want agents that execute actions/functions against external systems.
- They want agents that can perform complex Plan/Replan workflows that run for hours or days.
Agents should be able to be built with plug-and-play workflows and plug-and-play tools that implement much more advanced patterns. And by the way, tools have tools.
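As an illustration, here is a minimal sketch of a plug-and-play tool contract in Python, including a tool that composes another tool (“tools have tools”). The interface is hypothetical, not any specific framework’s API.

```python
# Minimal sketch: a plug-and-play tool contract, plus a composite tool that
# itself calls another tool. The interface is hypothetical.
from typing import Protocol

class Tool(Protocol):
    name: str
    def run(self, user: dict, **kwargs) -> str: ...

class SqlQueryTool:
    name = "sql_query"
    def run(self, user: dict, question: str = "", **kwargs) -> str:
        # In a real system: generate SQL with the model, validate it, then
        # execute it under the *user's* credentials, not a service account.
        return f"rows for {question!r}"

class ReportTool:
    """A tool that has tools: it delegates to an inner tool before summarizing."""
    name = "report"
    def __init__(self, inner: Tool):
        self.inner = inner
    def run(self, user: dict, question: str = "", **kwargs) -> str:
        rows = self.inner.run(user, question=question)
        return f"summary of {rows}"

# Tools register by name so agents can mix and match them.
registry: dict[str, Tool] = {t.name: t for t in (SqlQueryTool(), ReportTool(SqlQueryTool()))}
print(registry["report"].run(user={"oid": "123"}, question="Q3 revenue"))
```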
Models
Agents should be able to plug and play the models they use. This allows you to move to the next model version, or to a completely new model, seamlessly, and as a result hopefully improve your latency and accuracy.
Additionally, creating multiple agents with different models allows for some great A/B testing.
As discussed in the “Train vs Mainstream models” post, there are many models out there that can be used by your agents and tools. Having the ability to add in any model from any external platform, and then have an agent or tool use that model, is pretty powerful.
Models can have various properties, such as temperature, top_p, top_k, etc. When creating a model, your UI should allow for dynamically adding model properties that the agent or tool knows how to pass to the model.
This is not an easy feature to implement, but it is something you should consider when going at it on your own.
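For example, a model registry entry might carry an open-ended parameter bag that the calling layer forwards untouched. A minimal sketch, assuming a hypothetical registry schema and the OpenAI Python client for the actual call:

```python
# Minimal sketch: a model registry entry with an open-ended parameter bag
# that is forwarded to the provider call. The registry schema is hypothetical;
# the call shown assumes the OpenAI Python client.
from dataclasses import dataclass, field
from openai import OpenAI

@dataclass
class ModelConfig:
    name: str                                    # deployment/model name
    endpoint: str                                # where the model is hosted
    params: dict = field(default_factory=dict)   # temperature, top_p, etc.

gpt = ModelConfig(
    name="gpt-4o",
    endpoint="https://api.openai.com/v1",
    params={"temperature": 0.2, "top_p": 0.9},   # added dynamically in the UI
)

client = OpenAI(base_url=gpt.endpoint)  # reads OPENAI_API_KEY from the environment

def complete(cfg: ModelConfig, prompt: str) -> str:
    response = client.chat.completions.create(
        model=cfg.name,
        messages=[{"role": "user", "content": prompt}],
        **cfg.params,  # whatever properties were configured, passed straight through
    )
    return response.choices[0].message.content
```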
If you fine-tune your models, or use models built on your own data, it becomes even more important to secure access to the model, because it now has your corporate confidential information integrated into it. A compromise of an agent that uses the model can lead to corporate data leakage.
Endpoints
Models can be hosted in any number of places; the same model can be hosted in Azure, AWS, and GCP. Endpoints can be defined to specify where an agent or tool makes the call to a specific model. These endpoints should also be locked down. You don’t want someone adding a very expensive model to their agent, exposing it to every person in your company, and thus getting a $100K bill that month.
Not only are there endpoints for your models, but if your GenAI system enables external application integration through API calls, those also need to be secured. This can be done through managed identities, API keys, certificates, etc.
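For example, on Azure a managed identity avoids storing API keys at all. A minimal sketch, assuming the azure-identity package and an Azure OpenAI endpoint (the endpoint URL, deployment name, and API version are placeholders):

```python
# Minimal sketch: call an Azure OpenAI endpoint with a managed identity
# instead of an API key. Endpoint, deployment, and api_version are placeholders.
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),  # managed identity, workload identity, or dev login
    "https://cognitiveservices.azure.com/.default",
)

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    azure_ad_token_provider=token_provider,  # no API key stored anywhere
    api_version="2024-06-01",
)

response = client.chat.completions.create(
    model="<your-deployment-name>",  # placeholder deployment
    messages=[{"role": "user", "content": "ping"}],
)
```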
Now that your applications are integrated, they have access to your model, and anyone that has access to the application now has access to the model. What if the application is compromised and starts to send a high number of requests to your GenAI platform? How will you monitor and control this?
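One common control is a per-caller rate limit in front of the model endpoints. A minimal in-memory sketch; a production deployment would typically back this with a shared store such as Redis:

```python
# Minimal sketch: per-caller sliding-window rate limit in front of the model
# layer. In-memory only; a real deployment would use a shared store (e.g. Redis).
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_REQUESTS = 100  # per caller per window; tune to your cost/risk profile

_requests: dict[str, deque] = defaultdict(deque)

def allow_request(caller_id: str) -> bool:
    """Return True if this caller (user or app identity) is under its limit."""
    now = time.monotonic()
    window = _requests[caller_id]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()  # drop requests that fell out of the window
    if len(window) >= MAX_REQUESTS:
        return False      # throttle, alert, or require re-authentication
    window.append(now)
    return True
```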
The layers are starting to add up! End-to-end zero-trust security is looking even more complicated.
How are you passing the user credentials from the app to the platform to the model? What if multiple IdPs are involved (Entra to AWS to GCP, anyone)? This is not an easy task.
Chat Sessions
Users are typically the owners of their own chat sessions. Some platforms have started to toss around the idea of “sharing” chat sessions. This presents some interesting challenges, but it is a very cool idea. Microsoft Security Copilot allows for this, but it will put the thread into a read-only state after you have performed the share operation.
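A simple way to model this is an owner plus a read-only shared list, with the session frozen once it is shared. A hypothetical sketch, not how any particular product implements it:

```python
# Minimal sketch: chat-session ownership with share-then-freeze semantics.
# The schema is a hypothetical illustration.
from dataclasses import dataclass, field

@dataclass
class ChatSession:
    owner: str
    messages: list[str] = field(default_factory=list)
    shared_with: set[str] = field(default_factory=set)
    read_only: bool = False

    def share(self, user_id: str) -> None:
        self.shared_with.add(user_id)
        self.read_only = True  # freeze the thread once it is shared

    def append(self, user_id: str, message: str) -> None:
        if user_id != self.owner:
            raise PermissionError("Only the owner can write to this session")
        if self.read_only:
            raise PermissionError("Shared sessions are read-only")
        self.messages.append(message)

    def can_read(self, user_id: str) -> bool:
        return user_id == self.owner or user_id in self.shared_with
```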
Attachments
Users typically want to pass in attachments for the agents and tools to work on. This is very common for people that utilize the OpenAI Assistants API. Attachments must be stored, tracked, and available to be referenced as part of the chat history. In addition, any files generated by the model must also be saved and tracked.
These files/attachments should not be visible to other users; security should be defined such that only the user who owns that particular chat session has access to the chat and its attachments.
This gets even more complicated when you decide you want to be able to “share” chat sessions with other people, as noted above.
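Extending the session sketch above, an attachment check might simply inherit the session’s read permissions. Again a hypothetical illustration:

```python
# Minimal sketch: attachments inherit the chat session's read permissions.
# The session interface and schema are hypothetical illustrations.
from dataclasses import dataclass
from typing import Protocol

class SessionLike(Protocol):
    def can_read(self, user_id: str) -> bool: ...

@dataclass
class Attachment:
    attachment_id: str
    session: SessionLike     # the chat session this file belongs to
    blob_path: str           # where the bytes actually live

    def open_for(self, user_id: str) -> str:
        """Only users who can read the session can read its attachments."""
        if not self.session.can_read(user_id):
            raise PermissionError("Not authorized for this attachment")
        return self.blob_path  # in practice: return a short-lived signed URL
```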
Vectorization
Knowledge management agents are a common first step for most companies dipping their toes into the GenAI waters. These typically require you to vectorize some documents. Organizations have varying levels of requirements when exploring these paths: some may have only a handful of documents, while others may have thousands of documents or terabytes of data they want vectorized.
A data/vectorization pipeline is typically broken into four main steps (a minimal end-to-end sketch follows the list):
- Content Source
  - What content sources does the platform support? (SharePoint/M365, data lake, blob storage, Snowflake, etc.)
  - Does the system support pulling Access Control Lists (ACLs) from the source?
- Text Partitioning
  - Once you download the data from the content source, you have to break it apart. There are many different ways to do this for different file types (remember iFilters?). PDF files are a common one. Chunk size and overlap are common parameters that have to be experimented with.
- Text Embedding
  - Once you have the chunks, you need to embed them using some embedding model. The most popular option over the past two years has been text-embedding-ada-002. It produces 1536-dimensional vectors, which is pretty decent. However, if you look at where this model sits in the rankings of embedding models today, it’s somewhere around position #75.
  - At some point, you won’t want to utilize these old models anymore and you will want to migrate to a new model. This means a full re-vectorization of your content.
- Indexing
  - So you have the embeddings; where are you going to put them?
  - Cosmos DB? Azure AI Search? Pinecone? PostgreSQL?
  - This step saves your embeddings to the target store.
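Here is a minimal end-to-end sketch of those four steps, assuming the OpenAI Python client for the embedding call; the index writer is a hypothetical placeholder for whatever store you choose.

```python
# Minimal sketch of the four pipeline steps: source -> partition -> embed -> index.
# Assumes the OpenAI Python client; save_to_index is a hypothetical placeholder
# for your real vector store.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def partition(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Step 2: break a document into overlapping chunks."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

def embed(chunks: list[str]) -> list[list[float]]:
    """Step 3: embed the chunks (1536-dimensional vectors for this model)."""
    response = client.embeddings.create(model="text-embedding-ada-002", input=chunks)
    return [item.embedding for item in response.data]

def save_to_index(chunk: str, vector: list[float], acl: list[str] | None) -> None:
    """Step 4 placeholder: write to Cosmos DB, Azure AI Search, Pinecone, PostgreSQL, etc."""
    ...

def run_pipeline(documents: list[dict]) -> None:
    """Step 1: `documents` are records pulled from your content source."""
    for doc in documents:
        chunks = partition(doc["text"])
        for chunk, vector in zip(chunks, embed(chunks)):
            save_to_index(chunk, vector, acl=doc.get("acl"))  # keep source ACLs alongside
```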
Content Sources
Although this was covered above under Vectorization, it is important to note that most platforms will let you ingest/vectorize any data you want, but very few will let you bring in the ACLs. This presents an important security issue: any knowledge management agent that is pointed at a vectorized datastore that had its ACLs stripped off now falls back to the security on the agent itself. This was pointed out above, but it is worth noting again, as it presents a pretty big security hole if you go live without security in place.
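If your store does keep the ACLs, query-time security trimming might look like the following sketch. The search function and document schema are hypothetical; stores like Azure AI Search can apply such filters natively at query time.

```python
# Minimal sketch: query-time security trimming against a vector store that
# kept the source ACLs. raw_search and the document schema are hypothetical.
def search_with_trimming(query_vector: list[float], user_groups: set[str],
                         raw_search, top_k: int = 5) -> list[dict]:
    """raw_search(vector, k) -> list of docs shaped like {"text": ..., "acl": [...]}."""
    results = raw_search(query_vector, k=top_k * 4)  # over-fetch, then trim
    allowed = [doc for doc in results if user_groups & set(doc.get("acl", []))]
    return allowed[:top_k]
```

Post-filtering like this is the simple version; pushing the filter into the index query itself is both safer and cheaper.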
System Data / Reporting
If you choose to store your chat sessions and agent completions, they will likely go into some kind of data store. Stakeholders will want to be able to gain access to basic reporting capabilities such as:
- Requests per User
- Tokens per User
- Errors
- Prompt and Completion token usage
- Charge back (cost centers)
- CPU, Memory and Network loads on the compute layer
- Latency and Requests for your models
Power BI is typically asked for, but because of some of the limitations of the product, it doesn’t end up being a viable option without a lot of extra work.
So this typically falls back to creating some kind of customized reporting via Python notebooks. Authentication should be done via Entra or IAM-based identities, not API keys (zero trust).
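As an example, a tokens-per-user report over a completions log might look like this sketch, assuming pandas and a hypothetical table layout:

```python
# Minimal sketch: tokens-per-user report over a completions log.
# Assumes pandas; the column names are a hypothetical table layout.
import pandas as pd

def tokens_per_user(completions: pd.DataFrame) -> pd.DataFrame:
    """Expects columns: user_id, prompt_tokens, completion_tokens."""
    return (
        completions
        .assign(total_tokens=lambda df: df.prompt_tokens + df.completion_tokens)
        .groupby("user_id", as_index=False)[["prompt_tokens", "completion_tokens", "total_tokens"]]
        .sum()
        .sort_values("total_tokens", ascending=False)
    )  # feed into charge-back / cost-center reporting
```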
System Access (Azure resources)
In addition to the data plane, you need to consider the control plane. This is the access to the compute and other resources that host the various layers. When issues start to pop up, someone will need to be able to log in to the containers/pods and look at the logs, as well as at things like Application Insights and Log Analytics.
You also need to prepare for things like upgrades: who will be responsible for gaining access to the AKS cluster and storage resources to do those container and scripted upgrades?
Summary
Still want to build your own GenAI platform from scratch? Security is an important part of the design, but most people overlook what that really means. There is a lot to consider across all the layers, and serious problems can arise if security is not implemented correctly.
It’s important to do a very detailed review of how users and applications access the system and how authentication and authorization flow through all the layers. Finding gaps where a malicious user/app can take advantage of weak security should be a top priority; either fill those gaps or determine mitigation techniques.
The last thing you need is to have to deal with a corporate data leak, a compliance issue, or a negative hit to your organization’s reputation.
Contact
Need help getting your GenAI project started and/or over the finish line? Ping me, always happy to help!
Email: givenscj@hotmail.com
Twitter: @givenscj
LinkedIn: http://linkedin.com/in/givenscj
GenAI Blog Series
- GenAI Blog Series
- #1 – Build it or Buy it/RentIt
- #2 – Host it or get SaaS-y
- #3 – Train vs Mainstream models
- #4 – Scaling your solution (and not break the bank)
- #5 – Implementing Security (oh so many levels)
- #6 – Reporting, Logging and Metrics
- #7 – MLOps/GenAIOps, some kind of *Ops
- #8 – Measuring Return on Investment (ROI)