So you have solved all the previous architectural challenges and moved on to the Operationalize side of the equation, which is the dreaded Reporting and Metrics side of things.
In most cases, the system will be thrown over to the infrastructure team to manage, and since most of them aren't coders, they are unlikely to be able to fix anything non-infrastructure related when they don't know the code or the system. However, if it goes down or performance starts to degrade, they are the first to be notified. The infrastructure team will need a set of guidelines and procedures for working any issues and for knowing when to escalate to support or the developers. Having an understanding of the Reporting and Metrics will help guide them on what the issue may be and what the next steps are to resolve it.
Reporting
Measuring the system and understanding how it is running is another step toward a successful GenAI deployment. Some things people like to know:
- Common reports
  - How many requests have been sent and when?
    - Helpful for finding peaks and valleys in your usage patterns.
    - Can support auto-scaling of the system during times of high usage.
  - How many active users are there?
    - As a lot of people present at conferences, this is typically referred to as your adoption rate.
  - Who is the most active user?
    - Are they using the system more than they should, or are they using it the way it was meant to be used?
    - Your top users can potentially become your spokespeople for the rest of the users/company.
  - What are your users asking about the most?
    - Typically helpful for refinement and potentially caching.
  - Is anyone abusing the system through sheer number of requests (denial of service, or costing you a lot of tokens)?
    - Sometimes you may need to limit the number of requests per user over some timeframe.
- Responsible AI
  - If you are using Content Safety, which requests/users are abusing the system.
  - If you are using Presidio, which requests/users are asking for or returning sensitive data.
- Costs
  - Azure – You should be able to break down the costs across the various Azure resources and create a baseline. Also, and this is MANDATORY, put budgets in place that fire every possible alert you can add.
  - Charge backs – If the system you have deployed supports multiple departments, you should be able to charge back the costs based on the usage of the system. Utilizing some kind of “cost_center” tagging across the resources can help with that. You would need to be able to export those reports on an automated monthly basis (a sketch follows below).
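A minimal sketch of what that automated monthly export could look like, assuming per-request usage records already land somewhere flat (here a CSV) with a cost_center tag, token counts, and an estimated cost per request; the column names are illustrative assumptions, not from any particular product:

```python
# Minimal chargeback sketch: roll up per-request usage by cost_center and
# write a monthly report. Column names (cost_center, total_tokens, cost_usd)
# are illustrative assumptions, not a standard schema.
from datetime import date

import pandas as pd


def monthly_chargeback(usage_csv: str, year: int, month: int) -> pd.DataFrame:
    usage = pd.read_csv(usage_csv, parse_dates=["timestamp"])
    # Keep only the month being billed.
    in_month = usage[
        (usage["timestamp"].dt.year == year) & (usage["timestamp"].dt.month == month)
    ]
    # Roll up requests, tokens, and estimated cost per department/cost center.
    return (
        in_month.groupby("cost_center")
        .agg(
            requests=("timestamp", "count"),
            total_tokens=("total_tokens", "sum"),
            cost_usd=("cost_usd", "sum"),
        )
        .reset_index()
        .sort_values("cost_usd", ascending=False)
    )


if __name__ == "__main__":
    today = date.today()
    report = monthly_chargeback("usage_export.csv", today.year, today.month)
    report.to_csv(f"chargeback_{today:%Y_%m}.csv", index=False)
```

Schedule something like this (an Azure Function, a cron job, a data pipeline, whatever fits your stack) so the report shows up every month without anyone having to remember to run it.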
Logging
It won’t be practical to log everything. The system should be able to specify the log levels (Info, Warn, Debug, Trace) that get sent to your logging solution. You should also be careful when turning on those more verbose levels that you turn them back off when you are done!
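A minimal sketch of what that looks like in practice, assuming a plain Python service where the level is driven by an environment variable (LOG_LEVEL is just an illustrative name); the point is that verbosity should be an operational toggle, not a code change:

```python
# Minimal sketch: read the log level from the environment so operators can turn
# verbose (Debug/Trace-style) logging on, and back off, without a redeploy.
import logging
import os


def configure_logging() -> logging.Logger:
    # LOG_LEVEL is an illustrative variable name; default to INFO.
    level_name = os.getenv("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, logging.INFO)
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )
    return logging.getLogger("genai-app")


logger = configure_logging()
logger.info("service starting")
logger.debug("verbose detail only emitted when LOG_LEVEL=DEBUG")
```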
- Errors
  - Application Insights – Having your components log to Application Insights or some other logging solution is also mandatory. For example, FLLM uses OpenTelemetry, which can send the logs pretty much anywhere.
- Latency – Being able to measure the latency of your requests end to end, or anywhere in between, helps you find issues in your system. This is especially important when the system is under high levels of load and stress.
  - Separate note: when using zero-trust deployments, you are likely going to be tempted to utilize Managed Identity for all cross-service calls. Although this is the right approach, MSI token creation NEVER uses the token cache and can cause some serious latency issues in your request flows. This may not be an issue with a simple “chat bot” style agent where you don’t care much about how long it takes to respond, but in systems with 10-second or lower latency targets, this can kill your design.
- Common issues – things you should be able to easily diagnose:
  - Errors/Exceptions – These are going to occur; being able to find the error and diagnose its cause needs to be quick and efficient (even though the fix may be much more complex and time-consuming to implement).
  - Problem agents/tools – You may have multiple agents as part of the overall system servicing the needs of many groups of individuals/departments. Being able to narrow down to the agent causing the most problems (requests/token burn) will be a vital skill. Additionally, if an agent utilizes more than just a simple LLM call and has various tools at its disposal, you should be able to narrow things down to the tool level and find problem tools that need attention.
  - Service level issues – If you were not aware, ACA and AKS have limits on their request timeouts. ACA maxes out at 4 minutes, and no, you cannot increase it. AKS has more flexibility, but it still has limits for the sake of system stability. The system you build or use should never make synchronous calls to LLM models. Some of these models can have pretty bad latencies (up to 2–3+ minutes). Add in your layers, and it is a recipe for disaster. The platform you build/choose should utilize async patterns everywhere (see the sketch after this list). This allows your requests to pass through ACA/AKS without worrying about timeouts, which otherwise produce nerve-wracking 499s (which can also come from your calling apps when they give up waiting for a response).
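To make the async point concrete, here is a minimal submit/poll sketch, assuming a FastAPI front end and a hypothetical call_llm() coroutine (the endpoint shapes and the in-memory job store are illustrative; a real system would use a durable queue/store):

```python
# Minimal submit/poll sketch: the HTTP request returns immediately with a job id,
# so the ACA/AKS ingress never has to hold a connection open while the model runs.
import asyncio
import uuid

from fastapi import BackgroundTasks, FastAPI, HTTPException

app = FastAPI()
jobs: dict[str, dict] = {}  # in-memory job store; use a durable store in practice


async def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for a slow model call that may take minutes."""
    await asyncio.sleep(120)
    return f"response to: {prompt}"


async def run_job(job_id: str, prompt: str) -> None:
    try:
        jobs[job_id] = {"status": "completed", "result": await call_llm(prompt)}
    except Exception as exc:  # record failures instead of letting callers time out
        jobs[job_id] = {"status": "failed", "error": str(exc)}


@app.post("/requests")
async def submit(prompt: str, background_tasks: BackgroundTasks):
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "running"}
    background_tasks.add_task(run_job, job_id, prompt)
    return {"job_id": job_id, "status": "running"}


@app.get("/requests/{job_id}")
async def poll(job_id: str):
    job = jobs.get(job_id)
    if job is None:
        raise HTTPException(status_code=404, detail="unknown job id")
    return job
```

The caller submits, gets a job id back in milliseconds, and then polls (or listens on a queue/webhook) for the result, so the 4-minute ingress limit never comes into play.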
Metrics
The compute layers are responsible for handling requests as they flow through the application layers and for keeping track of request status (although that is easier said than done). Here are some basic things you should be watching at all times (see the sketch after this list for pulling a few of the model metrics out of Azure Monitor):
- Models (Azure OpenAI, etc.)
  - Requests / minute
  - Tokens / minute
  - Latency
  - 429s
- Compute layer (ACA + AKS + AML, etc.)
  - Scale operations (adding nodes/pods)
  - CPU
  - Memory
  - Network
- Data layer (Cosmos, Postgres, Azure SQL)
- Vectorization Pipelines
  - Is your data/vector pipeline running as expected?
- Vector Databases
  - Is your vector database still performing as originally intended?
  - Is the index you chose initially still the right one for serving requests?
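As a hedged illustration of pulling a couple of the model-level numbers (requests and tokens per minute) out of Azure Monitor, here is a sketch using the azure-monitor-query SDK. The resource ID is a placeholder and the metric name is an assumption; list_metric_definitions() shows what your Azure OpenAI resource actually exposes, so verify the names before relying on them:

```python
# Hedged sketch: query per-minute model metrics for an Azure OpenAI resource from
# Azure Monitor. The resource ID and metric name below are placeholders/assumptions.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricAggregationType, MetricsQueryClient

# Placeholder resource ID for the Azure OpenAI (Cognitive Services) account.
RESOURCE_ID = (
    "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
    "Microsoft.CognitiveServices/accounts/<aoai-account>"
)

client = MetricsQueryClient(DefaultAzureCredential())

# Discover which metrics the resource actually supports (names vary by service).
for definition in client.list_metric_definitions(RESOURCE_ID):
    print(definition.name)

# Pull one assumed metric at 1-minute granularity for the last hour; swap in the
# request/token metric names the listing above reports for your resource.
response = client.query_resource(
    RESOURCE_ID,
    metric_names=["AzureOpenAIRequests"],  # assumed name; verify via the listing
    timespan=timedelta(hours=1),
    granularity=timedelta(minutes=1),
    aggregations=[MetricAggregationType.TOTAL],
)
for metric in response.metrics:
    for series in metric.timeseries:
        for point in series.data:
            print(metric.name, point.timestamp, point.total)
```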
Costs
Log Analytics (or whatever cloud logging solution is used) can easily surpass the costs of your other services just from the sheer volume of data that can potentially get sent to it. By default, you get 30 days of log data before it starts to be pruned; some people may want to increase that to 90 days for compliance and reporting purposes. However, the more data you keep, the higher the costs.
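One way to keep an eye on that bill is to query the workspace’s own Usage table. Below is a hedged sketch using the azure-monitor-query SDK, assuming the standard Log Analytics Usage table (where Quantity is reported in MB); the workspace ID is a placeholder:

```python
# Hedged sketch: estimate billable ingestion per table over the last 30 days so
# you can spot which data types are driving Log Analytics costs.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

WORKSPACE_ID = "<log-analytics-workspace-guid>"  # placeholder

KQL = """
Usage
| where IsBillable == true
| summarize IngestedGB = sum(Quantity) / 1024.0 by DataType
| order by IngestedGB desc
"""

client = LogsQueryClient(DefaultAzureCredential())
result = client.query_workspace(WORKSPACE_ID, KQL, timespan=timedelta(days=30))
for table in result.tables:
    for row in table.rows:
        print(row)
```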
AI Governance
It is easy to spend a few months on a GenAI project and suddenly realize that it has taken a course that is not in alignment with its original goals. Taking the data collected above and continually measuring it against the target baseline will help ensure the project stays on course toward its original purpose.
But if you find that the original premise was wrong, then a course correction may be warranted, and it would be the role of your AI Governance Board / Center of Excellence to make the necessary decisions to adjust the direction based on new data and the priorities of the business/organization.
Having a meeting every quarter is a pretty good starting point. The primary executive sponsor, along with all relevant stakeholders and AI leads, should attend the meeting and review the data to discuss any concerns or areas that need improvement. With costs and ROI in mind, any course corrections should be agreed upon and documented, with clarity on everyone’s roles and responsibilities.
A target set of goals to achieve before the next meeting, along with a cadence and an agreed way to measure against those goals, should also be decided upon. This will help show progress and whether the goals will be met in time.
Summary
Whatever solution you build, deploy, or rent should have a wide variety of ways to get reporting and metrics out of the system. It should be one of the top things you consider when choosing a platform. Luckily, if your GenAI system is deployed to one of the major cloud providers, they tend to provide plenty of ways to set up and collect that data for you.
The only caveat: the more reporting, logging, and metrics you collect, the more you are going to pay for it.
Contact
Need help getting your GenAI project started and/or over the finish line? Ping me, always happy to help!
Email: givenscj@hotmail.com
Twitter: @givenscj
LinkedIn: http://linkedin.com/in/givenscj
GenAI Blog Series
- #1 – Build it or Buy it/RentIt
- #2 – Host it or get SaaS-y
- #3 – Train vs Mainstream models
- #4 – Scaling your solution (and not break the bank)
- #5 – Implementing Security (oh so many levels)
- #6 – Reporting, Logging and Metrics
- #7 – MLOps/GenAIOps, some kind of *Ops
- #8 – Measuring Return on Investment (ROI)