As you continue to mull over your decision to build it, buy it and host it, or SaaS it, at some point you will need to pick a model (or models) to use. Since not every use case is the same, the latest mainstream model may not work for you. It could very well be the case that you have to build your own model. This typically happens due to the highly sensitive or proprietary nature of the data and use case.
Mainstream Models
You probably get bombarded with blogs, tweets and other posts talking about how cool this model or that model is. The reality is, over the past two years we have seen models come and go. New models come out that are cheaper, provide better answers, and so on. The latest news about DeepSeek definitely sent some shockwaves through the AI community, and as you can imagine, people who have already deployed GenAI solutions are very much thinking: how fast can I get an agent up and running to test that model with my current setup? I know I have already had the conversations: "Can we switch to that model?".
You have to be careful with utilizing mainstream models. You could look at benchmarks all day, but the reality is: will it run at scale? Will it give you the performance and accuracy required AND not cause you to go broke? This isn't always a fast and easy question to answer, and it will require a proof-of-concept setup and a series of performance tests.
For example, let's take Azure AI's version of GPT-4o. You can easily find the costs for this cloud-hosted model on the Azure website. Ultimately, it is up to you to request the size of the deployment you will need, and that isn't as easy as you might think.
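To make that concrete, here is a minimal back-of-the-envelope sketch of how you might estimate monthly spend for a token-priced deployment. The per-1K-token prices and traffic numbers below are purely hypothetical placeholders; pull the real numbers from the Azure pricing page and your own telemetry.

```python
# Rough capacity/cost estimate for a token-priced deployment.
# The per-1K-token prices below are hypothetical placeholders. Always pull
# the current numbers from the pricing page for your region.

INPUT_PRICE_PER_1K = 0.005    # hypothetical USD per 1K input (prompt) tokens
OUTPUT_PRICE_PER_1K = 0.015   # hypothetical USD per 1K output (completion) tokens

def monthly_cost(requests_per_minute: int,
                 avg_input_tokens: int,
                 avg_output_tokens: int,
                 minutes_per_month: int = 60 * 24 * 30) -> float:
    """Back-of-the-envelope monthly spend for a chat completion workload."""
    total_requests = requests_per_minute * minutes_per_month
    input_cost = total_requests * avg_input_tokens / 1000 * INPUT_PRICE_PER_1K
    output_cost = total_requests * avg_output_tokens / 1000 * OUTPUT_PRICE_PER_1K
    return input_cost + output_cost

if __name__ == "__main__":
    # Example workload: 20 requests/minute, ~1,500 prompt tokens, ~500 completion tokens
    print(f"Estimated monthly cost: ${monthly_cost(20, 1500, 500):,.2f}")
```

Running a few of these scenarios against your expected traffic is a quick way to sanity-check whether a hosted model even fits your budget before you request a deployment size.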
There are two main metrics: requests per minute and tokens per minute. These can be a bit misleading. Anyone who has built GenAI platforms at scale knows that the real limit is actually enforced over 10-second windows. If you are on pace to exceed your per-minute limit, you will get 429s. We will explore more of this in the next blog post!
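As a rough illustration (not any SDK's official behavior), here is a minimal retry-with-backoff sketch for handling those 429s. It wraps any callable whose errors expose a status_code attribute, so adapt the check to whatever client library you are actually using.

```python
import random
import time

def call_with_backoff(fn, max_retries: int = 5):
    """Call fn(); on a 429 (rate limit), wait with exponential backoff and retry.

    fn is any callable that raises an exception exposing a status_code
    attribute. Adapt the check to your client library's error type.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            if status != 429:
                raise                      # not a rate-limit error, bubble it up
            # Exponential backoff with jitter: ~1s, 2s, 4s, ... plus noise
            delay = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)
    raise RuntimeError("Rate limited: retries exhausted")
```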
Saving the scaling conversation for later, suffice it to say you have to find a model that fits your use case. It might be an OpenAI model, it might not. There are lots of open source models out there to choose from that may actually get you where you need to be without costing you a lot of money.
Train Your Own
Mainstream models have been trained on a plethora of things. Most of the training parameters and knowledge built into the model probably won't be very important, or even needed, for your specific use case(s). In that case, it's probably overkill and much too expensive to utilize. So what do you do?
First step: start browsing the models on Hugging Face and check out the advanced Kaggle competitions. It's highly likely you'll find something that comes close to what you are looking for. However, if you don't find anything you like, it's time to go to the drawing board and start doing some diagrams and math equations!
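If you do find a candidate on Hugging Face, a quick smoke test is usually just a few lines with the transformers library. The model name below is only an example; swap in whatever you found (and note that larger models will want a decent GPU or plenty of RAM).

```python
# Quick smoke test of an open-source model pulled from Hugging Face.
# The model name is just an example; swap in whatever candidate you found.
from transformers import pipeline

generator = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

result = generator(
    "Summarize the offside rule in one sentence.",
    max_new_tokens=64,
)
print(result[0]["generated_text"])
```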
Using common tools like PyTorch, TensorFlow and Keras, you can absolutely rebuild the mainstream models into your own model. Granted, not all models publish how they were trained or provide their source code, but several do, so you have plenty of examples to work from. If this isn't something you are comfortable with, go check out Kaggle and start with their beginner competitions. You'll really enjoy them, and they will give you a sense of how to get started.
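To give a feel for what "rolling your own" looks like, here is a deliberately tiny PyTorch sketch: define the layers, wire up a loss and an optimizer, and loop. It is a toy classifier on fake data, not a recipe for reproducing any particular mainstream model.

```python
import torch
from torch import nn

# A deliberately tiny text classifier. The point is the shape of the work
# (define layers, loss, optimizer, training loop), not the architecture.
class TinyClassifier(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 64, num_classes: int = 2):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim)
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
        return self.fc(self.embed(token_ids, offsets))

model = TinyClassifier(vocab_size=10_000)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Fake batch: two "documents" packed into one tensor, split by offsets.
tokens = torch.randint(0, 10_000, (12,))
offsets = torch.tensor([0, 7])
labels = torch.tensor([0, 1])

for epoch in range(3):
    optimizer.zero_grad()
    logits = model(tokens, offsets)
    loss = loss_fn(logits, labels)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss={loss.item():.4f}")
```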
Now, the advantage of the mainstream models: they have already been trained. Some have taken months to do so and needed data centers full of very expensive GPUs. Someone has already done that work and paid for that computing power. If you don't have the budget to buy, set up and operate a bunch of GPU clusters, then you are really left with two options: rent some GPUs from the big three cloud providers (Azure, AWS, GCP), or find a specialized GPU data center provider (yeah, I didn't know they existed either until a few months ago) for hopefully cheaper rates than buying all of it yourself.
This is where things can get even trickier. The model you send for training had better be accurate. The last thing you want is to spin up a 50- or 100-GPU cluster, send in your model and let it sit for the next few weeks running through thousands of epochs, only to find out you forgot to put a neural network layer somewhere. That can hurt the pocketbook.
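One cheap insurance policy before you rent that cluster is a local smoke test: confirm the model can overfit a single tiny batch. The helper below is a generic sketch of that idea and assumes a model callable as model(inputs); adapt it to your model's actual signature.

```python
import torch

def overfit_smoke_test(model, batch, loss_fn, steps: int = 200) -> bool:
    """Before renting a GPU cluster, confirm the model can overfit one tiny batch.

    If the loss doesn't collapse on a handful of examples, something structural
    is wrong (missing layer, bad loss wiring) and weeks of cluster time won't fix it.
    """
    inputs, targets = batch
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    first_loss = None
    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        if first_loss is None:
            first_loss = loss.item()
        loss.backward()
        optimizer.step()
    # Expect a large drop when memorizing a tiny batch.
    return loss.item() < 0.1 * first_loss
```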
Fine-Tune Models
Then there is the meet-in-the-middle approach. The model you are using is, say, 90-95% of the way there. It just needs a little bit of help to get it to that 99-100% mark. This is where supplementing the knowledge of the model with something extra (things specific to the domain in which it will be used) can help.
For example, the term "goal" in GPT land typically means "a personal goal", i.e., something you are trying to achieve. However, in the domain of soccer/futbol, it means something totally different. Being able to supplement the core model with domain-specific knowledge can help prevent it from getting confused and potentially hallucinating weird stuff to your users.
If you don't have the budget to retrain a large, billion-parameter model but need the flexibility to make a mainstream model a bit smarter, then fine-tuning may be the way to go. But I'll provide the disclaimer: your results will vary with fine-tuning, so don't expect too much here.
Just like prompting, the more refined and straightforward your data is, the higher the chance you will get the results you are looking for. So effort has to be put into selecting the most appropriate fine-tuning data.
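For chat-style fine-tuning, that usually means curating a set of example conversations. The JSONL layout below mirrors the format commonly used for OpenAI/Azure OpenAI fine-tuning jobs, but verify the exact schema against your provider's documentation before uploading anything. The soccer example ties back to the "goal" confusion above.

```python
import json

# Chat-style fine-tuning examples, one JSON object per line (JSONL).
# This mirrors the format commonly used for OpenAI/Azure OpenAI fine-tuning;
# verify the exact schema against your provider's docs before uploading.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a soccer analytics assistant."},
            {"role": "user", "content": "What does 'goal' mean here?"},
            {"role": "assistant", "content": "A goal is scored when the ball fully crosses the goal line between the posts and under the crossbar."},
        ]
    },
]

with open("finetune_train.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```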
Summary
Picking a model is not a simple task. There are many factors that play into the decision of which model you will eventually use for a given use case. Again, not every use case will use the same model, so be ready to support lots of models.
You should also be ready to abandon old models in favor of newer ones.
The platform you choose should ultimately allow you to do that. If it doesn't, you will be at the vendor's mercy as to what they support, and your competitors may leapfrog you because their solution was much more agile in its ability to "use anything". Luckily, FoundationaLLM allows you to plug and play at various levels so you can take advantage of whatever model you want. Given the modular design of its components, it's pretty simple to copy an existing agent and its tools, point it at your new model, and get your testing going. This should be a requirement for any platform you are looking at.
Lastly, and hopefully this has hit you already, each of these paths is expensive. Wait, did I mention that GenAI is expensive? After we get all the GenAI basics out of the way, we'll certainly explore ROI in a later blog post.
Contact
Need help getting your GenAI project started and/or over the finish line? Ping me, always happy to help!
Email: givenscj@hotmail.com
Twitter: @givenscj
LinkedIn: http://linkedin.com/in/givenscj
GenAI Blog Series
- #1 – Build it or Buy it/Rent It
- #2 – Host it or get SaaS-y
- #3 – Train vs Mainstream models
- #4 – Scaling your solution (and not break the bank)
- #5 – Implementing Security (oh so many levels)
- #6 – Reporting, Logging and Metrics
- #7 – MLOps/GenAIOps, some kind of *Ops
- #8 – Measuring Return on Investment (ROI)