Ranking the Best OpenAI Language Models for Accuracy and Cost

In this post, I share what I found after experimenting with OpenAI's language model API endpoints. These models are closely related to the models that power ChatGPT, the most talked-about AI chatbot right now. My goal was to find the best language model for my use case using OpenAI's text completion API endpoint, which suggests and completes text based on a user-provided prompt.

Specifically, I wanted to see whether these models could generate decision criteria for DecisionMentor, a Multi-Criteria Decision Making (MCDM) mobile app based on the Analytic Hierarchy Process (AHP).

Surprisingly, I discovered that the most accurate AI engine is also the cheapest one!

OpenAI Language Models

OpenAI currently provides APIs for the following four text completion engines: Ada, Babbage, Curie, and Davinci.

On price alone, Ada, the cheapest engine, would have been the obvious choice. However, the most important criterion is accuracy: if a model is not accurate, there is no point in using it.

The time the completion endpoint takes to return results also matters.

So, with these criteria in mind, I started my explorations.

The Code Behind

Flexing my coding muscles, I set up a small Flutter app for this exploration (link to Flutter app and Dart code github repo). The app takes the user's prompt as a query, calls the Completion endpoint once for each engine, and records each response and its response time for comparison.
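The app itself is written in Flutter/Dart; as a rough illustration of the comparison loop, here is a minimal Python sketch. It assumes a hypothetical `complete(engine, prompt)` function that wraps the Completion endpoint and returns the generated text. The lowercase engine names are placeholders, not necessarily the exact model identifiers the API expects.

```python
import time
from typing import Callable, Dict, Tuple

# Placeholder names for the four engines compared in this post
# (assumption: the real API model identifiers may differ).
ENGINES = ["ada", "babbage", "curie", "davinci"]


def compare_engines(
    prompt: str,
    complete: Callable[[str, str], str],
) -> Dict[str, Tuple[str, float]]:
    """Send the same prompt to every engine and record (response, latency in ms).

    `complete(engine, prompt)` is a hypothetical wrapper around the Completion
    endpoint; any callable with that shape works, including a test stub.
    """
    results: Dict[str, Tuple[str, float]] = {}
    for engine in ENGINES:
        start = time.perf_counter()
        text = complete(engine, prompt)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        results[engine] = (text, elapsed_ms)
    return results


if __name__ == "__main__":
    # Stub standing in for a real API call, so the sketch runs offline.
    def fake_complete(engine: str, prompt: str) -> str:
        return f"[{engine}] reply to: {prompt}"

    for engine, (text, ms) in compare_engines("Best MBA school", fake_complete).items():
        print(f"{engine:8} {ms:8.2f} ms  {text}")
```

Timing around the call with `time.perf_counter` captures the full round trip as the caller sees it, network time included, which is what matters for a mobile app.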

Once I started reviewing the responses, it was easy to decide which model suited my use case best.

Speed Comparison of OpenAI Language Models

Ada was usually the fastest, although Babbage sometimes returned its response sooner than Ada. Davinci almost always took the longest.

On average:

Ada ~ 1300 – 1700 milliseconds

Davinci ~ 2800 – 3600 milliseconds

All models were fast enough for my use case, so speed was not a concern.
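Latency varies from call to call, so a single run per engine is noisy. One way to arrive at ranges like the averages above is to repeat each query several times and summarize the samples per engine. The sketch below does that with Python's `statistics` module; the function name is my own, and the sample numbers in the demo are made up for illustration, not measurements from this experiment.

```python
from statistics import mean
from typing import Dict, List


def summarize_latencies(samples: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Reduce repeated latency measurements (ms) per engine to min, mean and max."""
    return {
        engine: {"min": min(ms), "mean": mean(ms), "max": max(ms)}
        for engine, ms in samples.items()
        if ms  # skip engines with no recorded runs
    }


if __name__ == "__main__":
    # Illustrative values only, not real measurements.
    print(summarize_latencies({"ada": [1350.0, 1500.0, 1650.0], "davinci": [2900.0, 3400.0]}))
```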

Accuracy Comparison of OpenAI Language Models

To determine the best language model, I tested the engines with different types of queries to check their accuracy. For example, my queries looked like these:

Should I start my own business
choosing Best AI models
Travel to Mars or to Moon this year?
Best MBA school
Best Life Partner
etc.

Furthermore, to get criteria that could be used directly in DecisionMentor, I wrapped each query in the following prompt template:

Suggest 9 criteria for “[my decision title]” to be used in AHP with maximum 4 words for each suggestion

So the queries above became prompts like these:

Suggest 9 criteria for “choosing Best AI model” to be used in AHP with maximum 4 words for each suggestion
Suggest 9 criteria for “Should I start my own business” to be used in AHP with maximum 4 words for each suggestion
etc.
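Because every query goes through the same template, the prompts are easy to generate programmatically. A minimal sketch; the function name and its parameters are my own illustration, with defaults matching the 9-criteria, 4-word template above (straight quotes used in place of the typographic ones).

```python
def build_criteria_prompt(decision_title: str, count: int = 9, max_words: int = 4) -> str:
    """Wrap a decision title in the criteria-suggestion prompt template."""
    return (
        f'Suggest {count} criteria for "{decision_title}" '
        f"to be used in AHP with maximum {max_words} words for each suggestion"
    )


if __name__ == "__main__":
    for query in ["choosing Best AI model", "Should I start my own business"]:
        print(build_criteria_prompt(query))
```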

Of the four OpenAI language models, only Davinci was able to respond accurately to my prompts.

Problems With Other Language Models

1. Even though I asked for exactly 9 items, they returned only 5 or 6.
2. They ignored the `maximum 4 words` limit and returned long, full sentences.
3. Ada usually repeated the same suggestion twice.
4. Some of Ada's and Babbage's suggestions were completely useless.
5. Curie came quite close to Davinci in suggestion quality, but it suffered from problems 1 and 2.
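The item-count, word-limit, and duplicate problems listed above can be checked mechanically, which helps when comparing many responses; judging whether a suggestion is useless still needs a human. A minimal Python sketch, assuming each suggestion comes back on its own line, optionally prefixed with a number or bullet (an assumption about the response format, not something the API guarantees). Both function names are hypothetical.

```python
import re
from typing import List

# Matches a leading list marker such as "1.", "2)", "-", "*" or "•".
LIST_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s*")


def parse_suggestions(text: str) -> List[str]:
    """Split a completion into one suggestion per non-empty line, dropping list markers."""
    items = []
    for line in text.splitlines():
        cleaned = LIST_MARKER.sub("", line).strip()
        if cleaned:
            items.append(cleaned)
    return items


def find_problems(items: List[str], expected: int = 9, max_words: int = 4) -> List[str]:
    """Report wrong item counts, over-long suggestions and repeated suggestions."""
    problems = []
    if len(items) != expected:
        problems.append(f"expected {expected} items, got {len(items)}")
    too_long = [s for s in items if len(s.split()) > max_words]
    if too_long:
        problems.append(f"{len(too_long)} item(s) exceed {max_words} words")
    lowered = [s.lower() for s in items]
    if len(set(lowered)) != len(lowered):
        problems.append("duplicate suggestions")
    return problems
```

Case-insensitive exact matching is a deliberately simple duplicate check; near-duplicates with slightly different wording would slip through.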

The PDF below has some snapshots of the queries and the responses I received from each model.

OpenAI-Comparisons Download

Cost Comparison of OpenAI Language Models

Cost-wise, I expected Davinci to be the most expensive, but it took me by surprise. Because my prompt limited each suggestion to at most 4 words, and Davinci respected that limit, its responses were short, which reduced the cost of each query by a large factor.

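The APIs are priced per token, and a token is roughly 4 characters of English text, so the cost of a request grows with the length of the response. The model that returned the fewest characters could therefore end up the cheapest per query, even at a higher per-token rate!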
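The arithmetic can be sketched in a few lines. This is a rough estimate, not OpenAI's billing logic: it uses the ~4 characters per token rule of thumb instead of a real tokenizer, and it counts both prompt and completion tokens, since requests are billed on both. Prices per 1K tokens are parameters; the values in the demo are hypothetical, so check OpenAI's pricing page for actual rates.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4 characters per token rule of thumb."""
    return math.ceil(len(text) / 4)


def estimate_cost(prompt: str, completion: str, price_per_1k_tokens: float) -> float:
    """Approximate cost of one request; prompt and completion tokens are both counted."""
    tokens = estimate_tokens(prompt) + estimate_tokens(completion)
    return tokens / 1000 * price_per_1k_tokens


if __name__ == "__main__":
    prompt = 'Suggest 9 criteria for "Best MBA school" to be used in AHP with maximum 4 words for each suggestion'
    short_reply = "1. Tuition cost\n2. Ranking\n3. Location"  # terse, list-style answer
    long_reply = "One important criterion to consider here is the overall cost. " * 20  # verbose answer
    # Hypothetical prices per 1K tokens, for illustration only.
    print(estimate_cost(prompt, short_reply, price_per_1k_tokens=0.02))
    print(estimate_cost(prompt, long_reply, price_per_1k_tokens=0.002))
```

Because cost is simply tokens times rate, a terse answer from a pricier model can land in the same range as, or below, a verbose answer from a cheaper one.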

Price Comparison of OpenAI Language Models

Who would have known that you can get the best model at the cheapest price?

I am delighted here 😀

Merry Christmas 😀

Overall Experience with OpenAI Language Models and APIs

Getting started with the Completion API endpoint was extremely easy, as OpenAI's documentation is really good. You can start for free: OpenAI provides $18 of free credit that can be used during the first 3 months. After that, the pricing model is pay-as-you-go, i.e., you pay only for what you use.

Based on my exploration and usage so far, I find their current prices quite affordable. 🤞

The fact that anyone can access these extremely powerful AI models conveniently through an API at affordable rates is awesome.

Overall experience ~ Gooooood.
