Which AI tool is the most accurate?

24 March 2024 (20:59)

The rise of generative AI models in the tech industry has been swift and undeniable. The trend started when Microsoft-backed OpenAI launched ChatGPT in late 2022. ChatGPT took the world by storm and pushed Google to release an early version of its own AI chatbot on March 21, 2023. That's how we got Bard, Google's first publicly available chat-based generative model.

ChatGPT and Google Bard received frequent updates throughout 2023, with wider availability, more supported languages, and advanced features. OpenAI released its most advanced chatbot version with ChatGPT-4, while Bard received its biggest upgrade with Gemini Pro. With the arrival of the Gemini app on Android phones, Google officially rebranded Bard as Gemini.

Given the fierce competition, a key question arises: can Google Gemini keep up with OpenAI? To find out, we compared ChatGPT-4 and Gemini across a range of tasks.

The evolution of ChatGPT

OpenAI's original ChatGPT ran on GPT-3.5, a refined successor to GPT-3 capable of engaging in topics from science and technology to art and literature. But it was the subsequent GPT-4 that revolutionized the game.

Available to subscribers at $20 per month, GPT-4 can work with a context window of around 25,000 words, a significant leap from GPT-3.5's limit of roughly 3,000 words. This larger window lets GPT-4 understand contextual prompts more accurately and handle longer conversations better.

Google Gemini

Gemini is both a multimodal large language model (LLM) and the new branding for Google's chatbot formerly known as Bard. As an LLM, Gemini is the successor to LaMDA and PaLM 2. In December 2023, Bard received its biggest upgrade with Gemini Pro, the first sign of Google moving away from the Bard name toward Gemini branding. Google expanded the chatbot to more than 170 countries, added support for languages like Japanese and Korean, and integrated it closely with other Google apps like Gmail, Docs, and Drive.

With the launch of Gemini Ultra 1.0, Google officially dropped the Bard branding in favor of Gemini and launched it on Android as a standalone app. For some users, Gemini has replaced Google Assistant.

If you want to try Google's most powerful AI model, Ultra 1.0, upgrade to the Gemini Advanced plan at $20 per month. It offers better performance on highly complex tasks, 2TB of Google Drive storage, and other Google One benefits.

Fictional plot summarization

One area where AI could replace human writers faster than expected is simple summaries and newswire-style journalism. This kind of writing usually shortens and simplifies longer existing texts, something generative AI has become good at. Some finance stories about the stock market, which juggle a lot of numbers, are already written by AI.

We asked ChatGPT-3.5, ChatGPT-4, and Gemini to give us 300-word summaries of Frank Herbert’s classic Dune novel.

In this comparison between Gemini and ChatGPT, both models deliver promising results. Bard lagged behind GPT-4 in our previous comparisons, but this time around Gemini is on par with GPT-4 in providing a well-rounded and comprehensive summary. If you're unfamiliar with Dune, read the summaries created by Gemini and GPT-4.

GPT-3.5 offers richer detail, while GPT-4 balances outlining the plot with exploring its deeper themes, giving readers a panoramic view of the story. Gemini's summary presupposes some familiarity with Dune. It overlooks key details, such as the Harkonnens' initial control of the planet Arrakis and their eventual reclaiming of it with the emperor's aid.

Purchase recommendations

Many people use Google for purchase recommendations, whether for a new washing machine or a pair of earbuds to wear while working out. Since there's an overwhelming amount of information online about the latest and greatest products, generative AI models should have an easy time giving recommendations. We asked ChatGPT and Gemini which new phone you should buy.

In this comparison, the limitations of GPT-3.5, whose training data ends in 2021, become evident, especially when stacked against GPT-4 and Google Gemini, both of which can browse the web. Google Gemini offers a detailed list of top phones on the market, including specifications and pricing. GPT-3.5, due to its constraints, provides generalized buying advice. ChatGPT-4 comes closer to Gemini by suggesting specific phone models, but it fails to provide pricing and detailed specifications.

Both ChatGPT and Google Gemini stress the importance of individual preferences, budgets, and desired features when making recommendations. In this matchup, Google Gemini takes the lead with its comprehensive list of top phone models, complete with specifications and prices.

Excel formulas

Microsoft Excel and Google Sheets are powerful tools that assist with many tasks, including tracking stock prices, project management using Gantt charts, and analyzing data trends. Many of us only use a fraction of the available features, particularly when it comes to advanced formulas. That’s where natural language AIs come in handy. They can recommend which formulas to use to achieve your goal with a given spreadsheet.

In this face-off, ChatGPT-4 and Gemini take the lead, both addressing the query efficiently. Gemini offered detailed, user-friendly explanations, even for those unfamiliar with Google Sheets. GPT-3.5, while less detailed, provided the essential information along with an example.

Tourist travel itineraries

Planning a trip can be tedious, and finding a good place to start is sometimes difficult. That's where travel-planning apps and chatbots come in handy. We pitted ChatGPT and Gemini against each other to plan a two-day weekend trip to New York City.

Based on our analysis of the itineraries, Gemini offered balanced suggestions with relevant web sources and tips at the bottom. However, it misrepresented the connection between the Statue of Liberty, Liberty Island, and Ellis Island, suggesting an unnecessary return to Battery Park. GPT-4 went beyond the itinerary and offered additional travel tips. In contrast, ChatGPT-3.5 stuck strictly to the itinerary task.

The ChatGPT and Gemini itineraries presented balanced days, factoring in meal breaks and suggesting good neighborhoods for meals. They highlighted popular attractions and took a holistic approach to the trip, arranging activities in a logical sequence and including unique cultural experiences like Broadway shows.

We tested the same task using Delhi, India, as our next city. Gemini briefly introduced India's capital and listed relevant places to visit in 48 hours. Google's chatbot also displayed relevant pictures with web sources to learn more about the places it mentioned. It even suggested additional options in case you extend your stay or have more time during your trip to India, and it closed with tips to make your Delhi visit memorable.

GPT-4 also did a commendable job, dividing the trip between South Delhi and Old Delhi. However, it left out Qutub Minar, one of the city's most popular attractions. GPT-3.5 also did a solid job of suggesting places to visit over two days in Delhi. It divided each day into morning, midday, afternoon, and evening, but it suggested more places than you could realistically cover in a couple of days.

Bonus: How to find and join a US PhD program

To put Gemini and ChatGPT to work on a more complicated question, we asked them to guide us through the process of enrolling in a PhD program in the US, with an added twist: we hadn't studied in the US and had obtained our master's degree in the EU.

Gemini and ChatGPT-3.5 recognized that we graduated with a degree from the EU, yet neither looked into potential challenges, such as language certificates or visa requirements for non-US citizens. In contrast, GPT-4 went deeper, addressing application logistics and post-admission matters like the visa process and tips for cultural acclimatization. Gemini discussed research first and then shared tips to strengthen your profile.

When examining ChatGPT’s responses, GPT-3.5 highlighted the role of advisors in the PhD process, recommending a research-focused approach to applications. GPT-4 provided a comprehensive view of the admission process, presenting a logical step-by-step guide.

The directness of Gemini contrasts with the structured guidance of GPT-4 and the advisor-centric approach of GPT-3.5. Some of Gemini's most useful tips, such as connecting with current PhD students or alumni and exploring research opportunities and life in the US, are buried at the bottom.

A word on mobile apps

Google recently released Gemini on Android, and the search giant plans to replace Google Assistant with Gemini on the platform. Users can swipe up from either bottom corner to use Gemini for their queries. That gives Google's chatbot an advantage over OpenAI's ChatGPT. These generative AI models are trained on user data, and with billions of Android users worldwide, Google is better positioned to collect valuable data.

ChatGPT is available on Android and iPhone. While both mobile apps get the job done, Gemini feels more modern and intuitive than the comparatively bland ChatGPT app.

Generative AI has a lot of potential

In our tests, ChatGPT-4 and Gemini were neck and neck. However, we still give a slight edge to OpenAI's chatbot, and ChatGPT-4 justifies its $20 monthly price. The responses from GPT-3.5 and GPT-4 often provide more context, and they excel at handling follow-up queries. They also stand out at tasks such as crafting spreadsheet formulas. Neither solution is 100% accurate, so for the time being, fact-check everything these generative AI tools suggest.

As for Gemini’s paid plan for $20 per month, we would like to see Gemini’s integration in Gmail, Docs, and other apps, which is currently missing from the Advanced plan.

Looking ahead, there’s no doubt the landscape will change for the better. ChatGPT and Gemini are rapidly progressing, and it’s foreseeable that these will become indispensable tools in different professions, along with simplifying numerous everyday tasks we might soon take for granted.
