Words are flowing out like endless rain: Recapping a busy week of LLM news

Image of a boy surprised by flying letters.

Some weeks in AI news are eerily quiet, but during others, catching up on the week's events feels like trying to stem the tide. This week saw three notable large language model (LLM) releases: Google's Gemini Pro 1.5 hit general availability with a free tier, OpenAI shipped a new version of GPT-4 Turbo, and Mistral released a new openly licensed LLM, Mixtral 8x22B. All three launches took place within a span of 24 hours starting on Tuesday.

With the help of software engineer and independent AI researcher Simon Willison (who also wrote about this week's busy LLM launches on his own blog), we'll briefly cover each of the three major events in roughly chronological order, then look at some additional AI events from this week.

Gemini Pro 1.5 general release

On Tuesday morning Pacific time, Google announced that its Gemini 1.5 Pro model (which we first covered in February) is now available in public preview via the Gemini API in more than 180 countries, excluding Europe. This is Google's most powerful public LLM to date, and it's available in a free tier that allows up to 50 requests a day.

It supports up to 1 million tokens of input context. As Willison notes on his blog, Gemini 1.5 Pro's API price of $7/million input tokens and $21/million output tokens is slightly lower than that of GPT-4 Turbo (priced at $10/million in and $30/million out) and higher than that of Claude 3 Sonnet (Anthropic's mid-tier LLM, priced at $3/million in and $15/million out).
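To make the price gap concrete, here is a minimal Python sketch that applies the per-million-token prices quoted above to a single request. The workload size (100,000 input tokens and 1,000 output tokens) and the helper names `request_cost` and `compare` are illustrative assumptions, not anything published by the providers, and real bills may differ depending on tiers and discounts.

```python
# Per-million-token API prices in USD (input, output), as quoted in the article.
PRICES = {
    "Gemini 1.5 Pro": (7.00, 21.00),
    "GPT-4 Turbo": (10.00, 30.00),
    "Claude 3 Sonnet": (3.00, 15.00),
}


def request_cost(model, input_tokens, output_tokens):
    """Return the estimated USD cost of one request for the given model."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def compare(input_tokens, output_tokens):
    """Return a {model: cost} mapping for the same hypothetical workload."""
    return {
        model: request_cost(model, input_tokens, output_tokens)
        for model in PRICES
    }


if __name__ == "__main__":
    # Hypothetical workload: a long document in, a short summary out.
    for model, cost in compare(100_000, 1_000).items():
        print(f"{model}: ${cost:.3f}")
```

Under these assumptions, the same request would cost roughly $0.72 on Gemini 1.5 Pro, $1.03 on GPT-4 Turbo, and $0.32 on Claude 3 Sonnet, and filling Gemini 1.5 Pro's full 1 million token input context would cost about $7 before any output.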

Notably, Gemini 1.5 Pro includes native audio (speech) input processing that allows users to upload audio or video prompts, a new File API for handling files, the ability to add custom system instructions (system prompts) for directing model responses, and a JSON mode for structured data extraction.

“Majorly upgraded” GPT-4 Turbo launched

GPT-4 Turbo performance chart provided by OpenAI.

Shortly after Google's 1.5 Pro launch on Tuesday, OpenAI announced that it was releasing a “vastly improved” version of GPT-4 Turbo (a model family that originally launched in November) called “gpt-4-turbo-2024-04-09.” It integrates multimodal GPT-4 vision processing (recognizing the content of images) directly into the model, and it initially launched through API access only.

Then on Thursday, OpenAI announced that the new GPT-4 Turbo model had become available to paid ChatGPT users. OpenAI said the new model improves “capabilities in writing, mathematics, logical reasoning, and coding,” and shared a chart that is not particularly useful in assessing those abilities (and that the company has since updated). The company also gave an example of one reported improvement, saying that when writing with ChatGPT, the AI assistant will “use more direct, less verbose, and more conversational language.”

The vague nature of OpenAI's GPT-4 Turbo announcements drew some confusion and criticism online. On X, Willison wrote, “Who will be the first LLM provider to publish actually useful release notes?” In some ways, this is a case of “AI vibes” again, as we discussed in our lament about the poor state of LLM benchmarks during the introduction of Claude 3. “I haven't really noticed any definite difference in quality (related to GPT-4 Turbo),” Willison told Ars in an interview.

The update also extended GPT-4's knowledge cutoff to April 2024, although some people report that it achieves this through hidden web searches in the background, and others on social media have reported date-related problems.

Mistral's Mysterious Mixtral 8x22B Release

An illustration of a robot holding a French flag, figuratively reflecting the rise of AI in France due to Mistral. It's hard to photograph an LLM, so a robot will have to do.

Not to be outdone, on Tuesday night, French AI company Mistral launched its latest openly licensed model, Mixtral 8x22B, by tweeting a torrent link devoid of any documentation or commentary, as was the case with previous releases.

The new mixture-of-experts (MoE) release weighs in with a larger parameter count than the company's previous most capable open model, Mixtral 8x7B, which we covered in December. Rumor has it that it's potentially as capable as GPT-4 (in what way, you ask? Vibes). But that is yet to be seen.

“Evals are still ongoing, but the biggest open question right now is how well Mixtral 8x22B shapes up,” Willison told Ars. “If it is in the same quality class as GPT-4 and Claude 3 Opus, we will finally have an openly licensed model that is not significantly behind the best proprietary models.”

This release has Willison most excited. “If that thing is actually GPT-4 class, that's crazy, because you can run it on a (very expensive) laptop,” he said. “I think you need 128GB of MacBook RAM for it, which is twice as much as I have.”

The new Mixtral is not yet listed on Chatbot Arena, Willison said, because Mistral has not yet released a fine-tuned model for chatting. It is still a raw LLM that simply predicts the next token. “Though there is now at least one community instruction tuned version floating around,” says Willison.

Chatbot Arena Leaderboard Shake-up

A Chatbot Arena leaderboard screenshot taken on April 12, 2024.
Benj Edwards

This week's LLM news isn't limited to the big names in the field. There was also a stir on social media about the rising performance of open source models like Cohere's Command R+, which reached sixth place on the LMSYS Chatbot Arena Leaderboard, the highest ranking ever for an open-weights model.

And for more Chatbot Arena action, the new version of GPT-4 Turbo is apparently proving to be competitive with Claude 3 Opus. The two are still in a statistical tie, but GPT-4 Turbo recently pulled ahead numerically. (In March, we reported when Claude 3 numerically overtook GPT-4 Turbo for the first time, which was also the first time any other AI model had overtaken a member of the GPT-4 family of models on the leaderboard.)

Of this fierce competition among LLMs (which most of the Muggle world is unaware of and probably never will be), Willison told Ars, “The last two months have been a whirlwind. We finally have not just one but multiple models that are competitive with GPT-4.” We'll see if OpenAI's rumored release of GPT-5 later this year will restore the company's technological lead, which, as we've noted, once seemed insurmountable. But for now, says Willison, “OpenAI is no longer the undisputed leader in LLMs.”