Apple releases a new open source machine learning model: "Ferret"

What is Ferret?

Ferret is a multimodal large language model (MLLM) that can refer to and ground (locate) objects in an image based on instructions. It combines a hybrid region representation with a spatial-aware visual sampler to achieve precise referring and grounding. It was trained on the GRIT dataset and is evaluated with a benchmark named Ferret-Bench. The code and checkpoints for Ferret are available on GitHub.

In an October post on X, Apple AI/ML research scientist Zhe Gan described Ferret as a system with the ability to “identify and pinpoint anything in an image at any level of detail.” In practice, this means it can work with image regions of different shapes and sizes.

🚀🚀Introducing Ferret, a new MLLM that can refer and ground anything anywhere at any granularity.
📰https://t.co/gED9Vu0I4y
1⃣ Ferret enables referring of an image region at any shape
2⃣ It often shows better precise understanding of small image regions than GPT-4V (sec 5.6) pic.twitter.com/yVzgVYJmHc
— Zhe Gan (@zhegan4) October 12, 2023

Put simply, the model can focus on a specific area marked within an image, recognize the elements relevant to a user’s query, pinpoint them, and outline them with a bounding box. It can then use the recognized element as part of the query and respond in ordinary text.

For instance, if you highlight an animal in an image and ask the LLM what it is, Ferret could discern the species of the creature and that you’re singling out one among many. Leveraging the context of other items detected in the image, it could provide additional related information.
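
To make the idea of a marked region and its bounding box concrete, here is a minimal Python sketch. It is not taken from Ferret's code and does not use the model at all. The function name mask_to_box and the list-of-lists mask format are assumptions made purely for illustration. It only shows how a free-form highlighted region can be reduced to the kind of bounding box described above.

```python
from typing import List, Optional, Tuple

# (x_min, y_min, x_max, y_max) in inclusive pixel coordinates.
Box = Tuple[int, int, int, int]


def mask_to_box(mask: List[List[int]]) -> Optional[Box]:
    """Return the tightest box around all nonzero cells of a 2D mask.

    Hypothetical helper for illustration only; not part of Ferret.
    Returns None when the mask marks no pixels.
    """
    # Column indices of every marked pixel.
    xs = [x for row in mask for x, value in enumerate(row) if value]
    # Row indices of every row that contains at least one marked pixel.
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


# A small free-form region highlighted by the user (1 = marked pixel).
region = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
print(mask_to_box(region))
```

A real system would pass the region to the model together with the image and the question. This sketch covers only the geometric step.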

What Are People Discussing?

Users on Reddit’s r/Apple noted that Ferret was trained on 8 A100 GPUs, each with 80GB of memory. The model was also discussed on Hacker News (news.ycombinator) and the AppleInsider forums (forums.appleinsider).

Some discussions revolve around Apple’s artificial intelligence capabilities, their potential in the consumer market, and the limitations of the company’s current ML products (such as Siri and predictive text). There is also talk about the limitations of CoreML and the potential for MLX to fill that gap.

