ImageBind

Model, new open-source AI model, on the site since May 09, 2023 17:57

ImageBind is a multimodal model that joins a recent series of Meta's open-source AI tools. By aligning the embeddings of six modalities in a common space, ImageBind enables three things: cross-modal retrieval of different types of content that aren't observed together; the addition of embeddings from different modalities to naturally compose their semantics; and audio-to-image generation, by using its audio embeddings with a pretrained DALLE-2 decoder designed to work with CLIP text embeddings.
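
To make the retrieval and composition ideas concrete, here is a minimal sketch in plain Python. It does not call ImageBind itself: the three-dimensional vectors, the gallery names, and the helper functions `normalize`, `cosine`, `retrieve`, and `compose` are all hypothetical stand-ins chosen for illustration. The only assumption carried over from the description above is that embeddings from different modalities live in one shared space, so they can be compared and added directly.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def retrieve(query, gallery):
    """Return the gallery key whose embedding is most similar to the query."""
    return max(gallery, key=lambda name: cosine(query, gallery[name]))

def compose(a, b):
    """Add two unit-length embeddings to blend their semantics."""
    return [x + y for x, y in zip(normalize(a), normalize(b))]

# Hand-made toy vectors (NOT real model outputs).
# Axis 0 loosely means "dog", axis 1 loosely means "beach".
image_gallery = {
    "dog_on_beach": [0.7, 0.7, 0.0],
    "dog_indoors": [1.0, 0.0, 0.1],
    "empty_beach": [0.0, 1.0, 0.1],
}
audio_barking = [0.9, 0.1, 0.0]   # pretend audio embedding
image_beach = [0.05, 1.0, 0.0]    # pretend image embedding

# Cross-modal retrieval: an audio query ranks images directly.
best_for_audio = retrieve(audio_barking, image_gallery)

# Composition: audio + image embeddings, then retrieve with the sum.
best_for_mix = retrieve(compose(audio_barking, image_beach), image_gallery)

print(best_for_audio)
print(best_for_mix)
```

With these toy values, the barking audio alone retrieves `dog_indoors`, while the sum of the barking audio and the beach image retrieves `dog_on_beach`. With a real model, the vectors would come from the modality encoders rather than being written by hand.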