CLIP
Uncategorized · openai
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
28.7k Stars · 3.6k Forks · 229 Issues · 1.8k Contributors · 326 Watchers
deep-learning · machine-learning
Jupyter Notebook
MIT License
Project Description
A neural network trained with multimodal (image, text) contrastive learning. Given an image, it can predict the most relevant natural-language text snippet without task-specific fine-tuning. Like GPT-2 and GPT-3, CLIP exhibits strong zero-shot capabilities and can be applied to a variety of multimodal tasks.
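The contrastive pretraining objective mentioned above pairs image and text embeddings in a shared space and trains them so that matched pairs score higher than mismatched ones. Below is a minimal NumPy sketch of a CLIP-style symmetric contrastive loss; the function name `clip_contrastive_loss`, the temperature value, and the random embeddings are illustrative assumptions, not the repository's actual API.

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """CLIP-style symmetric cross-entropy over cosine-similarity
    logits (illustrative sketch, not the repo's implementation)."""
    # L2-normalize so dot products become cosine similarities
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    # (N, N) similarity matrix; matched pairs lie on the diagonal
    logits = image_emb @ text_emb.T / temperature
    labels = np.arange(len(logits))

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # average of the image->text and text->image directions
    return (cross_entropy(logits, labels) + cross_entropy(logits.T, labels)) / 2

rng = np.random.default_rng(0)
imgs = rng.normal(size=(4, 8))   # 4 hypothetical image embeddings
txts = rng.normal(size=(4, 8))   # 4 hypothetical text embeddings
loss = clip_contrastive_loss(imgs, txts)
```

Because each row's correct logit is the largest when embeddings are perfectly aligned, the loss for identical image/text embeddings should be lower than for unrelated random pairs.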