CLIP
Uncategorized · openai
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
28.7k Stars · 3.6k Forks · 229 Issues · 1.8k Contributors · 326 Watchers
deep-learning · machine-learning
Jupyter Notebook
MIT License
Project Description
A neural network trained with multimodal (image, text) contrastive learning. Given an image, it can predict the most relevant natural-language text snippet without task-specific fine-tuning. Like GPT-2 and GPT-3, CLIP exhibits strong zero-shot capabilities and can be applied to a variety of multimodal tasks.
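The contrastive pretraining objective mentioned above pairs image and text embeddings in a shared space and trains them so that matched pairs score higher than mismatched ones. Below is a minimal NumPy sketch of a CLIP-style symmetric contrastive loss; the function name `clip_contrastive_loss`, the temperature value, and the random embeddings are illustrative assumptions, not the repository's actual API.

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """CLIP-style symmetric cross-entropy over cosine-similarity
    logits (illustrative sketch, not the repo's implementation)."""
    # L2-normalize so dot products become cosine similarities
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    # (N, N) similarity matrix; matched pairs lie on the diagonal
    logits = image_emb @ text_emb.T / temperature
    labels = np.arange(len(logits))

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # average of the image->text and text->image directions
    return (cross_entropy(logits, labels) + cross_entropy(logits.T, labels)) / 2

rng = np.random.default_rng(0)
imgs = rng.normal(size=(4, 8))   # 4 hypothetical image embeddings
txts = rng.normal(size=(4, 8))   # 4 hypothetical text embeddings
loss = clip_contrastive_loss(imgs, txts)
```

Because each row's correct logit is the largest when embeddings are perfectly aligned, the loss for identical image/text embeddings should be lower than for unrelated random pairs.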