openai / CLIP

Uncategorized

CLIP (Contrastive Language-Image Pretraining): predicts the most relevant text snippet for a given image

Stars: 29.6k
Forks: 3.7k
Issues: 236
Contributors: 2.0k
Watchers: 327
Topics: deep-learning, machine-learning
Language: Jupyter Notebook
License: MIT License

Project Description

A neural network trained with multimodal (image, text) contrastive learning. Given an image, it can predict the most relevant text snippet expressed in natural language, without task-specific fine-tuning. Like GPT-2 and GPT-3, CLIP shows strong zero-shot performance and can be applied to a variety of multimodal tasks.
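The ranking step described above can be sketched without the real model: CLIP embeds the image and each candidate text into a shared space, then scores candidates by cosine similarity and applies a softmax. The sketch below uses small random-looking NumPy vectors as hypothetical stand-ins for CLIP's encoder outputs; the function names and toy embeddings are illustrative, not the library's API.

```python
import numpy as np

def zero_shot_probs(image_emb, text_embs, temperature=100.0):
    # L2-normalize so the dot product becomes cosine similarity,
    # mirroring how CLIP compares image and text embeddings.
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = temperature * (txt @ img)     # one similarity score per snippet
    exp = np.exp(logits - logits.max())    # numerically stable softmax
    return exp / exp.sum()

# Toy 4-d embeddings standing in for CLIP's encoder outputs (hypothetical data).
image_emb = np.array([0.9, 0.1, 0.0, 0.1])
text_embs = np.array([
    [0.8, 0.2, 0.1, 0.0],   # e.g. "a photo of a dog" (closest to the image)
    [0.0, 0.9, 0.3, 0.1],   # e.g. "a photo of a cat"
    [0.1, 0.0, 0.9, 0.2],   # e.g. "a diagram"
])
probs = zero_shot_probs(image_emb, text_embs)
print(probs.argmax())  # index of the most relevant snippet -> 0
```

Because the scoring is just a similarity over a list of arbitrary texts, the same mechanism supports zero-shot classification: the candidate snippets can be prompts like "a photo of a dog" for any label set chosen at inference time.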

© 2025 GitHub Fun. All rights reserved.