[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Stars: 22.3k
Forks: 2.5k
Issues: 1.1k
Contributors: 47
Watchers: 158
Topics: gpt-4, chatbot, chatgpt, llama, multimodal, llava, foundation-models, instruction-tuning, multi-modality, visual-language-learning, llama-2, llama2, vision-language-model
Language: Python
License: Apache License 2.0 (Apache-2.0)
Project Description
LLaVA is a multimodal assistant built toward GPT-4 level capabilities. It combines a large language model with computer vision so that users can interact with it through both images and natural language. By jointly understanding and processing linguistic and visual information, LLaVA can handle more complex tasks and conversations, pointing toward next-generation intelligent assistants that better understand and meet user needs.
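To make the interaction pattern concrete, below is a minimal sketch of a single image-plus-text query. It assumes the Hugging Face transformers integration of LLaVA and the community-hosted "llava-hf/llava-1.5-7b-hf" checkpoint rather than this repository's own inference scripts; the image URL and model name are illustrative assumptions, and details may differ from the official CLI.

```python
# Minimal multimodal query sketch (assumes transformers' LLaVA support
# and the llava-hf/llava-1.5-7b-hf checkpoint, not this repo's own CLI).
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Any RGB image works; here one is fetched from an example URL.
image_url = "https://llava-vl.github.io/static/images/view.jpg"  # illustrative URL
image = Image.open(requests.get(image_url, stream=True).raw)

# LLaVA-1.5 chat format: the <image> token marks where visual features are inserted.
prompt = "USER: <image>\nWhat is unusual about this image? ASSISTANT:"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```

The same image-conditioned prompt format can carry a multi-turn conversation by appending previous USER/ASSISTANT turns to the prompt before generating again.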