[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Stars: 22.3k
Forks: 2.5k
Issues: 1.1k
Contributors: 47
Watchers: 158
Topics: gpt-4, chatbot, chatgpt, llama, multimodal, llava, foundation-models, instruction-tuning, multi-modality, visual-language-learning, llama-2, llama2, vision-language-model
Language: Python
License: Apache License 2.0 (Apache-2.0)
Project Description
LLaVA is a multimodal assistant built toward GPT-4 level capabilities. It combines a large language model with computer vision so that users can interact with it through both images and natural language. By jointly understanding and processing linguistic and visual information, LLaVA can handle more complex tasks and conversations, pointing toward next-generation intelligent assistants that better understand and meet user needs.
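To make the interaction pattern concrete, below is a minimal sketch of a single image-plus-text query. It assumes the Hugging Face transformers integration of LLaVA and the community-hosted "llava-hf/llava-1.5-7b-hf" checkpoint rather than this repository's own inference scripts; the image URL and model name are illustrative assumptions, and details may differ from the official CLI.

```python
# Minimal multimodal query sketch (assumes transformers' LLaVA support
# and the llava-hf/llava-1.5-7b-hf checkpoint, not this repo's own CLI).
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Any RGB image works; here one is fetched from an example URL.
image_url = "https://llava-vl.github.io/static/images/view.jpg"  # illustrative URL
image = Image.open(requests.get(image_url, stream=True).raw)

# LLaVA-1.5 chat format: the <image> token marks where visual features are inserted.
prompt = "USER: <image>\nWhat is unusual about this image? ASSISTANT:"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```

The same image-conditioned prompt format can carry a multi-turn conversation by appending previous USER/ASSISTANT turns to the prompt before generating again.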