Project conducted during the King's College Prompting Hackathon
To what extent can LLMs be useful for multi-modal knowledge acquisition and inference?
- Prior work has leveraged text-only LLMs for knowledge extraction and knowledge graph (KG) completion (overview here).
- We would like to extend such approaches to multi-modal knowledge, covering not only text and images but also audio, video, haptics, etc.
- The goal is to test the ability of multi-modal LLMs such as GPT-4 (as well as others) to construct and complete a multi-modal KG in the context of the MuseIT project (https://www.muse-it.eu/); see the sketch after this list.
- It would be particularly interesting to explore the capabilities of LLMs for multi-modal reasoning and inference.
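
As a starting point, here is a minimal sketch of what one such probe could look like, assuming the OpenAI Python SDK (v1.x) with an `OPENAI_API_KEY` set in the environment; the model name `gpt-4o`, the prompt wording, and the example image URL and caption are illustrative assumptions, not artifacts of the project:

```python
# Sketch: prompt a multi-modal LLM to propose KG triples for an image + caption.
# Assumes the OpenAI Python SDK (v1.x); model and inputs are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "You are populating a multi-modal knowledge graph. "
    "Given the image and its caption, return RDF-style triples as JSON, "
    'e.g. [{"subject": ..., "predicate": ..., "object": ...}]. '
    "Only state facts supported by the inputs."
)

def extract_triples(image_url: str, caption: str) -> str:
    """Ask a vision-capable model for candidate (subject, predicate, object) triples."""
    response = client.chat.completions.create(
        model="gpt-4o",  # any multi-modal chat model would do here
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": f"{PROMPT}\n\nCaption: {caption}"},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    # Hypothetical example inputs; real experiments would use MuseIT assets.
    print(extract_triples(
        "https://example.org/artwork.jpg",
        "A marble bust of a Roman emperor, 2nd century AD.",
    ))
```

The same prompting pattern could plausibly be reused for the completion and inference experiments, e.g. by also passing the KG's existing triples in the prompt and asking the model to propose missing links.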