{"id":3310,"date":"2024-08-28T07:17:00","date_gmt":"2024-08-28T07:17:00","guid":{"rendered":"https:\/\/www.radisentech.com\/?post_type=publication&#038;p=3310"},"modified":"2024-08-29T02:08:02","modified_gmt":"2024-08-29T02:08:02","slug":"generalizing-visual-question-answering-from-synthetic-to-human-written-questions-via-a-chain-of-qa-with-a-large-language-model","status":"publish","type":"publication","link":"https:\/\/www.radisentech.com\/zh-hans\/publication\/generalizing-visual-question-answering-from-synthetic-to-human-written-questions-via-a-chain-of-qa-with-a-large-language-model\/","title":{"rendered":"Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model"},"content":{"rendered":"\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-1 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:33.33%\">\n<h2 class=\"wp-block-heading\" id=\"h-published\">Published<\/h2>\n\n\n\n<p>ECAI&nbsp;(2024)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-authors\">Authors<\/h2>\n\n\n\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained\">\n<p>Taehee Kim<sup>a<\/sup>, Yeongjae Cho<sup>b<\/sup>, Heejun Shin<sup>a<\/sup>, Yohan Jo<sup>b<\/sup> and Dongmyung Shin<sup>a<\/sup><\/p>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-affiliations\">Affiliations<\/h2>\n\n\n\n<p><em><sup>a<\/sup>Radisen Co. Ltd.<br><sup>b<\/sup>Seoul National University<\/em><\/p>\n\n\n\n<p><em><br><\/em><\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:66.66%\">\n<h2 class=\"wp-block-heading\" id=\"h-abstract\">Abstract<\/h2>\n\n\n\n<p>Visual question answering (VQA) is a task where an image is given, and a series of questions are asked about the image. 
To build an efficient VQA algorithm, a large amount of QA data is required, which is very expensive to collect. Generating synthetic QA pairs based on templates is a practical way to obtain data. However, VQA models trained on such data do not perform well on complex, human-written questions. To address this issue, we propose a new method called chain of QA for human-written questions (CoQAH). CoQAH utilizes a sequence of QA interactions between a large language model and a VQA model trained on synthetic data to reason and derive logical answers for human-written questions. We tested the effectiveness of CoQAH on two types of human-written VQA datasets for 3D-rendered and chest X-ray images and found that it achieved state-of-the-art accuracy in both types of data. Notably, CoQAH outperformed general vision-language models, VQA models, and medical foundation models with no finetuning.<\/p>\n\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button is-style-fill\"><a class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/arxiv.org\/abs\/2401.06400\">Link to Publication<\/a><\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<p><\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"_acf_changed":false},"categories":[],"class_list":["post-3310","publication","type-publication","status-publish","hentry"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v22.3 (Yoast SEO v22.3) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model - Radisen<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" 
href=\"https:\/\/www.radisentech.com\/zh-hans\/publication\/generalizing-visual-question-answering-from-synthetic-to-human-written-questions-via-a-chain-of-qa-with-a-large-language-model\/\" \/>\n<meta property=\"og:locale\" content=\"zh_CN\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model\" \/>\n<meta property=\"og:description\" content=\"Published ECAI&nbsp;(2024) Authors Taehee Kima, Yeongja [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.radisentech.com\/zh-hans\/publication\/generalizing-visual-question-answering-from-synthetic-to-human-written-questions-via-a-chain-of-qa-with-a-large-language-model\/\" \/>\n<meta property=\"og:site_name\" content=\"Radisen\" \/>\n<meta property=\"article:modified_time\" content=\"2024-08-29T02:08:02+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.radisentech.com\/wp-content\/uploads\/2024\/06\/Site-image.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"675\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"\u9884\u8ba1\u9605\u8bfb\u65f6\u95f4\" \/>\n\t<meta name=\"twitter:data1\" content=\"1 \u5206\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.radisentech.com\/zh-hans\/publication\/generalizing-visual-question-answering-from-synthetic-to-human-written-questions-via-a-chain-of-qa-with-a-large-language-model\/\",\"url\":\"https:\/\/www.radisentech.com\/zh-hans\/publication\/generalizing-visual-question-answering-from-synthetic-to-human-written-questions-via-a-chain-of-qa-with-a-large-language-model\/\",\"name\":\"Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model - Radisen\",\"isPartOf\":{\"@id\":\"https:\/\/www.radisentech.com\/zh-hans\/#website\"},\"datePublished\":\"2024-08-28T07:17:00+00:00\",\"dateModified\":\"2024-08-29T02:08:02+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.radisentech.com\/zh-hans\/publication\/generalizing-visual-question-answering-from-synthetic-to-human-written-questions-via-a-chain-of-qa-with-a-large-language-model\/#breadcrumb\"},\"inLanguage\":\"zh-Hans\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.radisentech.com\/zh-hans\/publication\/generalizing-visual-question-answering-from-synthetic-to-human-written-questions-via-a-chain-of-qa-with-a-large-language-model\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.radisentech.com\/zh-hans\/publication\/generalizing-visual-question-answering-from-synthetic-to-human-written-questions-via-a-chain-of-qa-with-a-large-language-model\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.radisentech.com\/zh-hans\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language 
Model\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.radisentech.com\/zh-hans\/#website\",\"url\":\"https:\/\/www.radisentech.com\/zh-hans\/\",\"name\":\"Radisen\",\"description\":\"AI\ub97c \ud1b5\ud55c \uc758\ub8cc \ud601\uc2e0\uc73c\ub85c \uac74\uac15\ud55c \uc0b6\uc758 \uac00\uce58\ub97c \ub192\uc785\ub2c8\ub2e4.\",\"publisher\":{\"@id\":\"https:\/\/www.radisentech.com\/zh-hans\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.radisentech.com\/zh-hans\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"zh-Hans\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.radisentech.com\/zh-hans\/#organization\",\"name\":\"Radisen\",\"url\":\"https:\/\/www.radisentech.com\/zh-hans\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/www.radisentech.com\/zh-hans\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.radisentech.com\/wp-content\/uploads\/2024\/06\/favicon.png\",\"contentUrl\":\"https:\/\/www.radisentech.com\/wp-content\/uploads\/2024\/06\/favicon.png\",\"width\":512,\"height\":512,\"caption\":\"Radisen\"},\"image\":{\"@id\":\"https:\/\/www.radisentech.com\/zh-hans\/#\/schema\/logo\/image\/\"}}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model - Radisen","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.radisentech.com\/zh-hans\/publication\/generalizing-visual-question-answering-from-synthetic-to-human-written-questions-via-a-chain-of-qa-with-a-large-language-model\/","og_locale":"zh_CN","og_type":"article","og_title":"Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model","og_description":"Published ECAI&nbsp;(2024) Authors Taehee Kima, Yeongja [&hellip;]","og_url":"https:\/\/www.radisentech.com\/zh-hans\/publication\/generalizing-visual-question-answering-from-synthetic-to-human-written-questions-via-a-chain-of-qa-with-a-large-language-model\/","og_site_name":"Radisen","article_modified_time":"2024-08-29T02:08:02+00:00","og_image":[{"width":1200,"height":675,"url":"https:\/\/www.radisentech.com\/wp-content\/uploads\/2024\/06\/Site-image.png","type":"image\/png"}],"twitter_card":"summary_large_image","twitter_misc":{"\u9884\u8ba1\u9605\u8bfb\u65f6\u95f4":"1 \u5206"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.radisentech.com\/zh-hans\/publication\/generalizing-visual-question-answering-from-synthetic-to-human-written-questions-via-a-chain-of-qa-with-a-large-language-model\/","url":"https:\/\/www.radisentech.com\/zh-hans\/publication\/generalizing-visual-question-answering-from-synthetic-to-human-written-questions-via-a-chain-of-qa-with-a-large-language-model\/","name":"Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model - 
Radisen","isPartOf":{"@id":"https:\/\/www.radisentech.com\/zh-hans\/#website"},"datePublished":"2024-08-28T07:17:00+00:00","dateModified":"2024-08-29T02:08:02+00:00","breadcrumb":{"@id":"https:\/\/www.radisentech.com\/zh-hans\/publication\/generalizing-visual-question-answering-from-synthetic-to-human-written-questions-via-a-chain-of-qa-with-a-large-language-model\/#breadcrumb"},"inLanguage":"zh-Hans","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.radisentech.com\/zh-hans\/publication\/generalizing-visual-question-answering-from-synthetic-to-human-written-questions-via-a-chain-of-qa-with-a-large-language-model\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.radisentech.com\/zh-hans\/publication\/generalizing-visual-question-answering-from-synthetic-to-human-written-questions-via-a-chain-of-qa-with-a-large-language-model\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.radisentech.com\/zh-hans\/"},{"@type":"ListItem","position":2,"name":"Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model"}]},{"@type":"WebSite","@id":"https:\/\/www.radisentech.com\/zh-hans\/#website","url":"https:\/\/www.radisentech.com\/zh-hans\/","name":"Radisen","description":"AI\ub97c \ud1b5\ud55c \uc758\ub8cc \ud601\uc2e0\uc73c\ub85c \uac74\uac15\ud55c \uc0b6\uc758 \uac00\uce58\ub97c \ub192\uc785\ub2c8\ub2e4.","publisher":{"@id":"https:\/\/www.radisentech.com\/zh-hans\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.radisentech.com\/zh-hans\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"zh-Hans"},{"@type":"Organization","@id":"https:\/\/www.radisentech.com\/zh-hans\/#organization","name":"Radisen","url":"https:\/\/www.radisentech.com\/zh-hans\/","logo":{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/www.radisentech.com\/zh-hans\/#\/schema\/logo\/image\/","url":"https:\/\/www.radisentech.com\/wp-content\/uploads\/2024\/06\/favicon.png","contentUrl":"https:\/\/www.radisentech.com\/wp-content\/uploads\/2024\/06\/favicon.png","width":512,"height":512,"caption":"Radisen"},"image":{"@id":"https:\/\/www.radisentech.com\/zh-hans\/#\/schema\/logo\/image\/"}}]}},"_links":{"self":[{"href":"https:\/\/www.radisentech.com\/zh-hans\/wp-json\/wp\/v2\/publication\/3310","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.radisentech.com\/zh-hans\/wp-json\/wp\/v2\/publication"}],"about":[{"href":"https:\/\/www.radisentech.com\/zh-hans\/wp-json\/wp\/v2\/types\/publication"}],"version-history":[{"count":11,"href":"https:\/\/www.radisentech.com\/zh-hans\/wp-json\/wp\/v2\/publication\/3310\/revisions"}],"predecessor-version":[{"id":3335,"href":"https:\/\/www.radisentech.com\/zh-hans\/wp-json\/wp\/v2\/publication\/3310\/revisions\/3335"}],"wp:attachment":[{"href":"https:\/\/www.radisentech.com\/zh-hans\/wp-json\/wp\/v2\/media?parent=3310"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.radisentech.com\/zh-hans\/wp-json\/wp\/v2\/categories?post=3310"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}