A Summarize-then-Search Method for Long Video Question Answering: Method

by Kinetograph: The Video Editing Technology Publication, May 26th, 2024

Too Long; Didn't Read

In this paper, researchers explore zero-shot video QA with GPT-3, using narrative summaries and visual matching to outperform supervised models.

This paper is available on arXiv under CC 4.0 license.

Authors:

(1) Jiwan Chung, MIR Lab Yonsei University (https://jiwanchung.github.io/);

(2) Youngjae Yu, MIR Lab Yonsei University (https://jiwanchung.github.io/).

Table of Links

2. Method

2.1. Plot Generation

2.2. Narrative Search

Given the summarized narrative and the question, we wish to retrieve the relatively short clip relevant to the question from the long video. Language models generate open-ended text, which is irregular and often noisy. To retrieve the exact part of the video, we drive the model to output the indices of the relevant plot pieces rather than free-form text.
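As a rough illustration of index-based retrieval, the sketch below pulls plot indices out of a model response. The function name `parse_plot_indices`, the 1-based numbering, and the response format are assumptions made for this example; the paper does not specify them.

```python
import re


def parse_plot_indices(model_output: str, num_plot_pieces: int) -> list[int]:
    """Extract plot indices from a free-form model response.

    Assumption: plot pieces are numbered 1..num_plot_pieces and the model
    mentions the relevant ones as plain integers somewhere in its output.
    """
    indices: list[int] = []
    for token in re.findall(r"\d+", model_output):
        idx = int(token)
        # Drop out-of-range numbers and duplicates, keeping first-seen order.
        if 1 <= idx <= num_plot_pieces and idx not in indices:
            indices.append(idx)
    return indices


if __name__ == "__main__":
    print(parse_plot_indices("(3), (7) and 12", num_plot_pieces=10))
```

An empty result signals that the response contained no usable index, so the caller has to handle that case separately.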

The generated indices might still be noisy due to the open-ended nature of language models. When the model outputs an answer in text form, we use the ROUGE-L [19] score to find plot piece candidates whose similarity with the generated sentence is above the specified threshold α ≥ 0.5.
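The text-form matching can be sketched as follows. This is a minimal, dependency-free approximation, not the authors' implementation: it assumes the F1 variant of ROUGE-L over lowercase word tokens, treats a score equal to α as a match, and uses hypothetical names (`rouge_l_f1`, `match_plot_pieces`) and example sentences.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric word tokens (an assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via row-by-row DP."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L as the harmonic mean of LCS-based precision and recall."""
    c, r = tokenize(candidate), tokenize(reference)
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def match_plot_pieces(generated: str, plot_pieces: list[str], alpha: float = 0.5) -> list[int]:
    """Return 1-based indices of plot pieces scoring at least alpha."""
    return [
        i
        for i, piece in enumerate(plot_pieces, start=1)
        if rouge_l_f1(generated, piece) >= alpha
    ]


if __name__ == "__main__":
    plot = [
        "Anna finds a letter in the attic.",
        "Tom drives to the city.",
        "Anna reads the letter to Tom.",
    ]
    print(match_plot_pieces("Anna finds the letter in the attic", plot))
```

Raising `alpha` makes the match stricter and returns fewer candidates; lowering it admits more loosely related plot pieces.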

2.3. Visual Checking
