A Summarize-then-Search Method for Long Video Question Answering: Related Work

by Kinetograph: The Video Editing Technology Publication
May 26th, 2024

Too Long; Didn't Read

In this paper, researchers explore zero-shot video QA with GPT-3, leveraging narrative summaries and visual matching to outperform supervised models.

Part of HackerNoon's growing list of open-source research papers, promoting free access to academic material.

This paper is available on arxiv under CC 4.0 license.

Authors:

(1) Jiwan Chung, MIR Lab Yonsei University (https://jiwanchung.github.io/);

(2) Youngjae Yu, MIR Lab Yonsei University (https://jiwanchung.github.io/).

Movie Summarization
Movies are typical examples of long videos with clear narrative structures. Gorinski et al. [7] generate a shorter version of a screenplay by framing summarization as the task of finding an optimal chain of scenes in a graph representation of the movie. TRIPOD [23] is a screenplay dataset containing turning point annotations; the same work proposes a model that automatically identifies turning points in movie narratives. Papalampidi et al. [24] later use the TV series CSI to demonstrate the usefulness of turning points for automatic movie summarization. Lee et al. [15] further improve turning point identification with dialogue features and a transformer architecture.


Long Video QA
Video question answering has been studied extensively in the literature, both as open-ended QA [9] and as multiple-choice problems [28, 29]. The proposed approaches range from RNN-based attention networks [9, 30, 36, 38] to memory networks [12, 22, 27] and transformers [4, 6]. More recently, multimodal models pre-trained on large-scale video datasets (VideoQA [31], VIOLET [5], MERLOT [33], and MERLOT-Reserve [34]) have also shown promising performance on video question answering.


However, long video QA has received relatively little attention despite its importance. MovieQA [27] poses questions about entire movies, which typically run about two hours. DramaQA [3] uses a single TV series as its visual context and asks a model to understand video clips ranging from one to twenty minutes in length.
