
Multilevel Profiling of Situation and Dialogue-based Deep Networks: Conclusion and References

Too Long; Didn't Read

In this paper, researchers propose a multi-modality framework for movie genre classification, utilizing situation, dialogue, and metadata features.

Authors:

(1) Dinesh Kumar Vishwakarma, Biometric Research Laboratory, Department of Information Technology, Delhi Technological University, Delhi, India;

(2) Mayank Jindal, Biometric Research Laboratory, Department of Information Technology, Delhi Technological University, Delhi, India

(3) Ayush Mittal, Biometric Research Laboratory, Department of Information Technology, Delhi Technological University, Delhi, India

(4) Aditya Sharma, Biometric Research Laboratory, Department of Information Technology, Delhi Technological University, Delhi, India.

6. Conclusion

This work presents a novel holistic approach to the movie genre classification problem that spans affective and cognitive levels by considering multiple modalities: situations from frames, dialogues from speech, and metadata (movie plot and description). To pursue this study, we also built EMTD, a Hollywood English movie trailer dataset containing around 2000 trailers from five genres: action, comedy, horror, romance, and science fiction. We experimented with various model architectures, as discussed in Section 5.2, and validated our final framework on EMTD and on the standard LMTD-9 [2] dataset, achieving AU(PRC) values of 0.92 and 0.82, respectively. Our study's main aim is to build a robust framework that classifies a movie's genre from a short clip, i.e., its trailer. Although our study uses English speech as a feature, the framework can also be applied to some non-English trailers: for those, our model can incorporate the video features alone, and the architecture makes its predictions on that basis.
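As a concrete illustration of the AU(PRC) metric used for this validation, the sketch below computes a micro-averaged area under the precision-recall curve for multi-label genre predictions. The five-genre label matrix and score matrix are hypothetical toy data, not outputs of the paper's model.

```python
import numpy as np

def average_precision(y_true, y_score):
    """Area under the precision-recall curve via step interpolation:
    the mean of the precision values at the rank of each true positive."""
    order = np.argsort(-y_score)              # rank items by descending score
    hits = y_true[order]
    cum_hits = np.cumsum(hits)
    precision_at_k = cum_hits / (np.arange(len(hits)) + 1)
    return float(np.sum(precision_at_k * hits) / max(hits.sum(), 1))

# Hypothetical labels/scores for 4 trailers over the 5 genres
# (action, comedy, horror, romance, science fiction).
y_true = np.array([
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
])
y_score = np.array([
    [0.9, 0.1, 0.2, 0.1, 0.8],
    [0.2, 0.7, 0.1, 0.6, 0.1],
    [0.8, 0.2, 0.6, 0.2, 0.3],
    [0.1, 0.3, 0.2, 0.7, 0.2],
])

# Micro-averaging treats every (trailer, genre) pair as one binary prediction.
micro_auprc = average_precision(y_true.ravel(), y_score.ravel())
print(f"micro AU(PRC): {micro_auprc:.2f}")  # this toy data is perfectly ranked, so 1.00
```

Micro-averaging over all (trailer, genre) pairs is one common convention for multi-label AU(PRC); per-genre (macro) averaging is another, and which one a reported number uses should always be stated.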


To extend the proposed model, background-audio analysis based on vocals could also be incorporated. Hence, in the future, we plan to build a framework that considers background vocals in the audio alongside our current features, to better extract and exploit the information in movie trailers. We also plan to add more genres to our study for multi-label classification.

7. References

[1] A. Hanjalic and L. Q. Xu, “Affective video content representation and modeling,” IEEE Trans. Multimed., vol. 7, no. 1, 2005.


[2] J. Wehrmann and R. C. Barros, “Convolutions through time for multi-label movie genre classification,” in Proceedings of the ACM Symposium on Applied Computing, 2017, vol. Part F1280, pp. 114–119.


[3] Z. Rasheed, Y. Sheikh, and M. Shah, “On the use of computable features for film classification,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 1, pp. 52–64, Jan. 2005.


[4] L. H. Chen, Y. C. Lai, and H. Y. Mark Liao, “Movie scene segmentation using background information,” Pattern Recognit., vol. 41, no. 3, 2008.


[5] S. K. Jain and R. S. Jadon, “Movies genres classifier using neural network,” 2009.


[6] L. Canini, S. Benini, and R. Leonardi, “Affective recommendation of movies based on selected connotative features,” IEEE Trans. Circuits Syst. Video Technol., vol. 23, no. 4, 2013.


[7] M. Xu, C. Xu, X. He, J. S. Jin, S. Luo, and Y. Rui, “Hierarchical affective content analysis in arousal and valence dimensions,” Signal Processing, vol. 93, no. 8, 2013.


[8] A. Yadav and D. K. Vishwakarma, “A unified framework of deep networks for genre classification using movie trailer,” Appl. Soft Comput. J., vol. 96, 2020.


[9] K. Choroś, “Video genre classification based on length analysis of temporally aggregated video shots,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2018, vol. 11056 LNAI, pp. 509–518.


[10] A. M. Ertugrul and P. Karagoz, “Movie Genre Classification from Plot Summaries Using Bidirectional LSTM,” in Proceedings - 12th IEEE International Conference on Semantic Computing, ICSC 2018, 2018, vol. 2018-January.


[11] G. Païs, P. Lambert, D. Beauchêne, F. Deloule, and B. Ionescu, “Animated movie genre detection using symbolic fusion of text and image descriptors,” 2012.


[12] A. Shahin and A. Krzyżak, “Genre-ous: The Movie Genre Detector,” in Communications in Computer and Information Science, 2020, vol. 1178 CCIS.


[13] N. Kumar, A. Harikrishnan, and R. Sridhar, “Hash Vectorizer Based Movie Genre Identification,” in Lecture Notes in Electrical Engineering, 2020, vol. 605.


[14] P. G. Shambharkar, P. Thakur, S. Imadoddin, S. Chauhan, and M. N. Doja, “Genre Classification of Movie Trailers using 3D Convolutional Neural Networks,” 2020.


[15] W. T. Chu and H. J. Guo, “Movie genre classification based on poster images with deep neural networks,” 2017.


[16] G. S. Simões, J. Wehrmann, R. C. Barros, and D. D. Ruiz, “Movie genre classification with Convolutional Neural Networks,” in Proceedings of the International Joint Conference on Neural Networks, 2016, vol. 2016-October.


[17] J. Li, L. Deng, R. Haeb-Umbach, and Y. Gong, “Chapter 2 - Fundamentals of speech recognition,” in Robust Automatic Speech Recognition, J. Li, L. Deng, R. Haeb-Umbach, and Y. Gong, Eds. Oxford: Academic Press, 2016, pp. 9–40.


[18] S. Pratt, M. Yatskar, L. Weihs, A. Farhadi, and A. Kembhavi, “Grounded Situation Recognition,” in Computer Vision -- ECCV 2020, 2020, pp. 314–332.


[19] J. Beel, S. Langer, and B. Gipp, “TF-IDuF: A Novel Term-Weighting Scheme for User Modeling based on Users’ Personal Document Collections,” in Proc. iConference 2017, 2017.


[20] J. Wehrmann, R. C. Barros, G. S. Simoes, T. S. Paula, and D. D. Ruiz, “(Deep) Learning from Frames,” 2017.


[21] D. P. Kingma and J. L. Ba, “Adam: A method for stochastic optimization,” 2015.


[22] E. Fish, A. Gilbert, and J. Weinbren, “Rethinking movie genre classification with fine-grained semantic clustering,” arXiv preprint arXiv:2012.02639, 2020.


[23] F. Álvarez, F. Sánchez, G. Hernández-Peñaloza, D. Jiménez, J. M. Menéndez, and G. Cisneros, “On the influence of low-level visual features in film classification,” PLoS One, vol. 14, no. 2, 2019.


[24] J. Wehrmann, M. A. Lopes, and R. C. Barros, “Self-attention for synopsis-based multilabel movie genre classification,” 2018.


[25] J. Wehrmann and R. C. Barros, “Movie genre classification: A multi-label approach based on convolutions through time,” Appl. Soft Comput. J., vol. 61, 2017.


This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.

