
Multilevel Profiling of Situation and Dialogue-based Deep Networks: Conclusion and References

Too Long; Didn't Read

In this paper, researchers propose a multi-modality framework for movie genre classification, utilizing situation, dialogue, and metadata features.

Authors:

(1) Dinesh Kumar Vishwakarma, Biometric Research Laboratory, Department of Information Technology, Delhi Technological University, Delhi, India;

(2) Mayank Jindal, Biometric Research Laboratory, Department of Information Technology, Delhi Technological University, Delhi, India

(3) Ayush Mittal, Biometric Research Laboratory, Department of Information Technology, Delhi Technological University, Delhi, India

(4) Aditya Sharma, Biometric Research Laboratory, Department of Information Technology, Delhi Technological University, Delhi, India.

6. Conclusion

This work presents a novel holistic approach to the movie genre classification problem that spans affective and cognitive levels by considering multiple modalities: situations from frames, dialogues from speech, and metadata (movie plot and description). To pursue this study, we also built EMTD, a Hollywood English movie trailer dataset containing around 2000 trailers from five genres: action, comedy, horror, romance, and science fiction. We experimented with various model architectures, as discussed in Section 5.2, and validated our final framework on EMTD and on the standard LMTD-9 [2] dataset, achieving AU(PRC) values of 0.92 and 0.82, respectively. Our study's main aim is to build a robust framework that classifies a movie's genre from a short clip, i.e., its trailer. Although our study uses English speech as a feature, the framework can also be applied to some non-English trailers: for those, our model can incorporate the video features alone, and the architecture makes its predictions on that basis.
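As a concrete illustration of the AU(PRC) metric used for this validation, the sketch below computes a micro-averaged area under the precision-recall curve for multi-label genre predictions. The five-genre label matrix and score matrix are hypothetical toy data, not outputs of the paper's model.

```python
import numpy as np

def average_precision(y_true, y_score):
    """Area under the precision-recall curve via step interpolation:
    the mean of the precision values at the rank of each true positive."""
    order = np.argsort(-y_score)              # rank items by descending score
    hits = y_true[order]
    cum_hits = np.cumsum(hits)
    precision_at_k = cum_hits / (np.arange(len(hits)) + 1)
    return float(np.sum(precision_at_k * hits) / max(hits.sum(), 1))

# Hypothetical labels/scores for 4 trailers over the 5 genres
# (action, comedy, horror, romance, science fiction).
y_true = np.array([
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
])
y_score = np.array([
    [0.9, 0.1, 0.2, 0.1, 0.8],
    [0.2, 0.7, 0.1, 0.6, 0.1],
    [0.8, 0.2, 0.6, 0.2, 0.3],
    [0.1, 0.3, 0.2, 0.7, 0.2],
])

# Micro-averaging treats every (trailer, genre) pair as one binary prediction.
micro_auprc = average_precision(y_true.ravel(), y_score.ravel())
print(f"micro AU(PRC): {micro_auprc:.2f}")  # this toy data is perfectly ranked, so 1.00
```

Micro-averaging over all (trailer, genre) pairs is one common convention for multi-label AU(PRC); per-genre (macro) averaging is another, and which one a reported number uses should always be stated.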


To extend the proposed model, background-audio analysis based on vocals could also be incorporated. Hence, in the future, we plan to build a framework that considers background vocals in the audio alongside our current features, to better extract and exploit the information in movie trailers. We also plan to add more genres to our study for multi-label classification.

7. References

[1] A. Hanjalic and L. Q. Xu, “Affective video content representation and modeling,” IEEE Trans. Multimed., vol. 7, no. 1, 2005.


[2] J. Wehrmann and R. C. Barros, “Convolutions through time for multi-label movie genre classification,” in Proceedings of the ACM Symposium on Applied Computing, 2017, vol. Part F1280, pp. 114–119.


[3] Z. Rasheed, Y. Sheikh, and M. Shah, “On the use of computable features for film classification,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 1, pp. 52–64, Jan. 2005.


[4] L. H. Chen, Y. C. Lai, and H. Y. Mark Liao, “Movie scene segmentation using background information,” Pattern Recognit., vol. 41, no. 3, 2008.


[5] S. K. Jain and R. S. Jadon, “Movies genres classifier using neural network,” 2009.


[6] L. Canini, S. Benini, and R. Leonardi, “Affective recommendation of movies based on selected connotative features,” IEEE Trans. Circuits Syst. Video Technol., vol. 23, no. 4, 2013.


[7] M. Xu, C. Xu, X. He, J. S. Jin, S. Luo, and Y. Rui, “Hierarchical affective content analysis in arousal and valence dimensions,” Signal Processing, vol. 93, no. 8, 2013.


[8] A. Yadav and D. K. Vishwakarma, “A unified framework of deep networks for genre classification using movie trailer,” Appl. Soft Comput. J., vol. 96, 2020.


[9] K. Choroś, “Video genre classification based on length analysis of temporally aggregated video shots,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2018, vol. 11056 LNAI, pp. 509–518.


[10] A. M. Ertugrul and P. Karagoz, “Movie Genre Classification from Plot Summaries Using Bidirectional LSTM,” in Proceedings - 12th IEEE International Conference on Semantic Computing, ICSC 2018, 2018, vol. 2018-January.


[11] G. Païs, P. Lambert, D. Beauchêne, F. Deloule, and B. Ionescu, “Animated movie genre detection using symbolic fusion of text and image descriptors,” 2012.


[12] A. Shahin and A. Krzyżak, “Genre-ous: The Movie Genre Detector,” in Communications in Computer and Information Science, 2020, vol. 1178 CCIS.


[13] N. Kumar, A. Harikrishnan, and R. Sridhar, “Hash Vectorizer Based Movie Genre Identification,” in Lecture Notes in Electrical Engineering, 2020, vol. 605.


[14] P. G. Shambharkar, P. Thakur, S. Imadoddin, S. Chauhan, and M. N. Doja, “Genre Classification of Movie Trailers using 3D Convolutional Neural Networks,” 2020.


[15] W. T. Chu and H. J. Guo, “Movie genre classification based on poster images with deep neural networks,” 2017.


[16] G. S. Simões, J. Wehrmann, R. C. Barros, and D. D. Ruiz, “Movie genre classification with Convolutional Neural Networks,” in Proceedings of the International Joint Conference on Neural Networks, 2016, vol. 2016-October.


[17] J. Li, L. Deng, R. Haeb-Umbach, and Y. Gong, “Chapter 2 - Fundamentals of speech recognition,” in Robust Automatic Speech Recognition, J. Li, L. Deng, R. Haeb-Umbach, and Y. Gong, Eds. Oxford: Academic Press, 2016, pp. 9–40.


[18] S. Pratt, M. Yatskar, L. Weihs, A. Farhadi, and A. Kembhavi, “Grounded Situation Recognition,” in Computer Vision -- ECCV 2020, 2020, pp. 314–332.


[19] J. Beel, S. Langer, and B. Gipp, “TF-IDuF: A Novel Term-Weighting Scheme for User Modeling based on Users’ Personal Document Collections,” in Proc. iConference 2017, 2017.


[20] J. Wehrmann, R. C. Barros, G. S. Simoes, T. S. Paula, and D. D. Ruiz, “(Deep) Learning from Frames,” 2017.


[21] D. P. Kingma and J. L. Ba, “Adam: A method for stochastic optimization,” 2015.


[22] E. Fish, A. Gilbert, and J. Weinbren, “Rethinking movie genre classification with fine-grained semantic clustering,” arXiv preprint arXiv:2012.02639, 2020.


[23] F. Álvarez, F. Sánchez, G. Hernández-Peñaloza, D. Jiménez, J. M. Menéndez, and G. Cisneros, “On the influence of low-level visual features in film classification,” PLoS One, vol. 14, no. 2, 2019.


[24] J. Wehrmann, M. A. Lopes, and R. C. Barros, “Self-attention for synopsis-based multilabel movie genre classification,” 2018.


[25] J. Wehrmann and R. C. Barros, “Movie genre classification: A multi-label approach based on convolutions through time,” Appl. Soft Comput. J., vol. 61, 2017.


This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.

