How Automated Tools Are Making Open Source Software Safer

Table of Links

Abstract and 1. Introduction

  2. Background and Related Work

  3. Study of Unethical Behavior in OSS

    3.1 RQ1: Types of unethical behavior

    3.2 RQ2: Affected software artifacts

  4. Methodology

    4.1 Modeling via SWRL rules

    4.2 Automatic detection of unethical behavior

  5. Evaluation

  6. Discussion and Implications

  7. Threats to Validity

  8. Conclusion and References

7 THREATS TO VALIDITY

External. Our findings on unethical behavior may not generalize beyond the studied OSS projects and their issues/PRs. Some unethical behavior may never be reported to the issue tracker, and we have no practical way to study such unreported cases. Because some issues may not contain the ethics-related keywords that we used for searching, we may also have missed some unethical behavior. Nevertheless, our selected keywords helped us discover many types of unethical behavior. Hence, we believe the issues in our study provide a representative sample of the reported and resolved unethical issues in the studied repositories. Although the other types of unethical behavior discovered in our study are important, Etor can detect only six types, and our evaluation is limited to these six. Even so, our experiments show that Etor can detect unethical behavior with relatively high accuracy (a 74.8% TP rate on average).
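To make the keyword-related threat concrete, the following minimal sketch shows how a keyword filter can miss a relevant issue that is phrased differently. The keyword list and the issue titles are hypothetical and are used for illustration only; they are not the keywords or the data used in our study.

```python
# Hypothetical keyword list for illustration only; this section does not
# list the actual search keywords used in the study.
KEYWORDS = ["unethical", "plagiarism", "license violation"]


def matches_keywords(text, keywords=KEYWORDS):
    """Return True if any keyword occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


# Made-up issue titles, not taken from the studied repositories.
issues = [
    "Possible plagiarism of our README",
    "This fork copied my code without giving credit",
]

# Issues retained by the keyword search.
selected = [title for title in issues if matches_keywords(title)]

# Issues the search skips, even though the second title may describe
# the same kind of behavior in different words.
missed = [title for title in issues if not matches_keywords(title)]

print("selected:", selected)
print("missed:", missed)
```

In this sketch the second title is skipped because it contains none of the listed keywords, which is the kind of false negative this threat refers to.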


Internal. Our code and scripts may have bugs that can affect our results. To mitigate this threat, we make our tool and data publicly available for inspection.

8 CONCLUSION

To better understand unethical behavior in OSS projects, we conduct a study of the types of such behavior. By reading and analyzing the discussions of stakeholders in OSS projects, our study of 316 GitHub issues identifies 15 types of unethical behavior. These unethical behaviors affect various types of software artifacts. Inspired by our study, we propose Etor, an ontology-based approach that can automatically detect unethical behavior. Our evaluation of Etor on 195,621 issues (1,765 repositories) shows that Etor can automatically detect 548 issues with a 74.8% true-positive (TP) rate on average. As the first study that investigates the types of unethical behavior in OSS projects, we hope to raise awareness among OSS stakeholders of the importance of understanding ethical issues in OSS projects. While Etor shows promising results in the automated detection of unethical behavior in OSS projects, we plan to enhance Etor in the future to detect more types and to reduce false positives using machine learning techniques.

Authors:

(1) Hsu Myat Win, Southern University of Science and Technology, China;

(2) Haibo Wang, Southern University of Science and Technology, China;

(3) Shin Hwei Tan, a corresponding author from Southern University of Science and Technology, China.


This paper is available on arXiv under the CC BY 4.0 DEED license.

