Kamil Tamiola

@KamilTamiola

If machine learning and AI are to advance, proper scientific reporting is a must!

My journey with machine learning started in high school. I was lucky and motivated enough to get my hands on old textbooks about Artificial Neural Networks. It was 2000. Friends was super popular on TV, Eminem was dropping hit after hit, and there was I, an utter geek, perplexed and literally blown away by object recognition systems, which at that time were scientific curiosities. My first paper, “Seeing and Recognizing Objects as Physical Process — Practical and Theoretical Use of Artificial Neural Networks”, written at the age of 18, was my juvenile attempt at becoming a scientist. It won me scholarships and entry to the best universities in the UK (Cambridge) and the Netherlands (Groningen), and eventually unlocked an academic career in the computational biophysics of proteins. Finally, I was lucky enough to combine scientific expertise and a love affair with machine learning into an AI venture, Peptone.

However, my academic development path was neither rosy nor romantic. It was a mix of excitement and getting my “golden behind” kicked by my mentors and academic supervisors, with the dominant contribution of the latter. The biggest pain of being an “academic wunderkind” was scientific writing. I was seriously repelled by it. I was absolutely convinced I was wasting my time when I could have been doing more productive things in the lab.

Boy, was I wrong!

It is quite amusing to see how ideas change with time and experience, especially when you reach a turning point in your career and start contributing to the field you admired as a geeky teenager. However, let me cut my hubristic autobiographical note short and jump straight to the problem.

Only a few days ago, I stumbled upon an article in MIT Technology Review, which motivated me to write this short post.

Is AI Riding a One-Trick Pony? Are we making progress or simply circling around in an endless pool of NET(s), optimizers, architectures, normalization methods and image recognition approaches?

I am afraid we are circling.

Let me share my take on this. Please bear in mind this is my private opinion, born out of numerous hours spent reading machine learning / AI papers and trying to adapt their findings to the problems we are working on at Peptone, namely automated protein engineering.

  1. A vast number of AI/ML pre-prints lack proper citations. The seminal works in the domain of AI/ML (e.g. papers that introduce the concept of the perceptron or back-propagation) are cited selectively or not at all. As a result, it is very difficult for a newcomer to the AI field, even one with more than sufficient math knowledge, to place the papers’ actual scientific findings in the broader context.
  2. Missing or blatantly incorrect citations lead to excessive relabeling of known and existing scientific concepts, thereby inflating but not propelling the field of machine learning and adding absolutely unnecessary fuzziness. I have just reviewed two papers in the field of bioinformatics (working as an anonymous referee for peer-reviewed journals) that concern AI, in which the authors proclaim the development of methods that date back to 2004 and have more than 600 citations so far! How can you miss that?! Furthermore, I have seen authors of “AI-books” comparing perfectly known and well-characterized issues with gradient-optimization methods to Newtonian N-body problems, or devising loss functions that simply depend on inverse-square laws (with all their limitations) and claiming they are modeled after the “electromagnetic Coulomb law” (sic!). Ladies and gentlemen! Coulomb is rolling in his grave. If you go for Coulomb's law, please talk about electrostatics! Leave electromagnetism to Faraday and Maxwell.
  3. Lack of proper statistical analysis of the results. I personally believe this is the biggest issue of them all. The way the results are presented does not conform to any standards of presentable scientific research. The most prominent problem is reporting accuracy in arbitrary units without even a simple discussion of the statistical relevance of the improvement. How relevant is it that your network recognized objects with 1% better accuracy? What does it really mean with respect to the number of parameters your model uses? How many degrees of freedom are in your model, and how does it compare with less complex models? How do you assess that you don’t over-fit or simply create degenerate models, which have little to no statistical significance?
  4. Because of the sparse or frequently non-existent statistical analysis of goodness of fit, and of basic statistical tests that assess model quality with respect to the number of degrees of freedom, many AI/ML papers suffer from reproducibility issues, which very quickly translate into end-user questions and reported issues found under many “official” ML model repositories on GitHub. The “mere mortals” are unable to reproduce the great findings of the “experts” using the same data and ML network architectures.
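
To make the third point concrete, here is a minimal Python sketch of the kind of check I would like to see reported. It is only an illustration under stated assumptions: the function names and all the numbers are hypothetical, the two-proportion z-test treats the two models' results as independent samples (a paired test such as McNemar's would be more appropriate when both models are scored on the very same examples), and the Akaike information criterion is just one common way of penalizing parameter count, not the only one.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def accuracy_gap_z_test(correct_a: int, correct_b: int, n: int):
    """Two-sided two-proportion z-test for the accuracy gap between models A and B.

    Assumption: each model is scored on n examples and the two sets of
    outcomes are treated as independent samples.
    Returns (z statistic, p-value).
    """
    p_a, p_b = correct_a / n, correct_b / n
    pooled = (correct_a + correct_b) / (2 * n)
    std_err = math.sqrt(pooled * (1.0 - pooled) * 2.0 / n)
    z = (p_b - p_a) / std_err
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p_value


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion; lower is better, extra parameters cost 2 each."""
    return 2.0 * n_params - 2.0 * log_likelihood


# Hypothetical counts: 90% vs. 91% accuracy on a small and on a large test set.
small = accuracy_gap_z_test(900, 910, 1_000)
large = accuracy_gap_z_test(90_000, 91_000, 100_000)
print(f"n=1,000:   z={small[0]:.2f}, p={small[1]:.3g}")
print(f"n=100,000: z={large[0]:.2f}, p={large[1]:.3g}")
```

With these made-up counts, the same 1% gap is not significant at the 5% level on 1,000 test examples, but it is on 100,000. A bare "1% better" therefore says very little without the test-set size and a test statistic next to it.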

Why is all of the above important? All of us need to be able to separate innovation from cyclically recurring patterns of work, which not only slow down progress in the field of machine learning but, most importantly, trigger bizarre levels of anxiety among the public, the press, and science and tech investors, eventually leading to alarmist headlines about AI.

Please don’t get me wrong. I do not intend to mock or spar with Elon Musk or Prof. Hawking, for whom I have profound respect. However, the bizarre and pseudo-apocalyptic press that AI and machine learning are getting can only be compared to the sensational and completely beside-the-point articles about cancer, which is portrayed as a mythical and vicious creature (almost the spawn of Beelzebub himself) looking to annihilate humankind.

What can we do to improve the ongoing AI / ML research?

Read, read and once again read. If you think you have read enough, write and after that read again.

One of my academic teachers, Prof. Ben Feringa of the University of Groningen (who went on to receive the Nobel Prize in Chemistry in 2016 for his work on molecular machines), told me and my fellow Groningen geeks that you have to be (I quote) “cautiously optimistic in your research”. Cautious optimism and stringent scientific reporting in the machine learning and AI fields will lead to AI-driven automation that is easier to assess, implement and regulate. Eventually, society and the press will see that AI / ML won’t replace jobs completely, but augment them, boosting productivity and extending well-deserved lunch breaks. Moreover, stringent and scientifically objective reporting about the way machine learning methods work, and are trained to work, should eventually pave the way for more effective legislative routes.

My rant stops here. I strongly recommend the articles below this paragraph, which touch on the issues with AI-research, and the regulatory aspects of applied machine learning.

Please have your say using the comments section. If applied AI/ML is to advance, everybody interested needs to join the conversation. I absolutely don’t believe I am the only person having issues with the way AI/ML papers are written and deposited in pre-print repositories.
