Conferencing and The Art of 'Paper Blitzing'

by Liling Tan (@alvations), June 8th, 2022
Too Long; Didn't Read

This article introduces a mechanism that I personally use to read many papers quickly so that I can catch up with the ever-growing number of papers published in the Natural Language Processing (NLP) field.


There are soooo many academic and industrial papers in the field of machine learning, natural language processing (NLP) and computer systems nowadays. And even in a single conference, it’s overwhelming. In this post, I’ll share one of the ways I do conference paper reading; I like to call it “paper blitz”.


In a paper blitz session, the main goal is to cover all the papers that I will find “interesting” at a really superficial level. And I really mean ALL. The general idea is to understand the “meta-trends” of the conference submissions and group papers under the same trend so that it gives a more holistic view of either the types of problems an approach can solve or the variations of an approach to solve a single problem type.



For each paper, the objective is to identify:


  • Meta-Trend
    • Problem
    • Approach
  • Quality
    • Strength
    • Weakness
  • Usefulness
    • Code and data availability
    • Natural extension
  • How to apply to my work/interest =)
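
If you prefer keeping blitz notes in a structured form, the checklist above could be captured roughly as follows. This is only a sketch: the `PaperNote` class and all of its field names are made up for illustration and are not part of any tool mentioned in this post.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PaperNote:
    """One blitz note per paper; the fields mirror the checklist (names are hypothetical)."""
    title: str
    # Meta-Trend
    problem: str = ""
    approach: List[str] = field(default_factory=list)
    # Quality
    strengths: List[str] = field(default_factory=list)
    weaknesses: List[str] = field(default_factory=list)
    # Usefulness
    code_and_data: List[str] = field(default_factory=list)
    extensions: List[str] = field(default_factory=list)
    # How to apply to my work/interest
    application: str = ""
    # Bookkeeping for the blitz itself
    minutes_spent: int = 0
    deep_dive: bool = False  # Bookmark for a follow-up read.

    def missing_fields(self) -> List[str]:
        """Names of checklist items that are still empty after the blitz read."""
        checklist = ["problem", "approach", "strengths", "weaknesses",
                     "code_and_data", "extensions", "application"]
        return [name for name in checklist if not getattr(self, name)]


note = PaperNote(title="Can Transformer be Too Compositional?", minutes_spent=12)
note.problem = "Is non-compositionality inherent in the transformer architecture?"
note.approach.append("Probe attention weights between MWEs")
print(note.missing_fields())
```

An empty item is not a failure in a blitz; `missing_fields()` just makes it easy to see what a later, deeper read would need to fill in.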


Before we continue, here’s a disclaimer. A paper blitz is not a usual paper reading session, nor a deep/shallow dive into the papers. We might also mistakenly criticize papers since we are doing a really superficial read. I’ll reiterate that the goal here is recall (how many papers we can cover in a short time) and not precision (how well we understand the papers). Additionally, I’ll recommend that we bookmark the papers that deserve a deeper dive as a follow-up to the paper blitz.



Filter the papers [15-20 mins]

The first act of the paper blitz is to filter out ~50 papers that we want to read in the blitz. This is usually the hardest part and there’s no easy way to do it, so we resort to judging the paper by its title. Unsolicited advice to paper authors: make the paper title informative and interesting.


And now, はじめ (let’s begin)…


From https://aclanthology.org/events/acl-2022/, I would normally go through every title one by one and copy and paste the titles that I find interesting into a notepad. There will definitely be an inherent bias towards papers on your pet topics, by famous authors or simply by NLP friends you’ve yet to contact since the pre-covid days. Don’t fight the bias and just put them into your list, but if your whole list is made up of NLP friends’ papers, you either have too many friends or should cut down your list and force yourself to go through the anthology again. Iterate the process until your list is made up of ~50 papers.


I would strongly recommend that you resist the urge to use Ctrl+F and that you don’t pick papers based only on a specific topic. The paper blitz should be more like a shopping discovery experience than an e-commerce search experience: think window-shopping by scrolling through the e-commerce app, and if you are at the conference physically while doing a paper blitz, think window-shopping in a brick-and-mortar mall.


Here’s some pseudo-code for the filtering process of the paper blitz.


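def is_numberish(paper_blitz, max_num=50, ish_margin=0.2):
    # True when the list size is within +/- ish_margin of max_num (40 < n < 60 by default).
    return max_num * (1 - ish_margin) < len(paper_blitz) < max_num * (1 + ish_margin)

paper_blitz = []

# Keep going through the anthology until the list holds roughly 50 papers.
# acl_anthology (an iterable of papers) and select_title() are placeholders.
while not is_numberish(paper_blitz, 50):
    for paper in acl_anthology:
        # Use your own select_title func, as desired.
        if paper not in paper_blitz and select_title(paper):
            paper_blitz.append(paper)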
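
For readers who want something they can actually run, here is a minimal sketch of the same loop on toy data. Everything specific in it is an assumption: the `blitz_filter` helper, the `max_passes` cap, the fake titles, and the `picky_then_relaxed` selector, which is only a crude stand-in for a human skimming titles and getting less picky on a second pass.

```python
def is_numberish(items, max_num=50, ish_margin=0.2):
    """True when len(items) is strictly within +/- ish_margin of max_num."""
    return max_num * (1 - ish_margin) < len(items) < max_num * (1 + ish_margin)


def blitz_filter(titles, select_title, max_num=50, max_passes=3):
    """Repeat passes over the titles until the shortlist is 'numberish'.

    select_title(title, pass_no) stands in for human judgement; pass_no lets
    the stand-in loosen its criteria on later passes.
    """
    shortlist = []
    for pass_no in range(max_passes):
        for title in titles:
            if len(shortlist) >= max_num:
                break  # Stop adding once the target size is reached.
            if title not in shortlist and select_title(title, pass_no):
                shortlist.append(title)
        if is_numberish(shortlist, max_num):
            break
    return shortlist


# Toy data: 200 fake titles. The selector keeps every 8th title on the first
# pass and every 4th title from the second pass onwards.
titles = [f"Paper {i}" for i in range(200)]


def picky_then_relaxed(title, pass_no):
    step = 8 if pass_no == 0 else 4
    return int(title.split()[1]) % step == 0


shortlist = blitz_filter(titles, picky_then_relaxed)
print(len(shortlist), is_numberish(shortlist))
```

The `max_passes` cap is there so the sketch always terminates; in a real blitz, the stopping rule is simply your own patience.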


Drumrolls… ここに… ただ!! (here it is… ta-da!!)


For ACL 2022, here’s my personal paper blitz list. You can see there’s definitely a bias to my pet topics:


  • Can Transformer be Too Compositional? Analysing Idiom Processing in Neural Machine Translation
  • The Paradox of the Compositionality of Natural Language: A Neural Machine Translation Case Study
  • Challenges and Strategies in Cross-Cultural NLP
  • Cross-Lingual Phrase Retrieval
  • Early Stopping Based on Unlabeled Samples in Text Classification
  • CLUES: A Benchmark for Learning Classifiers using Natural Language Explanations
  • DiBiMT: A Novel Benchmark for Measuring Word Sense Disambiguation Biases in Machine Translation
  • Distantly Supervised Named Entity Recognition via Confidence-Based Multi-Class Positive and Unlabeled Learning
  • Educational Question Generation of Children Storybooks via Question Type Distribution Learning and Event-centric Summarization
  • FairLex: A Multilingual Benchmark for Evaluating Fairness in Legal Text Processing
  • Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost
  • Identifying Moments of Change from Longitudinal User Text
  • Is Attention Explanation? An Introduction to the Debate
  • Measuring Fairness of Text Classifiers via Prediction Sensitivity
  • Meta-learning via Language Model In-context Tuning
  • Match the Script, Adapt if Multilingual: Analyzing the Effect of Multilingual Pretraining on Cross-lingual Transferability
  • mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models
  • QuoteR: A Benchmark of Quote Recommendation for Writing
  • Robust Lottery Tickets for Pre-trained Language Models
  • Sentence-level Privacy for Document Embeddings
  • ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
  • UniTE: Unified Translation Evaluation
  • Universal Conditional Masked Language Pre-training for Neural Machine Translation
  • What Makes Reading Comprehension Questions Difficult?
  • Word Order Does Matter and Shuffled Language Models Know It
  • Word2Box: Capturing Set-Theoretic Semantics of Words using Box Embeddings
  • Machine Translation for Livonian: Catering to 20 Speakers
  • Kronecker Decomposition for GPT Compression
  • HYPHEN: Hyperbolic Hawkes Attention For Text Streams
  • As Little as Possible, as Much as Necessary: Detecting Over- and Undertranslations with Contrastive Conditioning
  • QiuNiu: A Chinese Lyrics Generation System with Passage-Level Input
  • Language Diversity: Visible to Humans, Exploitable by Machines
  • A Natural Diet: Towards Improving Naturalness of Machine Translation Output
  • Automatic Song Translation for Tonal Languages
  • Dict-BERT: Enhancing Language Model Pre-training with Dictionary
  • ELLE: Efficient Lifelong Pre-training for Emerging Data
  • Finding the Dominant Winning Ticket in Pre-Trained Language Models
  • Mukayese: Turkish NLP Strikes Back
  • Rethinking Document-level Neural Machine Translation
  • First the Worst: Finding Better Gender Translations During Beam Search
  • Word-level Perturbation Considering Word Length and Compositional Subwords
  • Unsupervised Preference-Aware Language Identification
  • Are Prompt-based Models Clueless?
  • BERT Learns to Teach: Knowledge Distillation with Meta Learning
  • bert2BERT: Towards Reusable Pretrained Language Models
  • Better Language Model with Hypernym Class Prediction

Categorize the papers [25-30 mins]

The filtered list still looks like a mental overload, but that is the goal of the paper blitz: to cover as many papers as possible. The next step is to categorize the papers. How I usually do it is to put a category on the first paper, then see if the second paper fits into the same category and, if not, create a second category. Iterate till the end of the list, then repeat the process from the first paper to see if any papers need to be reshuffled across the categories. Recur until desired.


Here’s some more pseudo-code, this time for the categorization process:


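cat_to_paper = {}  # Categories to papers mapping.

def categorize(paper, min_sim=0.5):
    max_sim = 0   # Keeps the maximum similarity seen so far.
    paper_category = None
    for cat in cat_to_paper:
        # cosine() and vectorize() are only illustrative proxies for our human brains.
        # cosine() is a similarity function, a proxy for how our brain relates things.
        # vectorize() converts a paper or category into an abstract numerical vector.
        similarity = cosine(vectorize(paper), vectorize(cat))
        if similarity > max_sim:
            max_sim = similarity
            paper_category = cat
    # Nothing is close enough (or no category exists yet): start a new category.
    # min_sim and new_category() are placeholders, just like cosine() and vectorize().
    if paper_category is None or max_sim < min_sim:
        paper_category = new_category(paper)
    return paper_category

# There's no fixed satisfaction criterion; you'll have to come up with your own.
while not is_satisfied(cat_to_paper):
    # Keep the categories found so far, but reshuffle the papers across them.
    cat_to_paper = {cat: [] for cat in cat_to_paper}
    for paper in paper_blitz:
        cat = categorize(paper)
        cat_to_paper.setdefault(cat, []).append(paper)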
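
To make the idea a little more concrete, here is a runnable toy version. It is a sketch under loud assumptions: a bag-of-words `vectorize` and a plain `cosine` are swapped in for the human brain, the `min_sim` threshold of 0.15 is arbitrary, each category is simply named after its first paper, and only one greedy pass is made instead of reshuffling until satisfied.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: lowercase word counts (a crude stand-in for reading)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two Counter vectors; 0.0 if either is empty."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def categorize_all(titles, min_sim=0.15):
    """Greedy single pass: join the most similar category or start a new one."""
    cat_to_papers = {}
    for title in titles:
        best_cat, best_sim = None, 0.0
        for cat, members in cat_to_papers.items():
            # Compare the title against all titles already in the category.
            sim = cosine(vectorize(title), vectorize(" ".join(members)))
            if sim > best_sim:
                best_cat, best_sim = cat, sim
        if best_cat is None or best_sim < min_sim:
            best_cat = title  # Hypothetical naming: use the first paper's title.
            cat_to_papers[best_cat] = []
        cat_to_papers[best_cat].append(title)
    return cat_to_papers


titles = [
    "Robust Lottery Tickets for Pre-trained Language Models",
    "Finding the Dominant Winning Ticket in Pre-Trained Language Models",
    "UniTE: Unified Translation Evaluation",
    "Automatic Song Translation for Tonal Languages",
]
cats = categorize_all(titles)
for cat, members in cats.items():
    print(cat, "->", len(members), "papers")
```

Word overlap in titles is a weak signal (stopwords like "for" count as much as "translation"), which is a good reminder of why the human pass and the reshuffling step still matter.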


Presto! ほら! (See!)

Here’s a categorized version of my filtered list from ACL 2022:

Multi-Word Expression / Compositionality

  • Can Transformer be Too Compositional? Analysing Idiom Processing in Neural Machine Translation
  • The Paradox of the Compositionality of Natural Language: A Neural Machine Translation Case Study
  • Word-level Perturbation Considering Word Length and Compositional Subwords

Low-Resource Language / Problems

  • Challenges and Strategies in Cross-Cultural NLP
  • Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost
  • Machine Translation for Livonian: Catering to 20 Speakers
  • Language Diversity: Visible to Humans, Exploitable by Machines
  • ELLE: Efficient Lifelong Pre-training for Emerging Data

Multilinguality / Crosslingual NLP

  • Cross-Lingual Phrase Retrieval
  • Match the Script, Adapt if Multilingual: Analyzing the Effect of Multilingual Pretraining on Cross-lingual Transferability
  • mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models

Unsupervised / Semi-Supervised

  • Early Stopping Based on Unlabeled Samples in Text Classification
  • Distantly Supervised Named Entity Recognition via Confidence-Based Multi-Class Positive and Unlabeled Learning
  • Unsupervised Preference-Aware Language Identification

Datasets

  • CLUES: A Benchmark for Learning Classifiers using Natural Language Explanations
  • DiBiMT: A Novel Benchmark for Measuring Word Sense Disambiguation Biases in Machine Translation
  • FairLex: A Multilingual Benchmark for Evaluating Fairness in Legal Text Processing
  • QuoteR: A Benchmark of Quote Recommendation for Writing
  • ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection

Language Learning / Understanding Language

  • Educational Question Generation of Children Storybooks via Question Type Distribution Learning and Event-centric Summarization
  • What Makes Reading Comprehension Questions Difficult?
  • Word Order Does Matter and Shuffled Language Models Know It
  • Are Prompt-based Models Clueless?

Machine Translation (MT) Tricks / MT Evaluation

  • UniTE: Unified Translation Evaluation
  • As Little as Possible, as Much as Necessary: Detecting Over- and Undertranslations with Contrastive Conditioning
  • A Natural Diet: Towards Improving Naturalness of Machine Translation Output
  • First the Worst: Finding Better Gender Translations During Beam Search

Applications

  • QiuNiu: A Chinese Lyrics Generation System with Passage-Level Input
  • Automatic Song Translation for Tonal Languages
  • Mukayese: Turkish NLP Strikes Back

Model Architectures / Optimization

  • Is Attention Explanation? An Introduction to the Debate
  • Meta-learning via Language Model In-context Tuning
  • Robust Lottery Tickets for Pre-trained Language Models
  • Finding the Dominant Winning Ticket in Pre-Trained Language Models
  • Kronecker Decomposition for GPT Compression
  • HYPHEN: Hyperbolic Hawkes Attention For Text Streams
  • BERT Learns to Teach: Knowledge Distillation with Meta Learning
  • bert2BERT: Towards Reusable Pretrained Language Models
  • Rethinking Document-level Neural Machine Translation

Misc

  • [NUT] Identifying Moments of Change from Longitudinal User Text
  • [Bias] Measuring Fairness of Text Classifiers via Prediction Sensitivity
  • [Privacy] Sentence-level Privacy for Document Embeddings
  • [Semantics] Word2Box: Capturing Set-Theoretic Semantics of Words using Box Embeddings
  • [Language Modelling (LM) Tricks]
    • Universal Conditional Masked Language Pre-training for Neural Machine Translation
    • Better Language Model with Hypernym Class Prediction

The Actual Paper Reading!!

Before the actual reading, let’s do some backward time management, because we have:


  • Limited time (I’ll usually want to spend no more than 3-4 hours on a single blitz)
  • Finite brainpower
  • Only so much caffeine we can take in a day


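Given a 4-hour blitz with ~1 hour spent on the filter and categorize process, only 180 mins would be left for 50 papers, which is under 4 mins per paper. Allowing 360 mins for the reading instead (so the blitz runs overtime) gives around 7 mins per paper.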
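
As a sanity check on the budget, the arithmetic can be written out. This is just a back-of-the-envelope sketch; the `minutes_per_paper` helper is made up for illustration. A strict 4-hour blitz with 1 hour of setup leaves 180 mins of reading, while the ~7 mins per paper used in the rest of this post corresponds to 360 mins of reading.

```python
def minutes_per_paper(total_hours, setup_minutes, num_papers):
    """Reading minutes available per paper after filtering and categorizing."""
    reading_minutes = total_hours * 60 - setup_minutes
    return reading_minutes / num_papers


print(minutes_per_paper(4, 60, 50))  # 3.6 mins: a strict 4-hour blitz
print(minutes_per_paper(7, 60, 50))  # 7.2 mins: 360 mins of reading
```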

But there must be a better way than to use 7 mins for each paper! No?

Yes, there is. We can make use of the paper categories to give us a little more leeway when we blitz through the paper. For example, we have 3 papers in the Multi-Word Expression / Compositionality topic, so we get 21 minutes.


Since they are on the same topic, the time taken to read the “Related Work” or “Previous Work” sections can be collapsed, since we don’t need to put much effort into reading that section in the second and third papers.



Most probably, you can also score bonus time if the approaches the papers use are similar or they use the same “Transformer is All You Need” shiny hammer 🔨 .
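
The per-topic budget can be sketched the same way. The category sizes below are counted from the categorized list earlier in this post; the helper names and, in particular, the 2 minutes saved per paper on related work are made-up numbers for illustration only.

```python
PER_PAPER_MINUTES = 7

# Number of papers per category, counted from the categorized list.
category_sizes = {
    "Multi-Word Expression / Compositionality": 3,
    "Low-Resource Language / Problems": 5,
    "Multilinguality / Crosslingual NLP": 3,
    "Unsupervised / Semi-Supervised": 3,
    "Datasets": 5,
    "Language Learning / Understanding Language": 4,
    "Machine Translation (MT) Tricks / MT Evaluation": 4,
    "Applications": 3,
    "Model Architectures / Optimization": 9,
    "Misc": 6,
}


def category_budget(num_papers, per_paper=PER_PAPER_MINUTES):
    """Minutes budgeted for a whole category."""
    return num_papers * per_paper


def slack_from_shared_related_work(num_papers, related_work_minutes=2):
    """Minutes freed if only the first paper's related work is read in full.

    related_work_minutes is a hypothetical saving per later paper.
    """
    return max(num_papers - 1, 0) * related_work_minutes


for name, size in category_sizes.items():
    print(f"{name}: {category_budget(size)} mins, "
          f"~{slack_from_shared_related_work(size)} mins slack")

total_budget = sum(category_budget(size) for size in category_sizes.values())
print("Total:", total_budget, "mins")
```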


Let’s start with the “Multi-Words and Compositionality” Topic

The first thing to note when opening the PDFs: the same author wrote the first two papers in the topic!


Starting with the “Can Transformer be Too Compositional?” paper, the first thing you will notice is a common gestalt of an NLP paper from ACL: a figure or example at the top right, in the second column of the first page.



Legend says, this top right column figure/example gestalt is commonly attributed to Percy Liang, but I’m not too sure it’s true for every paper he authored or co-authored.


And here’s the “summary” of the meta-trend for the three papers in this topic:


Can Transformer be Too Compositional? Analysing Idiom Processing in Neural Machine Translation.

Verna Dankers, Christopher Lucas and Ivan Titov.


  • Meta-Trend

    • Problem: Is non-compositionality inherent in transformer’s architecture?
    • Approach:
      • Probe the attention weights between the MWEs, across the MWE translations
      • Perform canonical correlation analysis (CCA)
        • Note: I’m not exactly sure what this CCA does, but it looks like a technique to test similarity between a sentence with a masked MWE vs a sentence with the actual MWE
      • Train a probing classifier to check whether figurativeness can be easily predicted
        • Note: Seems like one of those eXplainable AI (XAI) approaches
  • Quality

    • Woot! (Strength)
      • Nice work on the different transformer architectures, poking to find out about a linguistic phenomenon
      • This paper’s “finding agrees with results from Zaninello and Birch (2020), who ascertain that encoding an idiom as one word improves translations”
    • Meh… (Weakness)
      • Analysis was done on translation from English → European languages, with the assumption that idiomatic information “density”/salience is equivalent across the target languages
      • Supposedly the analysis of the results should appear in https://github.com/vernadankers/mt_idioms but the link returns a 404
  • Usefulness

    • Code and data availability:
    • Natural extension
      • Do the same on CJK / African languages
      • Try reversing the directionality from X → English


  • Time check: 12 mins


The Paradox of the Compositionality of Natural Language: A Neural Machine Translation Case Study.

Verna Dankers, Elia Bruni and Dieuwke Hupkes.


  • Meta-Trend
    • Problem: How to view compositionality more holistically for MT?
    • Approach:
      • [Local compositionality] Compare POS templates of synthetic data, semi-natural vs natural translation data
      • [Global compositionality] Analyse if the words in the full idiom phrase are literally translated or not
        • Note: I think this part of the analysis is manually annotated but I might be mistaken
  • Quality
    • Woot!
      • Identified that simplistic probing on compositionality with synthetic dataset (e.g. English idiom dictionary → European language translation) is too local to the MWE/idiom
      • Their “results indicate that models often do not behave compositionally under the local interpretation, but exhibit behaviour that is too local in other cases. In other words, models have the ability to process phrases both locally and globally but do not always correctly modulate between them.”
    • Meh…
      • The global analysis was based on a small subset of English idioms → Dutch translations
      • The discussion section is a little hand-wavy and more like opinions than analysis but it’s a good thing, it makes readers think a little on whether the claims are intuitive or not.
  • Usefulness
  • Time check: 8 mins


Word-level Perturbation Considering Word Length and Compositional Subwords.

Tatsuya Hiraoka, Sho Takase, Kei Uchiumi, Atsushi Keyaki, Naoaki Okazaki

  • Meta-Trend
    • Problem: Better regularization through subword perturbation
      • Note: Perturbation is just a big word for “disturbing”
    • Approach:
      • We “disturb” the subwords by replacing them with a random word or a random phrase (aka compositional subwords)
      • Instead of choosing a word based on the corpus’ average length, have a smarter way to sample the word that replaces the subword: the paper uses a Poisson distribution, with the length of the original word that we want to replace as the signal to the distribution
        • Note: Why Poisson? Are we trying to say that a “disturbance” or random word substitution is a “rare event” that we usually estimate using a Poisson distribution?
      • “[Word Replacement] WR often samples words unrelated to the target word owing to the uniform distribution Q_V. To address this problem, we propose [Compositional Word Replacement] CWR that restricts the source of sampling V to S_{x_i}, which consists of two subsets: Substrings and Overlapped Subwords”
        • Note: The description of the CWR (and Algorithm 1) is kinda terse and I’m going to skip it and somehow remember that they did some comparison of substrings and overlapping subwords and somehow chose a word to replace based on those signals
      • Results tested on sentiment analysis (various datasets) + machine translation (IWSLT)
  • Quality
    • Strength
      • Nice results, the proposed approach gave the model a little boost
      • The collected sentiment analysis datasets are very useful!!
    • Weakness
      • Paper should have better examples and a clearer explanation of how the CWR was carried out
        • Note: Maybe cos I’m in a blitz and I really don’t want to dive into the details
      • The motivation to choose Poisson should really be explained; it’s a logical choice if we are assuming that these random perturbations are “rare events”, but there should be more to the story of why the authors chose that distribution
      • Orthogonality of stacking the approaches isn’t well explored
  • Usefulness
    • Code and data availability
      • 100,000 English Twitter tweets <https://www.kaggle.com/c/twitter-sentiment-analysis>
      • 671,052 Chinese Weibo samples <https://github.com/wansho/senti-weibo>
      • 352,554 Japanese Twitter tweets <http://www.db.info.gifu-u.ac.jp/data/Data_5d832973308d57446583ed9f> (not sure if this is the right link??)
      • 120,000 English Amazon reviews (not sure how to download??)
      • 525,000 Japanese Rakuten reviews (not sure how to download??)
      • 390,000 Chinese JD.com reviews (not sure how to download??)
  • Time Check: 12 mins
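
To make the Poisson note above a bit more concrete, here is a minimal sketch of one way to read it: sample a replacement length from a Poisson distribution whose mean is the length of the original word, then pick a random vocabulary word of (roughly) that length. This is emphatically not the paper’s algorithm, which I only skimmed; the function names, the toy vocabulary and the closest-length fallback are all assumptions for illustration.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def replace_by_length(word, vocab, rng):
    """Swap `word` for a random vocab word whose length is Poisson-sampled.

    The Poisson mean is the length of the original word, so replacements tend
    to be about as long as the word they replace.
    """
    target_len = max(1, sample_poisson(len(word), rng))
    # Fall back to the closest available length if no word matches exactly.
    closest = min(abs(len(w) - target_len) for w in vocab)
    candidates = [w for w in vocab if abs(len(w) - target_len) == closest]
    return rng.choice(candidates)


rng = random.Random(0)
vocab = ["a", "an", "cat", "tree", "house", "window", "blanket", "mountain"]
print([replace_by_length("paper", vocab, rng) for _ in range(5)])
```

Seen this way, the Poisson is less about “rare events” and more about a convenient distribution over non-negative integers centred on the original word length, though that is only a guess at the motivation.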



Intermission…

Let’s do a quick round-up on the first topic on compositionality and see how we did in the blitz. We took 12 + 8 mins for the first two papers, which are related, and a little longer (12 mins) for the third paper since it was sorta on a different track; it’s more likely to be classified under LM Tricks than MWE and Compositionality, but that’s the result of “judging the paper by its title” =)


Up till now, we’ve spent a total of 32 mins on 3 papers. It’s a little overtime compared with our 21-minute estimate, but possibly we’ve saved future time in reading the related work for the LM Tricks topic. I hope the exercise up till this point gives you some tips on how researchers blitz through multiple papers in a short time and how reading papers grouped in some manner helps save time.

Ready for another topic to blitz through? 行くぞ (let’s go)…

Alright, alright, alright! I’ll spare you the torment =)


You definitely don’t want to spend 3-4 hours reading through my summary, and I would suggest you take the time to do your own paper blitz instead. In actual fact, I did sit down for another 5+ hours to complete my blitz, but my “meta-trend” notes are not as clean as the ones presented here. TBH, I did kinda go overtime, but considering I spent around 6 hours in total to read through ~50 papers, I think I deserve to chill out with a cold サッポロビール (Sapporo beer) to end my night.

Summary

This article introduces a mechanism that I personally use to read many papers quickly so that I can catch up with the ever-growing number of papers published in the Natural Language Processing (NLP) field.


My personal joy in reading papers is always thinking about what new nuggets of knowledge I can get from just a handful of papers and somehow figuring out how to try these approaches in my work projects. And blitzing through papers has helped me identify these nuggets faster than painfully doing shallow or deep dives into popular papers or a handful of papers.


I hope that the paper blitz can help newcomers bootstrap their knowledge of the field and its “state-of-the-art” trends, and also help seasoned researchers save some time in finding the interesting nuggets in a haystack of accepted publications.

Until next time, have fun blitzing through academic papers, がんばれ (do your best)!

P/S: I would not recommend doing this regularly. I usually only do it right before a conference starts or as soon as it ends, at most 2-3 times a year. I usually only do 1-3 paper blitzes a year: sometimes on an *ACL conference, sometimes purely on WMT and MT workshop papers, and sometimes my filtering set starts with a whole mass of “googled” results.