Exploring the design space of vision-language models and Are all pre-trained backbones