What matters when building vision-language models?: Further experimental details of the ablations