Transfer learning aims to make use of valuable knowledge in a source domain to help model performance in a target domain.

Why do we need transfer learning for NLP?

In NLP applications, especially when we do not have a large enough dataset for solving a task (called the target task T), we would like to transfer knowledge from other source tasks S to avoid overfitting and to improve the performance on T.

Two Scenarios

1. Transferring knowledge to a semantically similar/same task but with a different dataset.
- Source task (S): a large dataset for binary sentiment classification
- Target task (T): a small dataset for binary sentiment classification

2. Transferring knowledge to a task that is semantically different but shares the same neural network architecture, so that neural parameters can be transferred.
- Source task (S): a large dataset for binary sentiment classification
- Target task (T): a small dataset for 6-way question classification (e.g., location, time, and number)

Transfer Methods

Parameter initialization (INIT)
The INIT approach first trains the network on S, and then directly uses the tuned parameters to initialize the network for T. After transfer, we may either fix the parameters in the target domain, i.e., perform no further training on T, or fine-tune the parameters on T. (A minimal code sketch of INIT appears at the end of this post.)

Multi-task learning (MULT)
MULT, on the other hand, simultaneously trains on samples from both domains. (A code sketch of MULT also appears at the end of this post.)

Combination (MULT+INIT)
We first pretrain on the source domain S for parameter initialization, and then train S and T simultaneously.

Model Performance on INIT, MULT, and MULT+INIT

- Transfer learning of semantically equivalent tasks appears to be successful.
- There is no big improvement for semantically different tasks.

Conclusion

The success of neural transfer learning in NLP depends largely on how similar in semantics the source and target datasets are.

Reference

Mou et al., "How Transferable are Neural Networks in NLP Applications?" (EMNLP 2016). https://arxiv.org/abs/1603.06111
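
Code sketch: INIT

Below is a minimal PyTorch sketch of INIT, assuming the two scenarios above: the source and target networks share the same architecture, and only the task-specific output layer differs in shape. The `Classifier` model and all hyperparameters are hypothetical, chosen purely for illustration; the paper does not prescribe this exact architecture.

```python
import torch
import torch.nn as nn

# Hypothetical shared architecture: embedding + encoder + task-specific output layer.
class Classifier(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=100, hidden_dim=200, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.output = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        emb = self.embedding(x)            # (batch, seq_len, embed_dim)
        _, (h, _) = self.encoder(emb)      # final hidden state: (1, batch, hidden_dim)
        return self.output(h[-1])          # logits: (batch, num_classes)

# INIT step 1: train the network on the source task S.
source_model = Classifier(num_classes=2)   # S: binary sentiment classification
# ... train source_model on the large source dataset here ...

# INIT step 2: use the tuned parameters to initialize the network for T.
target_model = Classifier(num_classes=6)   # T: 6-way question classification
target_model.embedding.load_state_dict(source_model.embedding.state_dict())
target_model.encoder.load_state_dict(source_model.encoder.state_dict())
# The output layers have different shapes, so they are not transferred.

# Option A: fix (freeze) the transferred parameters, so only the new output layer trains on T.
for module in (target_model.embedding, target_model.encoder):
    for p in module.parameters():
        p.requires_grad = False

# Option B: skip the freezing above and fine-tune all parameters on T.
```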
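
Code sketch: MULT

MULT can be sketched as joint training with an interpolated loss over the two tasks. The paper balances the source and target objectives with a hyperparameter; the weight `lam` below and all module names are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical setup: S and T share the embedding and encoder,
# but keep separate task-specific output heads.
shared_embedding = nn.Embedding(10000, 100)
shared_encoder = nn.LSTM(100, 200, batch_first=True)
source_head = nn.Linear(200, 2)   # S: binary sentiment classification
target_head = nn.Linear(200, 6)   # T: 6-way question classification

params = (list(shared_embedding.parameters()) + list(shared_encoder.parameters())
          + list(source_head.parameters()) + list(target_head.parameters()))
optimizer = optim.Adam(params, lr=1e-3)
criterion = nn.CrossEntropyLoss()
lam = 0.7   # interpolation weight between the two objectives (a tunable hyperparameter)

def encode(x):
    _, (h, _) = shared_encoder(shared_embedding(x))
    return h[-1]

def mult_step(x_src, y_src, x_tgt, y_tgt):
    """One MULT update: a weighted sum of the target and source losses."""
    loss = (lam * criterion(target_head(encode(x_tgt)), y_tgt)
            + (1 - lam) * criterion(source_head(encode(x_src)), y_src))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example update with random mini-batches (token-id tensors of shape [batch, seq_len]).
x_s = torch.randint(0, 10000, (32, 20)); y_s = torch.randint(0, 2, (32,))
x_t = torch.randint(0, 10000, (8, 20));  y_t = torch.randint(0, 6, (8,))
mult_step(x_s, y_s, x_t, y_t)
```

MULT+INIT would combine the two sketches: first run the INIT-style pretraining on S, then continue with `mult_step` updates over both tasks.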