**Transfer learning** aims to leverage valuable knowledge from a **source domain** to improve model performance in a **target domain**.

### Why do we need Transfer Learning for NLP?

In NLP applications, especially when we do not have a large enough dataset for the task we want to solve (called the **_target_ task T**), we would like to transfer knowledge from another **task S** to avoid overfitting and to improve performance on T.

### **Two Scenarios**

1. Transferring knowledge to a **semantically similar/same** task but with a different dataset.

   * **Source task (S)** - A large dataset for binary sentiment classification
   * **Target task (T)** - A small dataset for binary sentiment classification

2. Transferring knowledge to a task that is **semantically different** but shares the same neural network architecture, so that neural parameters can be transferred.

   * **Source task (S)** - A large dataset for binary sentiment classification
   * **Target task (T)** - A small dataset for 6-way question classification (e.g., location, time, and number)

### Transfer Methods

Minimal code sketches of these methods are given at the end of this post.

#### **Parameter initialization (INIT)**

The INIT approach first **trains the network on S**, and then directly uses the tuned parameters to **initialize the network for T**. After transfer, the parameters may either be kept fixed in the target domain or **fine-tuned on T**.

#### **Multi-task learning (MULT)**

MULT, on the other hand, simultaneously trains on **samples from both domains**, sharing parameters between the two tasks.

#### **Combination (MULT+INIT)**

We first pretrain on the source domain S for **parameter initialization**, and then **train on S and T simultaneously**.

### **Model Performance of INIT, MULT, and MULT+INIT**

* Transfer learning between **semantically equivalent tasks** appears to be **successful**.
* There is **no big improvement** for **semantically different tasks**.

### **Conclusion**

How well neural transfer learning works in NLP depends largely on how **semantically similar** the source and target datasets are.

### **Reference**

Mou et al., "How Transferable are Neural Networks in NLP Applications?": [https://arxiv.org/abs/1603.06111](https://arxiv.org/abs/1603.06111)
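### Code Sketches

Below is a minimal PyTorch sketch of **INIT**, assuming a toy bag-of-embeddings classifier; the class name `TextClassifier` and the file `source_params.pt` are illustrative, not from the paper. The output head is rebuilt for T because the label spaces differ (binary sentiment vs. 6-way question types), so only the shared encoder is transferred.

```python
import torch
import torch.nn as nn

class TextClassifier(nn.Module):
    """Toy classifier: a shared encoder plus a task-specific output head."""
    def __init__(self, vocab_size, embed_dim, num_classes):
        super().__init__()
        self.encoder = nn.EmbeddingBag(vocab_size, embed_dim)  # transferable part
        self.head = nn.Linear(embed_dim, num_classes)          # task-specific part

    def forward(self, token_ids, offsets):
        return self.head(self.encoder(token_ids, offsets))

VOCAB, DIM = 10_000, 64

# 1) Train on the source task S (binary sentiment), then save the tuned parameters.
source_model = TextClassifier(VOCAB, DIM, num_classes=2)
# ... training loop on S omitted ...
torch.save(source_model.state_dict(), "source_params.pt")

# 2) INIT: initialize the target network with the tuned source parameters.
#    The head is excluded because T is 6-way, so its weight shapes would not match.
target_model = TextClassifier(VOCAB, DIM, num_classes=6)
state = torch.load("source_params.pt")
state = {k: v for k, v in state.items() if not k.startswith("head.")}
target_model.load_state_dict(state, strict=False)  # copy only the shared encoder

# 3) Either fine-tune everything on T, or keep the transferred parameters fixed:
for p in target_model.encoder.parameters():
    p.requires_grad = False  # frozen variant; drop this loop to fine-tune instead
```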
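And a sketch of **MULT**, using a shared encoder with one head per task and an interpolated loss `lam * loss_T + (1 - lam) * loss_S`; the loader `dummy_batch` is a hypothetical stand-in for real data, and the exact training schedule in the paper may differ from this simple per-step mixing.

```python
import torch
import torch.nn as nn

# Shared encoder with two task-specific heads: binary for S, 6-way for T.
encoder = nn.EmbeddingBag(10_000, 64)
head_s = nn.Linear(64, 2)
head_t = nn.Linear(64, 6)

params = list(encoder.parameters()) + list(head_s.parameters()) + list(head_t.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
lam = 0.7  # weight on the target task T (a tunable hyperparameter)

def dummy_batch(num_classes, batch=32, length=20):
    """Hypothetical stand-in for a real data loader: random tokens and labels."""
    tokens = torch.randint(0, 10_000, (batch * length,))
    offsets = torch.arange(0, batch * length, length)
    labels = torch.randint(0, num_classes, (batch,))
    return tokens, offsets, labels

for step in range(100):
    tok_s, off_s, y_s = dummy_batch(num_classes=2)  # batch from source S
    tok_t, off_t, y_t = dummy_batch(num_classes=6)  # batch from target T
    loss_s = loss_fn(head_s(encoder(tok_s, off_s)), y_s)
    loss_t = loss_fn(head_t(encoder(tok_t, off_t)), y_t)
    loss = lam * loss_t + (1 - lam) * loss_s  # joint objective over both domains
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

MULT+INIT simply chains the two sketches: run the INIT save/restore step first, then continue with this joint training loop.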