Extra Information on the Comprehensive Comparison of Tools for Fitting Mutational Signatures


Too Long; Didn't Read

Extra Information on the Comprehensive Comparison of Tools for Fitting Mutational Signatures.

This paper is available on arXiv under a CC 4.0 license.

Authors:

(1) Matúš Medo, Department for BioMedical Research, Inselspital, Bern University Hospital, University of Bern; Department of Radiation Oncology, Inselspital, Bern University Hospital, University of Bern ([email protected]);

(2) Michaela Medová, Department for BioMedical Research, Inselspital, Bern University Hospital, University of Bern; Department of Radiation Oncology, Inselspital, Bern University Hospital, University of Bern.

Abstract & Introduction

Results

Discussion

Materials and methods

Acknowledgement & References

Supporting Information

Supporting Information

Supplementary Figure 1: Exponentiated Shannon entropy of signatures (top row) and exponentiated Shannon entropy multiplied by the signature’s mean absolute Pearson correlation with all other signatures (bottom row) versus the average fitting error achieved by the evaluated tools (as in SF2, we exclude YAPSA, sigminer QP, and sigminer SA, which produce the same results as MutationalPatterns). We see that the exponentiated Shannon entropy (which measures the effective number of active contexts for a signature) is closely related to the fitting error, and the agreement improves with the number of mutations per sample. The agreement improves further when the average correlation with other signatures is taken into account. In summary, flat signatures that are on average similar to other signatures are the most difficult to fit.
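For concreteness, here is a minimal sketch of how the two quantities plotted in SF1 can be computed, assuming the reference signatures are stored as columns of a 96 × K NumPy array with each column summing to one; all names are illustrative and not taken from any of the evaluated tools.

```python
import numpy as np

def exp_shannon_entropy(sig):
    """Exponentiated Shannon entropy of one signature: the effective
    number of mutation contexts in which the signature is active."""
    p = sig[sig > 0]                        # drop zero-probability contexts
    return np.exp(-np.sum(p * np.log(p)))

def mean_abs_correlation(signatures, j):
    """Mean absolute Pearson correlation of signature j with all others."""
    corr = np.corrcoef(signatures.T)        # K x K correlation matrix
    return np.delete(np.abs(corr[j]), j).mean()

def difficulty_score(signatures, j):
    """The combined quantity from the bottom row of SF1."""
    return exp_shannon_entropy(signatures[:, j]) * mean_abs_correlation(signatures, j)
```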


Supplementary Figure 2: A comparison of signature fitting tools for single-signature cohorts. Mean fitting error (top row) and mean total weight assigned to false positive signatures (bottom row) for different numbers of mutations per sample (columns) for all evaluated fitting tools. The results are averaged over 50 cohorts from eight cancer types (see Methods); all 67 COSMICv3 signatures were used for fitting. The best-performing tool in each panel is marked with a frame. Results are not shown for YAPSA, sigminer QP, and sigminer SA as their results are nearly identical (fitting error correlation above 0.999) to those of MutationalPatterns.
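The near-identical results of YAPSA, sigminer QP, sigminer SA, and MutationalPatterns are consistent with these tools solving essentially the same non-negative least-squares (NNLS) problem. Below is a minimal sketch of that core computation, not of any tool’s exact implementation; the fitting-error definition (total absolute difference between true and estimated relative weights) is an assumption made here for illustration.

```python
import numpy as np
from scipy.optimize import nnls

def fit_signatures(catalog, signatures):
    """Fit a 96-element mutation catalog as a non-negative combination
    of reference signatures (columns of a 96 x K matrix)."""
    weights, _residual = nnls(signatures, catalog)
    return weights / weights.sum()          # relative signature weights

def fitting_error(true_weights, est_weights):
    """Assumed error metric: total absolute weight misassignment."""
    return np.abs(true_weights - est_weights).sum()
```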


Supplementary Figure 3: As Figure 2 in the main text but with a logarithmic y-axis. A straight line with slope −β in the log-log scale implies the power-law dependence E ∼ m^(−β) between the number of mutations per sample, m, and the fitting error, E. When the fitting error is averaged over all non-artifact signatures (last panel), linear fits to the empirical dependencies in the range m ≥ 1600 yield exponents between 0.35 (for mmsig) and 0.59 (for sigfit).
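The exponent β can be estimated exactly as the caption describes, by a linear fit in log-log scale. A minimal sketch with hypothetical error values (the real curves are in the figure):

```python
import numpy as np

m = np.array([1600, 3200, 6400, 12800])     # mutations per sample, m >= 1600
E = np.array([0.30, 0.21, 0.15, 0.10])      # hypothetical mean fitting errors

slope, _intercept = np.polyfit(np.log(m), np.log(E), 1)
beta = -slope                               # slope of -beta implies E ~ m^(-beta)
print(f"estimated exponent beta = {beta:.2f}")
```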


Supplementary Figure 4: Tools with the lowest average fitting error in single-signature cohorts for each signature and each number of mutations per sample (for each signature, we created one cohort with 100 samples).



Supplementary Figure 5: The average running times (per sample) in single-signature cohorts (left) and heterogeneous cohorts (right) for all evaluated tools. In the right panel, sigminer QP and sigminer NNLS overlap. SigsPack and mmsig are the fastest and the slowest tool, respectively, in both cases; the ratios between their running times are 6,500 and 22,500 for single-signature and heterogeneous cohorts, respectively. deconstructSigs is more than 10 times slower in heterogeneous cohorts than in single-signature cohorts. Simulations were run on Intel Core i5-6500 CPUs @ 3.20 GHz.



Supplementary Figure 6: The running times for single-signature cohorts plotted against the mean fitting error. The running times of three tools (deconstructSigs, mmsig, and SigProfilerSingleSample) grow with the signature difficulty.



Supplementary Figure 7: The running time as a function of the number of reference signatures (the running times have been normalized by the running time for the smallest number of reference signatures) in heterogeneous cohorts. With respect to the number of reference signatures, the running time of mmsig grows almost with the third power, whereas the running times of SigsPack, MutationalPatterns, sigminer QP, sigminer NNLS, and sigLASSO are (nearly) independent of the number of reference signatures.



Supplementary Figure 8: The activity of signatures from the COSMICv3 reference catalog in the WGS-sequenced tissue data from various cancers at the COSMIC website (https://cancer.sanger.ac.uk/signatures/sbs/). The eight cancer types chosen for the evaluation of signature fitting tools are marked in bold.



Supplementary Figure 9: Relative signature contributions in heterogeneous cohorts. Empirical signature weights in WGS tissues from eight different cancer types (data obtained from the COSMIC website). Each panel shows the six signatures with the highest median weight.
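Selecting the panels’ contents is a one-liner once the empirical weights are in a table. A minimal sketch with a hypothetical samples × signatures DataFrame for one cancer type:

```python
import pandas as pd

# hypothetical relative signature weights for three WGS samples
weights = pd.DataFrame({
    "SBS1":  [0.20, 0.25, 0.18],
    "SBS5":  [0.40, 0.35, 0.45],
    "SBS40": [0.30, 0.30, 0.27],
    "SBS18": [0.10, 0.10, 0.10],
})

# the six signatures with the highest median weight across samples
top_signatures = weights.median().nlargest(6).index.tolist()
print(top_signatures)
```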



Supplementary Figure 10: Mean fitting error for the evaluated signature fitting tools stratified by cancer type (this complements Figure 4 in the main text). Plus symbols mark the best tool for each cancer type and a given number of mutations per sample.


Supplementary Figure 11: Sample reconstruction similarity reported by SigProfilerSingleSample versus the sample fitting error in heterogeneous cohorts stratified by the number of mutations per sample. Note the different x-axis range.


Supplementary Figure 12: Sample reconstruction quality metrics reported by SigProfilerAssignment versus the sample fitting error in heterogeneous cohorts stratified by the number of mutations per sample.
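A common reconstruction quality metric in signature fitting is the cosine similarity between the observed catalog and the catalog rebuilt from the fitted weights; whether this matches the metrics SigProfilerAssignment reports is an assumption here, made only to illustrate the idea.

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def reconstruction_similarity(catalog, signatures, weights):
    """Compare the observed 96-element catalog with its reconstruction
    from the reference signatures and the fitted weights (an assumed,
    illustrative metric, not necessarily the one plotted in SF12)."""
    reconstructed = signatures @ weights
    return cosine_similarity(catalog, reconstructed)
```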


Supplementary Figure 13: Fitting error difference between the results obtained with COSMICv3 and COSMICv3.2, respectively, as a reference. sigfit is omitted here because it failed to converge with COSMICv3.2. The differences are mostly negative as a result of the increased number of signatures included in COSMICv3.2 (78 as opposed to 67 in COSMICv3).


Supplementary Figure 14: Fitting error difference between the results obtained using COSMICv3 and only the relevant signatures (signatures that are active for a given cancer type and all artifact signatures), respectively, as a reference.


Supplementary Figure 15: A comparison of signature fitting tools for heterogeneous cohorts when only the relevant signatures are used as a reference by the fitting tools. The results are averaged over 50 cohorts with 100 samples for each cancer type.


Supplementary Figure 16: A comparison of signature fitting tools for cohorts where 90% of mutations come from one signature (four columns corresponding to SBS1–4 being used as the main signature) and 10% of mutations come from


Supplementary Figure 17: Fitting error difference between the results obtained using COSMICv3 and a self-determined list of active signatures (see Methods in the main text), respectively, as a reference. A positive difference means that the self-determined list of active signatures yields a lower fitting error than the full COSMICv3 reference.


Supplementary Figure 18: As SF17 but using a less strict process to prune the reference signatures: every signature that has a relative weight above x for at least one sample is included (x is 0.05 for 100 mutations per sample, 0.03
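The pruning rule of SF18 is straightforward to express in code. A minimal sketch, assuming the first-pass fitted weights are a samples × signatures NumPy array; names are illustrative:

```python
import numpy as np

def prune_reference(est_weights, signature_names, x):
    """Keep every signature whose relative weight exceeds the threshold x
    in at least one sample (e.g., x = 0.05 for 100 mutations per sample)."""
    keep = (est_weights > x).any(axis=0)    # any sample above threshold?
    return [name for name, k in zip(signature_names, keep) if k]
```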


Supplementary Figure 19: Fitting error difference between the results obtained using a self-determined list of active signatures and the method by Maura et al. (see Methods in the main text), respectively, as a reference. A positive difference means that the method by Maura et al. yields a lower fitting error than the self-determined list.


Supplementary Figure 20: By increasing the number of mutations per sample in small steps, we identify the number of mutations at which SigProfilerAssignment starts to outperform SigProfilerSingleSample for each cancer type (using


Supplementary Figure 21: As Figure 7 in the main text but systematic differences are introduced for signature SBS1, which is easier to fit than SBS40 used in Figure 7. We see that for “easy” signatures, the differences between fitting tools are small.


Supplementary Figure 22: As SF9 but using a different evaluation metric, one minus the Pearson correlation coefficient between the true and estimated signature weights. The ranking of tools is similar to that in SF9.
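The alternative metric of SF22/SF23 is a one-liner; a minimal sketch with hypothetical weight vectors:

```python
import numpy as np

def correlation_error(true_weights, est_weights):
    """One minus the Pearson correlation between true and estimated weights;
    0 means perfect agreement, values up to 2 are possible."""
    r = np.corrcoef(true_weights, est_weights)[0, 1]
    return 1.0 - r

true_w = np.array([0.50, 0.30, 0.20, 0.00])
est_w  = np.array([0.45, 0.35, 0.15, 0.05])
print(correlation_error(true_w, est_w))
```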


Supplementary Figure 23: As SF17 but using a different evaluation metric, one minus the Pearson correlation coefficient between the true and estimated signature weights. The impact of using a self-determined set of reference signatures is similar to that in SF17.

