Prediction of protein biophysical traits from limited data: a case study on nanobody thermostability through NanoMelt.

Ramon, Aubin; Onuoha, Shimobi; Ni, Mingyang; Gaffey, Rebecca; Sormanni, Pietro; Kunz, Patrick; Predeina, Olga
doi:10.1080/19420862.2024.2442750
000296154 001__ 296154
000296154 005__ 20250112014745.0
000296154 0247_ $$2doi$$a10.1080/19420862.2024.2442750
000296154 0247_ $$2pmid$$apmid:39772905
000296154 0247_ $$2ISSN$$a1942-0862
000296154 0247_ $$2ISSN$$a1942-0870
000296154 0247_ $$2altmetric$$aaltmetric:172892558
000296154 037__ $$aDKFZ-2025-00082
000296154 041__ $$aEnglish
000296154 082__ $$a610
000296154 1001_ $$00009-0002-5502-5961$$aRamon, Aubin$$b0
000296154 245__ $$aPrediction of protein biophysical traits from limited data: a case study on nanobody thermostability through NanoMelt.
000296154 260__ $$aLondon$$bTaylor & Francis$$c2025
000296154 3367_ $$2DRIVER$$aarticle
000296154 3367_ $$2DataCite$$aOutput Types/Journal article
000296154 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1736433578_15823
000296154 3367_ $$2BibTeX$$aARTICLE
000296154 3367_ $$2ORCID$$aJOURNAL_ARTICLE
000296154 3367_ $$00$$2EndNote$$aJournal Article
000296154 520__ $$aIn-silico prediction of protein biophysical traits is often hindered by the limited availability of experimental data and their heterogeneity. Training on limited data can lead to overfitting and poor generalizability to sequences distant from those in the training set. Additionally, inadequate use of scarce and disparate data can introduce biases during evaluation, leading to unreliable reports of model performance. Here, we present a comprehensive study exploring various approaches for protein fitness prediction from limited data, leveraging pre-trained embeddings, repeated stratified nested cross-validation, and ensemble learning to ensure an unbiased assessment of model performance. We applied our framework to introduce NanoMelt, a predictor of nanobody thermostability trained with a dataset of 640 measurements of apparent melting temperature, obtained by integrating data from the literature with 129 new measurements from this study. We find that an ensemble model stacking multiple regression using diverse sequence embeddings achieves state-of-the-art accuracy in predicting nanobody thermostability. We further demonstrate NanoMelt's potential to streamline nanobody development by guiding the selection of highly stable nanobodies. We make the curated dataset of nanobody thermostability freely available and NanoMelt accessible as downloadable software and as a webserver.
000296154 536__ $$0G:(DE-HGF)POF4-312$$a312 - Funktionelle und strukturelle Genomforschung (POF4-312)$$cPOF4-312$$fPOF IV$$x0
000296154 588__ $$aDataset connected to CrossRef, PubMed, Journals: inrepo02.dkfz.de
000296154 650_7 $$2Other$$aBiological sciences – biophysics and computational biology
000296154 650_7 $$2Other$$aProtein fitness
000296154 650_7 $$2Other$$aantibody design
000296154 650_7 $$2Other$$aantibody engineering
000296154 650_7 $$2Other$$aensemble model
000296154 650_7 $$2Other$$amachine learning
000296154 650_7 $$2Other$$ananobody
000296154 650_7 $$2Other$$asemi-supervised learning
000296154 650_7 $$2Other$$athermostability
000296154 650_7 $$2NLM Chemicals$$aSingle-Domain Antibodies
000296154 650_2 $$2MeSH$$aSingle-Domain Antibodies: chemistry
000296154 650_2 $$2MeSH$$aSingle-Domain Antibodies: immunology
000296154 650_2 $$2MeSH$$aProtein Stability
000296154 650_2 $$2MeSH$$aHumans
000296154 650_2 $$2MeSH$$aSoftware
000296154 650_2 $$2MeSH$$aComputer Simulation
000296154 7001_ $$aNi, Mingyang$$b1
000296154 7001_ $$aPredeina, Olga$$b2
000296154 7001_ $$aGaffey, Rebecca$$b3
000296154 7001_ $$0P:(DE-He78)c4e25fa3671791de6626f8aab98a31e5$$aKunz, Patrick$$b4
000296154 7001_ $$aOnuoha, Shimobi$$b5
000296154 7001_ $$00000-0002-6228-2221$$aSormanni, Pietro$$b6
000296154 773__ $$0PERI:(DE-600)2537838-7$$a10.1080/19420862.2024.2442750$$gVol. 17, no. 1, p. 2442750$$n1$$p2442750$$tmAbs$$v17$$x1942-0862$$y2025
000296154 909CO $$ooai:inrepo02.dkfz.de:296154$$pVDB
000296154 9101_ $$0I:(DE-588b)2036810-0$$6P:(DE-He78)c4e25fa3671791de6626f8aab98a31e5$$aDeutsches Krebsforschungszentrum$$b4$$kDKFZ
000296154 9131_ $$0G:(DE-HGF)POF4-312$$1G:(DE-HGF)POF4-310$$2G:(DE-HGF)POF4-300$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$aDE-HGF$$bGesundheit$$lKrebsforschung$$vFunktionelle und strukturelle Genomforschung$$x0
000296154 9141_ $$y2025
000296154 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS$$d2023-10-26
000296154 915__ $$0StatID:(DE-HGF)0300$$2StatID$$aDBCoverage$$bMedline$$d2023-10-26
000296154 915__ $$0StatID:(DE-HGF)0320$$2StatID$$aDBCoverage$$bPubMed Central$$d2023-10-26
000296154 915__ $$0StatID:(DE-HGF)0501$$2StatID$$aDBCoverage$$bDOAJ Seal$$d2023-07-18T15:26:08Z
000296154 915__ $$0StatID:(DE-HGF)0500$$2StatID$$aDBCoverage$$bDOAJ$$d2023-07-18T15:26:08Z
000296154 915__ $$0StatID:(DE-HGF)0030$$2StatID$$aPeer Review$$bDOAJ : Anonymous peer review$$d2023-07-18T15:26:08Z
000296154 915__ $$0StatID:(DE-HGF)0199$$2StatID$$aDBCoverage$$bClarivate Analytics Master Journal List$$d2023-10-26
000296154 915__ $$0StatID:(DE-HGF)1050$$2StatID$$aDBCoverage$$bBIOSIS Previews$$d2023-10-26
000296154 915__ $$0StatID:(DE-HGF)0113$$2StatID$$aWoS$$bScience Citation Index Expanded$$d2023-10-26
000296154 915__ $$0StatID:(DE-HGF)0150$$2StatID$$aDBCoverage$$bWeb of Science Core Collection$$d2023-10-26
000296154 915__ $$0StatID:(DE-HGF)1190$$2StatID$$aDBCoverage$$bBiological Abstracts$$d2023-10-26
000296154 915__ $$0StatID:(DE-HGF)0160$$2StatID$$aDBCoverage$$bEssential Science Indicators$$d2023-10-26
000296154 915__ $$0StatID:(DE-HGF)0100$$2StatID$$aJCR$$bMABS-AUSTIN : 2022$$d2023-10-26
000296154 915__ $$0StatID:(DE-HGF)9905$$2StatID$$aIF >= 5$$bMABS-AUSTIN : 2022$$d2023-10-26
000296154 915__ $$0StatID:(DE-HGF)0561$$2StatID$$aArticle Processing Charges$$d2023-10-26
000296154 915__ $$0StatID:(DE-HGF)0700$$2StatID$$aFees$$d2023-10-26
000296154 9201_ $$0I:(DE-He78)B070-20160331$$kB070$$lB070 Funktionelle Genomanalyse$$x0
000296154 980__ $$ajournal
000296154 980__ $$aVDB
000296154 980__ $$aI:(DE-He78)B070-20160331
000296154 980__ $$aUNRESTRICTED