Journal Article DKFZ-2025-00082

Prediction of protein biophysical traits from limited data: a case study on nanobody thermostability through NanoMelt.


2025
Taylor & Francis London

mAbs 17(1), 2442750 [10.1080/19420862.2024.2442750]



Abstract: In-silico prediction of protein biophysical traits is often hindered by the limited availability and heterogeneity of experimental data. Training on limited data can lead to overfitting and poor generalizability to sequences distant from those in the training set. Additionally, inadequate use of scarce and disparate data can introduce biases during evaluation, so that unreliable model performance is reported. Here, we present a comprehensive study exploring various approaches for protein fitness prediction from limited data, leveraging pre-trained embeddings, repeated stratified nested cross-validation, and ensemble learning to ensure an unbiased assessment of performance. We applied our framework to introduce NanoMelt, a predictor of nanobody thermostability trained on a dataset of 640 measurements of apparent melting temperature, obtained by integrating data from the literature with 129 new measurements from this study. We find that an ensemble model stacking multiple regression models that use diverse sequence embeddings achieves state-of-the-art accuracy in predicting nanobody thermostability. We further demonstrate NanoMelt's potential to streamline nanobody development by guiding the selection of highly stable nanobodies. We make the curated dataset of nanobody thermostability freely available and NanoMelt accessible as downloadable software and as a webserver.
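
Illustrative note: the abstract names repeated stratified nested cross-validation as the evaluation scheme. The sketch below shows one generic way such splits can be generated for a continuous target such as a melting temperature. It is not taken from NanoMelt: the function names (`stratified_folds`, `repeated_nested_cv`), the fold counts, and the sorted-binning stratification are assumptions chosen for illustration, and the target values are synthetic placeholders rather than measurements.

```python
import random


def stratified_folds(indices, y, k, rng):
    """Split `indices` into k folds with similar target distributions.

    Assumed strategy: sort by target value, then deal each run of k
    neighbouring samples out to the k folds in random order.
    """
    ordered = sorted(indices, key=lambda i: y[i])
    folds = [[] for _ in range(k)]
    for start in range(0, len(ordered), k):
        chunk = ordered[start:start + k]
        # Pick a distinct fold for every sample in this chunk.
        slots = rng.sample(range(k), len(chunk))
        for idx, slot in zip(chunk, slots):
            folds[slot].append(idx)
    return folds


def repeated_nested_cv(y, outer_k=5, inner_k=3, repeats=2, seed=0):
    """Yield (repeat, outer_train, outer_test, inner_splits).

    The outer test fold is reserved for the final performance estimate;
    the inner (fit, validation) splits are drawn only from the outer
    training set and would be used for model selection.
    """
    rng = random.Random(seed)
    all_idx = list(range(len(y)))
    for rep in range(repeats):
        outer = stratified_folds(all_idx, y, outer_k, rng)
        for o, test in enumerate(outer):
            train = [i for f, fold in enumerate(outer) if f != o for i in fold]
            inner = stratified_folds(train, y, inner_k, rng)
            inner_splits = []
            for v, val in enumerate(inner):
                fit = [i for f, fold in enumerate(inner) if f != v for i in fold]
                inner_splits.append((fit, val))
            yield rep, train, test, inner_splits


# Synthetic placeholder targets (hypothetical values, not real data).
y = [50.0 + 0.5 * i for i in range(40)]
splits = list(repeated_nested_cv(y))
```

Because hyperparameters or ensemble weights would be tuned only on the inner splits, the outer test fold never influences model selection, which is the property that makes a nested scheme suitable for an unbiased performance estimate on small datasets.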

Keyword(s): Single-Domain Antibodies: chemistry (MeSH) ; Single-Domain Antibodies: immunology (MeSH) ; Protein Stability (MeSH) ; Humans (MeSH) ; Software (MeSH) ; Computer Simulation (MeSH) ; Biological sciences – biophysics and computational biology ; Protein fitness ; antibody design ; antibody engineering ; ensemble model ; machine learning ; nanobody ; semi-supervised learning ; thermostability ; Single-Domain Antibodies


Contributing Institute(s):
  1. B070 Funktionelle Genomanalyse (B070)
Research Program(s):
  1. 312 - Funktionelle und strukturelle Genomforschung (POF4-312)

Appears in the scientific report 2025
Database coverage:
Medline ; DOAJ ; Article Processing Charges ; BIOSIS Previews ; Biological Abstracts ; Clarivate Analytics Master Journal List ; DOAJ Seal ; Essential Science Indicators ; Fees ; IF >= 5 ; JCR ; PubMed Central ; SCOPUS ; Science Citation Index Expanded ; Web of Science Core Collection


 Record created 2025-01-09, last modified 2025-01-12


