June 18, 2025 · 8 min read

RareFold: Expanding Protein Design Beyond Nature's 20 Amino Acids

The era of protein design limited to nature's 20 canonical amino acids is ending. RareFold, a deep learning model, predicts the structures of proteins incorporating 29 noncanonical amino acids and, through its companion framework EvoBindRare, supports their design, expanding the chemical space available for protein therapeutics. This enables peptide binders with the potential for enhanced stability, reduced immunogenicity, and novel functions. By treating each amino acid as a distinct token, RareFold learns residue-specific atomic interaction patterns, which could improve both drug design and protein engineering.

protein-design · deep-learning · drug-discovery · computational-biology · synthetic-biology

The boundaries of protein engineering are being rewritten. For decades, computational protein design has been constrained by nature's 20 canonical amino acids, the building blocks of nearly all natural proteins. While these amino acids have enabled life's incredible diversity, they represent just a fraction of the chemical space available for protein design. RareFold, a deep learning model, changes this by expanding protein structure prediction and design to include 29 noncanonical amino acids (NCAAs).

The Limitation of Nature's Alphabet

Nature's 20 amino acids have served life remarkably well, but they impose inherent limitations on protein design. Traditional computational approaches have been confined to this natural alphabet, restricting our ability to engineer proteins with truly novel properties. This constraint has been particularly limiting in therapeutic applications, where proteins often face challenges like:

  • Proteolytic degradation: Peptides built from natural amino acids are readily recognized and cleaved by human proteases
  • Immunogenicity: The immune system has evolved to recognize natural protein sequences
  • Limited chemical diversity: The 20 canonical amino acids provide only a subset of possible chemical functionalities

Enter Noncanonical Amino Acids: The Expanded Toolkit

Noncanonical amino acids represent a vast chemical space beyond nature's standard set. These synthetic and modified amino acids offer unique properties that can dramatically enhance protein function:

Practical Advantages of NCAAs

Enhanced Stability: Many NCAAs are resistant to proteolytic cleavage, extending protein half-life in biological systems. This stability is crucial for therapeutic applications where drug persistence directly impacts efficacy.

Reduced Immunogenicity: Since NCAAs are rarely encountered by the human immune system, proteins incorporating them may evade immune recognition, reducing adverse reactions and improving therapeutic windows.

Novel Chemical Properties: NCAAs can introduce new functionalities – different side chain chemistries, modified backbone structures, or entirely new binding modes that aren't possible with canonical amino acids.

RareFold: The Technical Breakthrough

Tokenized Representation Strategy

RareFold's innovation lies in its tokenized approach to amino acid representation. Instead of treating amino acids as variations of a common theme, the model treats each of the 49 amino acids (20 canonical + 29 noncanonical) as a distinct token. This approach enables:

  • Residue-specific learning: The model learns unique atomic interaction patterns for each amino acid type
  • Chemical diversity modeling: Complex interactions between diverse chemical functionalities are captured accurately
  • Scalable architecture: The tokenized approach can potentially accommodate even more NCAAs as they become available
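To make the tokenized representation concrete, here is a minimal sketch of a 49-entry residue vocabulary. The NCAA names (`NCAA_01` to `NCAA_29`) are placeholders and the helper names `build_vocab` and `encode` are hypothetical; the actual residue list and encoding are defined in the RareFold code and are not reproduced here.

```python
from typing import Dict, List

# The 20 canonical amino acids, as three-letter codes.
CANONICAL = [
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
]

# Placeholder names only (assumption): stand-ins for the 29 NCAAs.
NONCANONICAL = [f"NCAA_{i:02d}" for i in range(1, 30)]


def build_vocab(residues: List[str]) -> Dict[str, int]:
    """Give every residue type its own integer token id."""
    return {name: idx for idx, name in enumerate(residues)}


VOCAB = build_vocab(CANONICAL + NONCANONICAL)


def encode(sequence: List[str], vocab: Dict[str, int] = VOCAB) -> List[int]:
    """Map a residue sequence to token ids; unknown residues raise KeyError."""
    unknown = [r for r in sequence if r not in vocab]
    if unknown:
        raise KeyError(f"residues not in vocabulary: {unknown}")
    return [vocab[r] for r in sequence]
```

With this layout, `encode(["ALA", "NCAA_07", "GLY"])` returns `[0, 26, 7]`. Adding a new residue type means appending one entry to the vocabulary, which is what makes the approach straightforward to extend.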

Deep Learning Architecture

The model architecture builds upon recent advances in protein structure prediction while incorporating specific adaptations for NCAA handling:

  • Attention mechanisms: Capture long-range interactions between diverse amino acid types
  • Structural constraints: Incorporate physical and chemical constraints specific to NCAAs
  • Multi-scale representation: Model interactions at atomic, residue, and domain levels
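As a rough illustration of how attention lets every residue draw on every other residue regardless of sequence distance, the sketch below implements single-head scaled dot-product self-attention in plain Python. It is a generic textbook version without learned projection weights, not RareFold's actual layer; the function names are illustrative.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention without learned weights.

    Queries, keys and values are all the raw embeddings, so this shows only
    the mixing step, not a trained layer.
    """
    dim = len(embeddings[0])
    scale = math.sqrt(dim)
    mixed = []
    for query in embeddings:
        # Similarity of this residue to every residue, near or far in sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in embeddings
        ]
        weights = softmax(scores)
        # Weighted average of all residue embeddings.
        mixed.append([
            sum(w * value[d] for w, value in zip(weights, embeddings))
            for d in range(dim)
        ])
    return mixed
```

Each output row is a weighted average over all input rows, so a residue at position 1 can be influenced by one at position 50 as easily as by its neighbor.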

EvoBindRare: From Prediction to Design

RareFold's capabilities extend beyond structure prediction to enable inverse design through EvoBindRare, a framework for generating peptide binders that incorporate NCAAs.

Design Framework Features

Sequence-Structure Co-optimization: Unlike traditional approaches that optimize sequence for a fixed structure or vice versa, EvoBindRare optimizes sequence and structure together to improve binding.

Linear and Cyclic Peptides: The framework supports both linear peptide design and cyclic peptide generation, with cyclization often providing additional stability and specificity.

Target-Specific Optimization: The system can be tailored to specific target proteins, optimizing for binding affinity, selectivity, and other desired properties.
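One simple way to picture a design loop of this kind is a greedy mutate-and-select search over the expanded alphabet. The sketch below is an assumption about the general shape of such a loop, not EvoBindRare's implementation: `mutate`, `design`, and `toy_score` are hypothetical names, and `toy_score` is a toy stand-in for a real structure-based loss that would come from predicting the peptide-target complex.

```python
import random
from typing import Callable, List, Sequence, Tuple


def mutate(seq: List[str], alphabet: Sequence[str], rng: random.Random) -> List[str]:
    """Return a copy of seq with one randomly chosen position resampled."""
    child = list(seq)
    pos = rng.randrange(len(child))
    child[pos] = rng.choice(alphabet)
    return child


def design(
    score: Callable[[List[str]], float],
    alphabet: Sequence[str],
    length: int,
    steps: int = 200,
    seed: int = 0,
) -> Tuple[List[str], float]:
    """Greedy mutate-and-select search; lower scores are better."""
    rng = random.Random(seed)
    best = [rng.choice(alphabet) for _ in range(length)]
    best_score = score(best)
    for _ in range(steps):
        candidate = mutate(best, alphabet, rng)
        candidate_score = score(candidate)
        # Keep the mutation only if it does not make the score worse.
        if candidate_score <= best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


# Toy alphabet and loss (assumptions): count mismatches to an arbitrary target.
ALPHABET = ["ALA", "GLY", "NCAA_01", "NCAA_02"]


def toy_score(seq: List[str]) -> float:
    target = ["ALA", "NCAA_01", "GLY", "NCAA_02"]
    return float(sum(a != b for a, b in zip(seq, target)))
```

Calling `design(toy_score, ALPHABET, length=4)` returns the best sequence found and its score. In a real pipeline the expensive part is the scoring call, since each candidate requires a structure prediction.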

Experimental Validation: Proof of Concept

The researchers demonstrated RareFold's practical utility by designing binders targeting a ribonuclease. The experimental results provide compelling validation:

  • Successful binding: Both linear and cyclic peptide designs achieved micromolar (μM) affinity
  • NCAA incorporation: The designed peptides successfully incorporated noncanonical amino acids
  • Practical validation: The results demonstrate that computationally designed NCAA-containing peptides can achieve meaningful biological activity
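To put micromolar affinity in perspective, a dissociation constant can be converted to a standard binding free energy with the textbook relation ΔG° = RT ln(Kd / 1 M). The short sketch below does that conversion; the function name is illustrative and the 298.15 K default is an assumption, not a condition reported for these experiments.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol * K)


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy in kcal/mol from Kd (1 M reference state)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar)
```

At 298.15 K, a Kd of 1 μM corresponds to roughly -8.2 kcal/mol, and each tenfold improvement in Kd adds about 1.4 kcal/mol of binding free energy.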

Implications for Drug Discovery and Therapeutics

Next-Generation Peptide Therapeutics

RareFold opens new avenues for peptide-based drug development:

Improved Pharmacokinetics: NCAA-containing peptides may exhibit superior ADMET (Absorption, Distribution, Metabolism, Excretion, Toxicity) properties compared to natural peptides.

Enhanced Specificity: The expanded chemical space allows for more precise molecular recognition, potentially reducing off-target effects.

Immune Evasion: Reduced immunogenicity could enable chronic treatments and reduce the need for immunosuppressive co-therapies.

Broader Impact on Protein Engineering

Beyond therapeutics, RareFold's capabilities could revolutionize:

  • Industrial enzymes: Designing more stable and efficient biocatalysts
  • Biosensors: Creating more sensitive and selective detection systems
  • Biomaterials: Engineering proteins with novel mechanical and chemical properties

Technical Challenges and Future Directions

Current Limitations

While groundbreaking, RareFold faces several challenges:

  • NCAA Availability: Not all computationally designed sequences may be synthetically accessible
  • Validation Complexity: Experimental validation of NCAA-containing proteins requires specialized synthesis and characterization techniques
  • Scale Limitations: Current demonstrations focus on relatively small peptides

Future Development Opportunities

  • Expanded NCAA Library: Integration of additional noncanonical amino acids as they become available
  • Larger Protein Design: Scaling to full-length proteins and protein complexes
  • Multi-objective Optimization: Simultaneously optimizing for multiple properties (stability, activity, manufacturability)
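Multi-objective optimization can be illustrated with a Pareto filter, which keeps every design that is not beaten on all properties at once by some other design. This is a generic sketch, not a feature of RareFold or EvoBindRare; the property names and scores in the usage note are made up, and higher is assumed to be better for each property.

```python
from typing import Dict, List

Scores = Dict[str, float]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b on every property and better on one."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)


def pareto_front(designs: Dict[str, Scores]) -> List[str]:
    """Names of designs that no other design dominates, in input order."""
    return [
        name
        for name, scores in designs.items()
        if not any(
            dominates(other, scores)
            for other_name, other in designs.items()
            if other_name != name
        )
    ]
```

For example, given design A (stability 0.9, activity 0.2, manufacturability 0.5), B (0.5, 0.8, 0.5), and C (0.4, 0.1, 0.4), the front is A and B: C is worse than A on every property, while A and B each win on something.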

Computational Considerations

Model Training and Data Requirements

Training RareFold requires substantial computational resources and carefully curated datasets:

  • Structural data: High-quality structures containing NCAAs are relatively scarce
  • Computational demands: The expanded chemical space increases model complexity
  • Validation datasets: Limited experimental data for NCAA-containing proteins

Accessibility and Implementation

The researchers have made RareFold available through GitHub, democratizing access to this powerful tool. This open-source approach will likely accelerate adoption and further development by the research community.

The Road Ahead: Transforming Protein Design

RareFold represents more than just a technical advancement – it's a paradigm shift toward truly synthetic biology. By breaking free from nature's constraints, we can now engineer proteins with properties that evolution never needed to explore.

Integration with Existing Workflows

The success of RareFold will depend on its integration with existing protein design and drug discovery pipelines. Key considerations include:

  • Synthesis compatibility: Ensuring designed sequences can be efficiently synthesized
  • Analytical methods: Developing characterization techniques for NCAA-containing proteins
  • Regulatory pathways: Establishing approval processes for synthetic protein therapeutics

Long-term Vision

As the field matures, we can envision a future where:

  • Custom amino acids: Bespoke amino acids designed for specific applications
  • Programmable proteins: Proteins with precisely tuned properties for specific environments
  • Sustainable manufacturing: Reduced reliance on natural protein sources

Conclusion: A New Chapter in Protein Engineering

RareFold marks the beginning of a new era in protein design, where the constraints of natural evolution no longer limit our engineering capabilities. By expanding the amino acid alphabet from 20 to 49 and potentially beyond, we're opening vast new territories for exploration in drug discovery, biotechnology, and synthetic biology.

The experimental validation of μM-affinity binders demonstrates that this isn't just theoretical: NCAA-containing binders can be designed computationally and confirmed in the lab. As the technology matures and becomes more accessible, we can expect to see a new generation of protein therapeutics that could be more stable, more specific, and more effective than those built from canonical amino acids alone.

The future of protein engineering is no longer constrained by evolution's choices. With tools like RareFold, we're writing the next chapter of this story ourselves.

References
  1. 012025.05.19.654846v1.full. Protein structure prediction and design have traditionally been limited to the 20 canonical amino acids. Expanding this space to include noncanonical amino acids (NCAAs) offers new opportunities for probing novel interactions and engineering proteins with enhanced or entirely new functions. Some NCAAs also offer practical advantages, such as increased proteolytic stability and reduced immunogenicity, as they are rarely encountered by the human immune system. Here, we present RareFold, a deep lea...