Transformer-Based Prediction of CRISPR Off-Target Effects Using Pan-Genomic Embeddings Across Species

Authors

  • K.Samunnisa, Assistant Professor, Department of CSE, Ashoka Womens Engineering College, Kurnool, Andhra Pradesh, India
  • Murtuza Ahamed Khan, Lecturer, Department of Computer Engineering, College of Computer Science, King Khalid University, Abha, Saudi Arabia
  • Emmanuel L. Howe, North West University Business School (NWU), Eswatini, Southern Africa, Mbabane

Keywords:

CRISPR-Cas9, off-target prediction, Transformer model, pan-genomic embeddings, cross-species genomics, guide RNA, sequence modeling

Abstract

The CRISPR-Cas9 genome editing system has transformed biomedical and agricultural research by enabling precise genetic modifications. However, its off-target effects—unintended edits at genomic loci with partial sequence similarity—remain a major safety concern, particularly in therapeutic and cross-species applications. Accurate prediction of off-target cleavage sites is essential to mitigate these risks. This study aims to develop a generalizable and interpretable model for predicting CRISPR-Cas9 off-target effects across multiple species using Transformer-based sequence modeling combined with pan-genomic embeddings. We introduce a novel framework that utilizes k-mer tokenization and unsupervised FastText-based embeddings trained on concatenated multi-species genomes to represent gRNA and candidate target sites. These embeddings are fed into a custom Transformer encoder that captures contextual nucleotide dependencies and cross-sequence interactions. The model is evaluated on curated datasets comprising ~150,000 labeled CRISPR off-target events from human, mouse, zebrafish, and Arabidopsis genomes. Evaluation metrics include AUPRC, F1-score, and AUROC under both stratified 5-fold cross-validation and leave-one-species-out protocols. The proposed model achieves an AUPRC of 0.768 in cross-validation and demonstrates up to 15% improvement in generalization on unseen species compared to DeepCRISPR and DeepSpCas9 baselines. Statistical tests confirm the significance of performance gains (p < 0.01). This research contributes a robust cross-species prediction tool, offering improved safety insights for genome editing applications in non-human systems, and establishes a scalable methodology for embedding-driven CRISPR modeling.
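Two steps named in the abstract, k-mer tokenization and the leave-one-species-out protocol, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the k-mer size, stride, or data layout, so `k=3`, a stride of 1, the `Record` tuple, and the function names `kmer_tokenize` and `leave_one_species_out` are all assumptions chosen for illustration. The toy sequences are invented and shortened for readability.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical record layout: (species, gRNA, candidate target site, label).
Record = Tuple[str, str, str, int]


def kmer_tokenize(seq: str, k: int = 3) -> List[str]:
    """Split a nucleotide sequence into overlapping k-mers (stride 1).

    k=3 is an assumed value; the abstract does not specify it.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    # A sequence shorter than k yields an empty token list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def leave_one_species_out(
    records: List[Record],
) -> Iterator[Tuple[str, List[Record], List[Record]]]:
    """Yield (held_out_species, train, test) with one species held out per fold."""
    by_species: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        by_species[rec[0]].append(rec)

    # Iterate in sorted order so the fold order is deterministic.
    for held_out in sorted(by_species):
        test = by_species[held_out]
        train = [
            rec
            for species, recs in by_species.items()
            if species != held_out
            for rec in recs
        ]
        yield held_out, train, test


# Toy data only; these are not real guide or target sequences.
records: List[Record] = [
    ("human", "GACGT", "GACGA", 1),
    ("mouse", "TTGCA", "TTGCA", 0),
    ("zebrafish", "CCATG", "CCTTG", 0),
]

tokens = kmer_tokenize("GACGT", k=3)
folds = list(leave_one_species_out(records))
```

In this sketch each fold trains on all species except one and tests on the held-out species, which is the sense in which the abstract reports generalization to unseen species. The token lists would then be mapped to embedding vectors before entering the Transformer encoder; that step is omitted here because the abstract gives no embedding hyperparameters.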

Published

2024-03-31

How to Cite

K.Samunnisa, Murtuza Ahamed Khan, & Emmanuel L. Howe. (2024). Transformer-Based Prediction of CRISPR Off-Target Effects Using Pan-Genomic Embeddings Across Species. Synthesis: A Multidisciplinary Research Journal, 2(1), 35-43. https://www.macawpublications.com/Journals/index.php/SMRJ/article/view/190
