The CRISPR–Cas adaptive immune systems of bacteria and archaea are now classified into five major types, mainly on the basis of their cas gene repertoires and the organization of their cas operons (1,2). Although the Cas proteins associated with CRISPR–Cas systems are extraordinarily diverse, the identification of a small number of signature genes has led to a relatively simple classification of these systems (2). The five types are readily distinguished by their unique signature proteins: Cas3 for type I, Cas9 for type II, Cas10 for type III, Csf1 for putative type IV and Cpf1 for putative type V (1). Because most cas genes evolve rapidly and cas operon architectures are remarkably diverse, the consistent annotation of Cas proteins, the rapid delineation of cas operon architecture and the classification of CRISPR–Cas systems face substantial obstacles (1,3). Fast and accurate annotation of Cas proteins is therefore urgently needed. In addition, Cas9, a large multidomain RNA-guided DNA endonuclease, has become a powerful tool for genome engineering in many organisms when combined with a single guide RNA (sgRNA) (4). Recent studies indicate that Cpf1 also mediates RNA-guided target cleavage and, together with CRISPR RNAs (crRNAs), has potential for sequence-specific genome editing (4-6). It is therefore of great significance to identify new Cas proteins, in particular new Cas9 and Cpf1 proteins.
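The signature-protein rule above can be made concrete with a minimal Python sketch. It assumes that the proteins of a locus have already been annotated with Cas family names; the dictionary and the function `assign_types` are hypothetical illustrations rather than HMMCAS code, and a real classification also weighs subtype-specific genes and operon organization (1).

```python
# Signature protein -> CRISPR-Cas type, as listed in the text.
# This mapping and assign_types() are illustrative assumptions, not HMMCAS code.
SIGNATURE_TO_TYPE = {
    "Cas3": "I",
    "Cas9": "II",
    "Cas10": "III",
    "Csf1": "IV (putative)",
    "Cpf1": "V (putative)",
}


def assign_types(cas_proteins):
    """Return the sorted list of types whose signature protein is present.

    cas_proteins: iterable of Cas family names annotated in one locus,
    e.g. ["Cas1", "Cas2", "Cas9"].
    """
    found = {SIGNATURE_TO_TYPE[name] for name in cas_proteins
             if name in SIGNATURE_TO_TYPE}
    return sorted(found)


print(assign_types(["Cas1", "Cas2", "Cas9", "Csn2"]))  # ['II']
print(assign_types(["Cas1", "Cas2"]))                  # [] (no signature protein)
```

A locus that carries only the universal adaptation genes (Cas1, Cas2) returns an empty list, which mirrors why signature proteins rather than shared genes are used to separate the types.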
Similarity searches are widely used for the rapid computational annotation of protein sequences (7). At least two popular web services provide convenient tools for the functional annotation of proteins at the level of protein domains and/or protein families (7,8). The first, CD-Search, combines the sensitivity of carefully constructed position-specific scoring matrices (PSSMs), derived from multiple sequence alignments, with the speed and significance statistics of the RPS-BLAST algorithm (8). The second is the HMMER web server, which provides access to the protein homology search programs phmmer, hmmscan, hmmsearch and jackhmmer of the HMMER software suite (7). Because the PSSM collection behind CD-Search and the profile hidden Markov model (HMM) and target sequence databases behind the HMMER server keep growing, virtually every protein submitted to either service receives some annotation. Picking out a specific class of proteins, such as Cas proteins, from complete archaeal and bacterial proteomes is therefore impractical with either service, because the Cas hits have to be sifted out of the annotations of every other protein in the proteome.
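To show what a PSSM contributes to such a search, here is a simplified Python sketch that slides a toy PSSM along a sequence and sums the position-specific scores of each window. The scores, the default penalty and the function `best_ungapped_hit` are invented for illustration; real CD-Search PSSMs cover whole domains and all 20 residues, and RPS-BLAST adds gapped extension and E-value statistics that this ungapped sketch omits.

```python
# Toy PSSM: one dict of log-odds scores per motif position.
# All values are invented for illustration only.
TOY_PSSM = [
    {"G": 3, "A": 1},
    {"D": 4, "E": 2},
    {"K": 3, "R": 2},
]
DEFAULT_SCORE = -1  # assumed score for residues not listed at a position


def best_ungapped_hit(pssm, sequence):
    """Slide the PSSM along the sequence; return (best_score, start_index).

    Each window is scored as the sum of its position-specific scores.
    Returns (-inf, -1) if the sequence is shorter than the PSSM.
    """
    width = len(pssm)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col.get(res, DEFAULT_SCORE) for col, res in zip(pssm, window))
        if score > best[0]:  # strict '>' keeps the first of tied windows
            best = (score, start)
    return best


print(best_ungapped_hit(TOY_PSSM, "MSTGDKLLAER"))  # (10, 3)
```

The window "GDK" at position 3 scores 3 + 4 + 3 = 10, while the weaker match "AER" near the end scores only 5, which is the kind of position-specific discrimination a PSSM provides over a single substitution matrix.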
Cas proteins can be identified with custom HMMs (9,10), with the HMMs deposited in the TIGRFAMs and Pfam protein family databases (11-14), or with PSSMs (1) built for known Cas protein families. Rabby et al. built HMMs of cas gene sets to identify cas genes in the genomes of enteric microorganisms (10). Zhang et al. collected 130 HMMs of known Cas protein families to recognize Cas proteins in Human Microbiome Project datasets (11). Makarova et al. developed a library of 394 PSSMs for 93 known Cas protein families to identify cas genes in complete archaeal and bacterial genomes (1). However, a dedicated, user-friendly web tool that identifies Cas proteins automatically and robustly is needed for their rapid exploration and in-depth analysis, and would make CRISPR–Cas research more efficient.
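Whichever profile library is used, the search yields per-protein hits that still have to be filtered. The following minimal Python sketch assumes the results were written with the `--tblout` option of HMMER3 hmmscan, whose rows hold 18 whitespace-delimited columns followed by a free-text description, with the target profile in column 1 and the query protein in column 3. It keeps the best-scoring family per protein below an E-value cutoff; the sample lines, profile and protein names, and the 1e-5 cutoff are invented for illustration and are not HMMCAS settings.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    protein: str   # query name (in hmmscan the query is the user's protein)
    family: str    # target name (in hmmscan the target is the profile HMM)
    evalue: float  # full-sequence E-value
    score: float   # full-sequence bit score


def parse_tblout(text):
    """Parse HMMER3 --tblout text; '#' lines are comments."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(None, 18)  # 18 fixed columns, then the description
        hits.append(Hit(protein=fields[2], family=fields[0],
                        evalue=float(fields[4]), score=float(fields[5])))
    return hits


def best_family_per_protein(hits, max_evalue=1e-5):
    """For each protein keep the highest-scoring family below the cutoff.

    The 1e-5 default is an illustrative assumption.
    """
    best = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue
        if hit.protein not in best or hit.score > best[hit.protein].score:
            best[hit.protein] = hit
    return {protein: hit.family for protein, hit in best.items()}


# Invented example rows in tblout layout; all names and numbers are hypothetical.
SAMPLE = """\
# target name  accession  query name  accession  E-value  score  bias ...
cas9_family  -  prot_0007  -  2.1e-250  830.4  12.0  3.0e-250  829.9  12.0  1.0  1  0  0  1  1  1  1  Cas9 profile
cas1_family  -  prot_0008  -  4.5e-90  301.2  0.5  5.0e-90  301.0  0.5  1.0  1  0  0  1  1  1  1  Cas1 profile
cas4_family  -  prot_0008  -  3.0e-02  12.1  0.1  4.0e-02  11.8  0.1  1.2  1  0  0  1  1  1  0  Cas4 profile
"""

print(best_family_per_protein(parse_tblout(SAMPLE)))
# {'prot_0007': 'cas9_family', 'prot_0008': 'cas1_family'}
```

Splitting with a maximum of 18 splits keeps multi-word descriptions intact in the last field; the weak Cas4 hit on prot_0008 is dropped by the cutoff, and even without a cutoff the Cas1 hit would win on bit score.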
Here, we present HMMCAS, a web service that uses the hmmscan homology search program of HMMER 3.1 (15) to run live searches of user-supplied protein sequences, or of a complete bacterial or archaeal proteome, against a collection of Cas protein family HMMs. The search results include a summary table of the newly annotated Cas proteins; when a complete proteome is submitted, HMMCAS also reports putative cas operons and their corresponding CRISPR–Cas types.
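The operon-reporting step can be pictured with a minimal Python sketch. It assumes the proteome is supplied in chromosomal gene order and that Cas genes separated by at most `max_gap` non-Cas genes belong to the same locus; the functions `putative_operons` and `operon_type`, the gap threshold and the example genes are hypothetical, and the sketch ignores strand and intergenic distance, which a real pipeline would likely take into account.

```python
# Signature protein -> type, as listed in the introduction (illustrative).
SIGNATURES = {"Cas3": "I", "Cas9": "II", "Cas10": "III",
              "Csf1": "IV (putative)", "Cpf1": "V (putative)"}


def putative_operons(annotations, max_gap=2):
    """Group annotated Cas genes into putative operons by gene order.

    annotations: list of (gene_index, cas_family), where gene_index is the
    position of the gene along the chromosome.
    max_gap: largest number of non-Cas genes tolerated between neighbouring
    Cas genes (a hypothetical parameter, not a published HMMCAS setting).
    """
    operons = []
    current = []
    for index, family in sorted(annotations):
        # Start a new operon when too many genes separate this Cas gene
        # from the previous one.
        if current and index - current[-1][0] - 1 > max_gap:
            operons.append(current)
            current = []
        current.append((index, family))
    if current:
        operons.append(current)
    return operons


def operon_type(operon):
    """Return the types whose signature protein occurs in the operon."""
    return sorted({SIGNATURES[f] for _, f in operon if f in SIGNATURES})


genes = [(101, "Cas9"), (102, "Cas1"), (103, "Cas2"), (104, "Csn2"),
         (540, "Cas3"), (541, "Cse1"), (543, "Cas1")]
for operon in putative_operons(genes):
    print([f for _, f in operon], operon_type(operon))
# ['Cas9', 'Cas1', 'Cas2', 'Csn2'] ['II']
# ['Cas3', 'Cse1', 'Cas1'] ['I']
```

Tolerating a small gap lets a locus survive an unannotated or divergent gene in the middle (gene 542 above), while the large jump from gene 104 to 540 splits the two systems apart.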
1. Makarova, K.S., Wolf, Y.I., Alkhnbashi, O.S., Costa, F., Shah, S.A., Saunders, S.J., Barrangou, R., Brouns, S.J., Charpentier, E., Haft, D.H. et al. (2015) An updated evolutionary classification of CRISPR-Cas systems. Nature Reviews Microbiology, 13, 722-736.
2. Chylinski, K., Makarova, K.S., Charpentier, E. and Koonin, E.V. (2014) Classification and evolution of type II CRISPR-Cas systems. Nucleic Acids Research, 42, 6091-6105.
3. Makarova, K.S., Haft, D.H., Barrangou, R., Brouns, S.J., Charpentier, E., Horvath, P., Moineau, S., Mojica, F.J., Wolf, Y.I., Yakunin, A.F. et al. (2011) Evolution and classification of the CRISPR-Cas systems. Nature Reviews Microbiology, 9, 467-477.
4. Wang, H., La Russa, M. and Qi, L.S. (2016) CRISPR/Cas9 in Genome Editing and Beyond. Annual Review of Biochemistry, 85, 227-264.
5. Fonfara, I., Richter, H., Bratovic, M., Le Rhun, A. and Charpentier, E. (2016) The CRISPR-associated DNA-cleaving enzyme Cpf1 also processes precursor CRISPR RNA. Nature, 532, 517-521.
6. Zetsche, B., Gootenberg, J.S., Abudayyeh, O.O., Slaymaker, I.M., Makarova, K.S., Essletzbichler, P., Volz, S.E., Joung, J., van der Oost, J., Regev, A. et al. (2015) Cpf1 is a single RNA-guided endonuclease of a class 2 CRISPR-Cas system. Cell, 163, 759-771.
7. Finn, R.D., Clements, J., Arndt, W., Miller, B.L., Wheeler, T.J., Schreiber, F., Bateman, A. and Eddy, S.R. (2015) HMMER web server: 2015 update. Nucleic Acids Research, 43, W30-38.
8. Marchler-Bauer, A. and Bryant, S.H. (2004) CD-Search: protein domain annotations on the fly. Nucleic Acids Research, 32, W327-331.
9. Vestergaard, G., Garrett, R.A. and Shah, S.A. (2014) CRISPR adaptive immune systems of Archaea. RNA Biology, 11, 156-167.
10. Rabby, M.A., Islam, T.M., Rahman, M.H. and Islam, M.R. (2013) Identification of Genes of CRISPR Associated Proteins CAS in the Genomes of Enteric Microorganisms. Biomirror, 4, 7-10.
11. Zhang, Q., Doak, T.G. and Ye, Y. (2014) Expanding the catalog of cas genes with metagenomes. Nucleic Acids Research, 42, 2448-2459.
12. Eloe-Fadrosh, E.A., Paez-Espino, D., Jarett, J., Dunfield, P.F., Hedlund, B.P., Dekas, A.E., Grasby, S.E., Brady, A.L., Dong, H., Briggs, B.R. et al. (2016) Global metagenomic survey reveals a new bacterial candidate phylum in geothermal springs. Nature Communications, 7, 10476.
13. Haft, D.H., Selengut, J.D. and White, O. (2003) The TIGRFAMs database of protein families. Nucleic Acids Research, 31, 371-373.
14. Punta, M., Coggill, P.C., Eberhardt, R.Y., Mistry, J., Tate, J., Boursnell, C., Pang, N., Forslund, K., Ceric, G., Clements, J. et al. (2012) The Pfam protein families database. Nucleic Acids Research, 40, D290-301.
15. Eddy, S.R. (1998) Profile hidden Markov models. Bioinformatics, 14, 755-763.