
Table 1 sORF prediction tools

From: Short open reading frames (sORFs) and microproteins: an update on their identification and validation measures

| Prediction tool | References | Website | Description |
| --- | --- | --- | --- |
| Coding Non-Coding Identifying Tool (CNIT) | [126] | http://cnit.noncode.org/CNIT/ | Distinguishes between coding and non-coding regions based on intrinsic sequence composition |
| Coding Region Identification Tool Invoking Comparative Analysis (CRITICA) | [127] | http://rdpwww.life.uiuc.edu/ | Analyses nucleotide sequence composition and conservation at the amino acid level |
| Coding Potential Calculator (CPC)/CPC2 | [128, 129] | http://cpc.cbi.pku.edu.cn; http://cpc2.gao-lab.org/ | Assesses protein-coding potential based on important features (ORF size, coverage, integrity); CPC2 improves run speed and accuracy |
| Coding Potential Predictor (CPPred) | [130] | http://www.rnabinding.com/CPPred/ | Predicts the coding potential of RNA transcripts |
| CPPred-sORF | [131] | http://www.rnabinding.com/CPPred-sORF/ | Extends CPPred with two new features (GCcount and mRNN-11codons) and with CUG and GUG start codons |
| MicroPeptide Tool (MiPepid) | [21] | https://github.com/MindAI/MiPepid | Identifies coding sORFs based on an existing microprotein subpopulation set |
| sORF Finder | [52] | http://evolver.psc.riken.jp/ | Identifies sORFs with high coding potential based on nucleotide composition bias and potential functional constraint at the amino acid level |
| smORFunction | [132] | https://www.cuilab.cn/smorfunction/home | Provides function prediction for sORFs/microproteins |
| miPFinder | [133] | https://github.com/DaStraub/miPFinder | Identifies microproteins and evaluates their functionality using information on size, domain, protein interactions and evolutionary origin |
| PhastCons | [134] | http://compgen.cshl.edu/phast/ | Based on conservation scoring and identification of conserved elements |
| PhyloCSF | [135] | http://compbio.mit.edu/PhyloCSF | Determines a conserved protein-coding region based on formal statistical comparison of phylogenetic codon models |
| uPEPperoni | [136] | http://upep-scmb.biosci.uq.edu.au/ | Specifically for 5′ UTR sORFs, based on conservation |
| AnABLAST | [34] | http://www.bioinfocabd.upo.es/ab/ | Identifies putative protein-coding regions in DNA regardless of ORF length and reading frame shifts |
| Small Peptide Alignment Discovery Application (SPADA) | [137] | https://github.com/orionzhou/SPADA | Homology-based gene prediction programme |
| Deep Neural Network for coding potential prediction (DeepCPP) | [138] | https://github.com/yuuuuzhang/DeepCPP | Effective for RNA coding potential prediction, specifically sORF mRNA prediction |

  1. This table shows prediction tools that can be used for putative sORF detection based on sequence homology and similarity in all genomes. CNIT and CPPred utilise a positive set of normal-sized proteins and may not be optimised for sORF and microprotein detection. CPPred-sORF is an improved version of CPPred for sORF detection. MiPepid, sORF Finder, miPFinder and smORFunction are designed especially for sORF detection, identification, and function prediction. PhastCons, PhyloCSF, SPADA and uPEPperoni utilise conservation analyses for prediction, with the latter designed specifically for sORFs in the upstream region. DeepCPP is based on a deep learning method to evaluate RNA coding potential and demonstrated high performance on sORF data.
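
Several descriptions in Table 1 refer to ORF-level features (ORF size, coverage, integrity) and to non-AUG start codons (CUG, GUG). The following is a minimal sketch of how such features could be computed for candidate sORFs in a single transcript. It is not the algorithm of any tool listed above: the function name `find_sorfs`, the 100-codon cutoff, the forward-strand-only scan and the exact feature definitions are all assumptions made for illustration.

```python
# Illustrative sketch only; not the implementation of any tool in Table 1.
# Assumptions: forward strand only, sORF cutoff of 100 codons (hypothetical
# default), and simple definitions of "coverage" and "integrity".

STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_sorfs(rna, starts=("AUG",), max_codons=100):
    """Scan the three forward frames and return candidate sORFs as dicts."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    hits = []
    for frame in range(3):
        # Split the frame into complete codons, dropping any trailing bases.
        codons = [rna[i:i + 3] for i in range(frame, len(rna) - 2, 3)]
        i = 0
        while i < len(codons):
            if codons[i] not in starts:
                i += 1
                continue
            # Walk forward to the first in-frame stop codon, if there is one.
            j = i + 1
            while j < len(codons) and codons[j] not in STOP_CODONS:
                j += 1
            complete = j < len(codons)  # True when a stop codon was found
            n_codons = j - i            # coding codons, stop codon excluded
            if n_codons <= max_codons:
                start = frame + 3 * i
                end = frame + 3 * (j + 1 if complete else j)
                hits.append({
                    "frame": frame,
                    "start": start,                        # 0-based, inclusive
                    "end": end,                            # 0-based, exclusive
                    "sequence": rna[start:end],
                    "codons": n_codons,                    # ORF size
                    "coverage": (end - start) / len(rna),  # fraction of transcript
                    "integrity": complete,                 # start and stop present
                })
            # Resume after this ORF; starts nested inside it are not reported.
            i = j + 1
    return hits


if __name__ == "__main__":
    # Toy transcript, made up for this example.
    for orf in find_sorfs("ACUGAAAUAGC", starts=("AUG", "CUG", "GUG")):
        print(orf["sequence"], orf["codons"], orf["integrity"])
```

Passing `starts=("AUG", "CUG", "GUG")` widens the scan to the near-cognate start codons mentioned for CPPred-sORF, while the default restricts it to AUG. Features of this kind would normally be only one input to a trained classifier or a conservation analysis, not a prediction on their own.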