Tools | Year | Benchmark dataset size (promoter) | Window size | Sequence similarity | Feature extraction | Feature selection | Classification algorithm | Evaluation strategy | AUC | DOI | Web server/stand-alone package | ||
1.TLS-NNPP | 2005 | 771 (E.coli) | 46 | / | The empirical probability distribution of TSS-TLS distance | / | ANN | Independent test | / | https://doi.org/10.1093/bioinformatics/bti047 | http://www.uow.edu.au/~yanxia/E_Coli_paper/SBurden_Results.xls | ||
2.SIDD | 2006 | 500 (E.coli) | 100 | / | SIDD | / | FLD | Independent test | / | https://doi.org/10.1186/1471-2105-7-248 | / | ||
3.FS_LSSVM | 2007 | 53 (E.coli) | 57 | / | A domain theory for promoters | C4.5 decision tree | LSSVM | 10-fold cross-validation | / | https://doi.org/10.1016/j.amc.2007.02.033 | / | ||
4.Free energy | 2007 | 1044 (E.coli); 879 (B.subtilis) | 101; 1001 | / | Free energy | / | Modified scoring function | Independent test | / | https://doi.org/10.1007/s12038-007-0085-1 | / | ||
5.PromPredict | 2009 | 1145 (E.coli); 615 (B.subtilis); 82 (M.tuberculosis) | 1001 | / | GC content; Average free energy | / | Difference between the average free energy | Training and validation | / | https://doi.org/10.1039/B906535K | http://nucleix.mbu.iisc.ernet.in/prompredict/prompredict.html | ||
6.SIDD-ANN | 2010 | 1648 (E.coli) | 250 | / | SIDD profile data | / | ANN | Independent test | / | https://doi.org/10.1186/1471-2105-11-S6-S17 | / | ||
7.PePPER | 2012 | L.lactis | / | / | PWM | / | HMM | / | / | https://doi.org/10.1186/1471-2164-13-299 | http://pepper.molgenrug.nl/ | ||
8.G4PromFinder | 2018 | 3570 (S.coelicolor); 2117 (P.aeruginosa) | 251 | / | AT-rich element and G-quadruplex motif-based algorithm | / | / | Independent test | / | https://doi.org/10.1186/s12859-018-2049-x | https://static-content.springer.com/esm/art%3A10.1186%2Fs12859-018-2049-x/MediaObjects/12859_2018_2049_MOESM4_ESM.py | ||
9.LN-QSAR | 2018 | 135 (M.bovis) | / | / | Pseudo-folding 2D lattice graph | / | LDA | Independent test | / | https://doi.org/10.1016/j.jtbi.2008.09.035 | / | ||
10.Ensemble-SVM | 2006 | 450 (E.coli σ70) | 200 | / | k-mer with location with respect to the TSS | Symmetric uncertainty | Ensemble-SVM | 10-fold cross-validation | / | https://doi.org/10.1093/bioinformatics/bti771 | http://eresearch.fit.qut.edu.au/ | ||
11.TSS-PREDICT | 2008 | 450 (E.coli σ70); 205 (B.subtilis); 26 (C.trachomatis) | 250 | / | Information Content; PWM | / | Ensemble-SVM | Independent test | / | https://doi.org/10.1016/j.compbiolchem.2008.07.009 | / | ||
12.TSS-SLP | 2007 | 669 (E.coli σ70) | 80 | / | Dinucleotide Frequency Features | / | SLP | 5-fold cross-validation; Independent test | / | https://doi.org/10.1093/bioinformatics/btl670 | http://202.41.85.117/htmfiles/faculty/tsr/tsr.html | ||
13.PCSF | 2006 | 683 (E.coli σ70) | 81 | / | Conservation of sequence segments; PCSF | / | Score function | 10-fold cross-validation | / | https://doi.org/10.1016/j.jtbi.2006.02.007 | / | ||
14.IPMD | 2010 | 270 (B.subtilis σ43); 741 (E.coli σ70) | 81 | / | PCSF; ID | / | Modified MD | 10-fold cross-validation | 0.847 (B.subtilis); 0.920 (E.coli) | https://doi.org/10.1007/s12064-010-0114-8 | / | ||
15.70ProPred | 2018 | 741 (E.coli σ70) | 81 | / | PSTNPss; PseEIIP | / | SVM | 5-fold cross-validation; Jackknife test | 0.990 | https://doi.org/10.1186/s12918-018-0570-1 | http://server.malab.cn/70ProPred/ | ||
16.iProEP | 2019 | 270 (B.subtilis); 741 (E.coli) | 81 | ≤80% | PseKNC; PCSF | mRMR; IFS | SVM | 10-fold cross-validation | 0.988 (B.subtilis); 0.976 (E.coli) | https://doi.org/10.1016/j.omtn.2019.05.028 | http://lin-group.cn/server/iProEP/ | ||
17.IPWM | 2011 | 683 (E.coli σ70) | 81 | / | Entropy-based conservative characteristics; Improved PWM | / | Score function | 10-fold cross-validation | / | https://doi.org/10.1504/Ijdmb.2011.038575 | / | ||
18.BacPP | 2011 | 1034 (E.coli) | / | / | Binary digits | / | ANN | (2,3,10)-fold cross-validation; Independent test | / | https://doi.org/10.1016/j.jtbi.2011.07.017 | http://www.bacpp.bioinfoucs.com/home | ||
19.vw Z-curve | 2012 | 1401 (E.coli); 660 (B.subtilis) | 80 | / | Variable-window Z-curve | IFS | PLS | 10-fold cross-validation | / | https://doi.org/10.1093/nar/gkr795 | MATLAB code | ||
20.Stability | 2014 | 1035 (E.coli) | 81 | / | DNA duplex stability | / | ANN | (2,3,10)-fold cross-validation | / | https://doi.org/10.1016/j.biologicals.2013.10.001 | / | ||
21.iPro54-PseKNC | 2014 | 161 (prokaryotic σ54) | 81 | ≤75% | PseKNC | F-score; IFS | SVM | Jackknife test | / | https://doi.org/10.1093/nar/gku1019 | http://lin-group.cn/server/iPro54-PseKNC | ||
22.PromotePredictor | 2018 | 161 (prokaryotic σ54) | 81 | ≤75% | Motif profile-based ANF | MRMD | Bagging; RF; SVM | 10-fold cross-validation; Independent test | / | https://doi.org/10.1109/tcbb.2018.2816032 | https://github.com/maqin2001/PromotePredictor | ||
23.meta-predictor | 2015 | 579 (E.coli σ70) | 81 | ≤45% | Sequence-based features; Structure-based features | / | Meta-predictor | Independent test | 0.850 | https://doi.org/10.1371/journal.pone.0119721 | / | ||
24.bTSSfinder | 2017 | 3597 (E.coli); 12797 (Nostoc); 351 (Synechocystis); 1471 (S.elongatus) | 251; 1101 | / | PWM; Physicochemical properties | Mahalanobis distance | ANN | Independent test | / | https://doi.org/10.1093/bioinformatics/btw629 | https://www.cbrc.kaust.edu.sa/btssfinder/ | ||
25.iPro70-PseZNC | 2017 | 741 (E.coli σ70) | 81 | / | PseZNC | F-score; IFS | SVM | 5-fold cross-validation | 0.909 | https://doi.org/10.1109/tcbb.2017.2666141 | http://lin-group.cn/server/iPro70-PseZNC | ||
26.iPromoter-FSEn | 2019 | 741 (E.coli σ70) | 81 | / | Nucleotide Statistics; k-mer; g-gapped k-mer; Approximate signal pattern count; Position specific occurrences; Distribution of nucleotides | Feature subspace | Ensemble learning | 10-fold cross-validation | 0.932 | https://doi.org/10.1016/j.ygeno.2018.07.011 | http://ipromoterfsen.pythonanywhere.com/server | ||
27.iPro70-FMWin | 2018 | 741 (E.coli σ70) | 81 | / | k-mer; g-gapped k-mer; Pattern finding; Positioning distance count | Adaboost | LR | 10-fold cross-validation | 0.959 | https://doi.org/10.1007/s00438-018-1487-5 | http://ipro70.pythonanywhere.com/server | ||
28.CNNProm | 2017 | 839 (E.coli σ70); 746 (B.subtilis) | 81 | / | one-hot | / | CNN | 5-fold cross-validation | / | https://doi.org/10.1371/journal.pone.0171410 | http://www.softberry.com/berry.phtml?topic=cnnpromoter_b&group=programs&subgroup=deeplearn | ||
29.IBPP | 2018 | 1888 (E.coli σ70) | 81 | / | Image-based and evolutionary approach | / | SVM | Independent test | / | https://doi.org/10.1038/s41598-018-36308-0 | https://github.com/hahatcdg/IBPP | ||
30.SAPPHIRE | 2020 | 170 (P. aeruginosa and P. putida σ70) | / | / | one-hot | / | ANN | 5-fold cross-validation; Independent test | / | https://doi.org/10.1186/s12859-020-03730-z | https://sapphire.biw.kuleuven.be/ | ||
31.iPromoter-2L | 2018 | 2860 (E.coli) | 81 | ≤80% | Multi-window-based PseKNC | / | RF | 5-fold cross-validation; Jackknife test | / | https://doi.org/10.1093/bioinformatics/btx579 | http://bioinformatics.hitsz.edu.cn/iPromoter-2L/ | ||
32.iPromoter-2L2.0 | 2019 | 2860 (E.coli) | 81 | ≤80% | Smoothing Cutting Window algorithm; k-mer; PseKNC | / | SVM; Ensemble learning | 5-fold cross-validation | / | https://doi.org/10.1016/j.omtn.2019.08.008 | http://bliulab.net/iPromoter-2L2.0/ | ||
33.MULTiPly | 2019 | 2860 (E.coli) | 81 | ≤80% | Bi-profile bayes; KNN; k-mer; DAC | F-score | SVM | 5-fold cross-validation; Jackknife test; Independent test | / | https://doi.org/10.1093/bioinformatics/btz016 | http://flagshipnt.erc.monash.edu/MULTiPly/ | ||
34.pcPromoter-CNN | 2020 | 2860 (E.coli) | 81 | ≤80% | one-hot | / | CNN | 5-fold cross-validation; Independent test | 0.957 | https://doi.org/10.3390/genes11121529 | http://nsclbio.jbnu.ac.kr/tools/pcPromoter-CNN/ | ||
35.iPromoter-BnCNN | 2020 | 2860 (E.coli) | 81 | ≤80% | one-hot; k-mer; Structural properties | / | CNN | 5-fold cross-validation; Independent test | / | https://doi.org/10.1093/bioinformatics/btaa609 | https://colab.research.google.com/drive/1yWWh7BXhsm8U4PODgPqlQRy23QGjF2DZ | ||
36.SELECTOR | 2021 | 2860 (E.coli) | 81 | ≤80% | CKSNAP; PCPseDNC; PSTNPss; DNA strand | / | Ensemble learning | 5-fold cross-validation; Independent test | 0.984 | https://doi.org/10.1093/bib/bbaa049 | http://selector.erc.monash.edu/ | ||
37.iPSW(2L)-PseKNC | 2019 | 3382 (E.coli) | 81 | ≤85% | NCP; ANF | / | SVM | 5-fold cross-validation | 0.905 | https://doi.org/10.1016/j.ygeno.2018.12.001 | http://www.jci-bioinfo.cn/iPSW(2L)-PseKNC | ||
38.deepPromoter | 2019 | 3382 (E.coli) | 81 | ≤85% | Combination of Continuous FastText N-Grams | MRMD | CNN | 5-fold cross-validation | 0.885 | https://doi.org/10.3389/fbioe.2019.00305 | https://github.com/khanhlee/deepPromoter | ||
39.iPSW(PseDNC-DL) | 2020 | 3382 (E.coli) | 81 | ≤85% | one-hot; PseDNC | / | CNN | 5-fold cross-validation | 0.925 | https://doi.org/10.1016/j.ygeno.2019.08.009 | https://home.jbnu.ac.kr/NSCL/PseDNC-DL.htm | ||
PWM: position weight matrix; SIDD: stress-induced DNA duplex destabilization; PCSF: position-correlation scoring function; ID: increment of diversity; PSTNPss: position-specific trinucleotide propensity based on single-strand; PseEIIP: electron-ion interaction pseudo-potentials of trinucleotide; PseKNC: pseudo k-tuple nucleotide composition; ANF: accumulated nucleotide frequency; PseZNC: pseudo multi-window Z-curve nucleotide composition; KNN: k-nearest neighbors; DAC: dinucleotide-based auto-covariance; PCPseDNC: parallel correlation pseudo dinucleotide composition; NCP: nucleotide chemical property; PseDNC: pseudo dinucleotide composition; mRMR: minimum redundancy maximum relevance; IFS: incremental feature selection; MRMD: maximum-relevance-maximum-distance; ANN: artificial neural network; SVM: support vector machine; FLD: Fisher linear discriminant; SLP: single-layer perceptron; LSSVM: least squares support vector machine; MD: Mahalanobis discriminant; PLS: partial least squares; HMM: hidden Markov model; RF: random forest; LR: logistic regression; CNN: convolutional neural network; LDA: linear discriminant analysis.
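
Two of the simpler encodings listed in the Feature extraction column, one-hot and k-mer, can be illustrated with a minimal sketch. It is not taken from any of the tools above: the function names `one_hot` and `kmer_frequencies`, the A/C/G/T column order, and the all-zero treatment of ambiguous bases are assumptions made for illustration only.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed column order for the one-hot vectors


def one_hot(seq):
    """Encode a DNA string as a list of 4-element 0/1 vectors (A, C, G, T)."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:  # assumption: ambiguous bases such as N stay all-zero
            row[index[base]] = 1
        encoded.append(row)
    return encoded


def kmer_frequencies(seq, k=2):
    """Return normalized counts of all 4**k k-mers, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # windows containing non-ACGT characters are skipped
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]
```

For an 81-bp window, the most common window size in the table, `one_hot` yields an 81 x 4 matrix of the kind typically fed to a CNN, while `kmer_frequencies` with `k=2` yields a fixed-length 16-dimensional vector that suits classifiers such as SVM or RF.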