Similarity Analysis of Protein Sequences Based on the EMD Method
School of Science, Dalian Jiaotong University, Dalian 116028, China
Department of Computer Science and Technology, Dalian Neusoft University of Information, Dalian 116023, China
Abstract: An Empirical Mode Decomposition (EMD) method for analyzing the similarity of protein sequences is
proposed. EMD is used to decompose the signal sequence converted from a protein sequence into a set
of well-behaved Intrinsic Mode Functions (IMFs) and a residue, which is monotonic or represents the
overall trend of the signal. As a result, protein sequences can be compared conveniently and intuitively
through their corresponding residues. The suitability of the method is verified using the cytochrome
c protein sequences of seven different species.
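The abstract describes a three-step pipeline: encode a protein sequence as a numeric signal x(t), decompose it with EMD as x(t) = c_1(t) + ... + c_n(t) + r_n(t), where the c_i are IMFs and r_n is the residue, and compare sequences through their residues. The Python sketch below illustrates that pipeline under several assumptions that the abstract does not state: the cumulative Kyte-Doolittle hydropathy walk used as the encoding, the endpoint-anchored natural cubic-spline envelopes, the SD-style sifting threshold of 0.2, and the resampled Euclidean distance between residues are illustrative choices, and helper names such as `protein_to_signal` and `residue_distance` are hypothetical. It is not the paper's exact method.

```python
import bisect
import itertools
import math

# ASSUMPTION: Kyte-Doolittle hydropathy values serve as the numeric encoding;
# the abstract does not state which mapping the paper uses.
HYDROPATHY = dict(zip("ARNDCQEGHILKMFPSTWYV",
                      [1.8, -4.5, -3.5, -3.5, 2.5, -3.5, -3.5, -0.4, -3.2, 4.5,
                       3.8, -3.9, 1.9, 2.8, -1.6, -0.8, -0.7, -0.9, -1.3, 4.2]))


def protein_to_signal(seq):
    """Hypothetical encoding: cumulative hydropathy walk (KeyError on unknown letters)."""
    return list(itertools.accumulate(HYDROPATHY[aa] for aa in seq.upper()))


def spline(xs, ys, points):
    """Natural cubic spline through knots (xs, ys), evaluated at `points`."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    mu, z = [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):  # forward sweep of the tridiagonal system
        alpha = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]
        denom = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / denom
        z[i] = (alpha - h[i - 1] * z[i - 1]) / denom
    b, c, d = [0.0] * n, [0.0] * (n + 1), [0.0] * n
    for j in range(n - 1, -1, -1):  # back substitution for the spline coefficients
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])
    out = []
    for p in points:
        j = min(max(bisect.bisect_right(xs, p) - 1, 0), n - 1)
        t = p - xs[j]
        out.append(ys[j] + b[j] * t + c[j] * t * t + d[j] * t ** 3)
    return out


def extrema(x):
    """Indices of interior local maxima and minima."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] <= x[i + 1]]
    return maxima, minima


def envelope(x, idx):
    """Spline through the extrema `idx`; ASSUMPTION: both endpoints act as extra knots."""
    knots = [0] + idx + [len(x) - 1]
    return spline(knots, [x[k] for k in knots], range(len(x)))


def sift(x, sd_limit=0.2, max_iter=50):
    """Extract one IMF; stops on an SD-style ratio-of-sums rule (assumed simplification)."""
    h = list(x)
    for _ in range(max_iter):
        maxima, minima = extrema(h)
        if not maxima or not minima:
            break
        upper, lower = envelope(h, maxima), envelope(h, minima)
        new = [v - (u + lo) / 2 for v, u, lo in zip(h, upper, lower)]
        sd = sum((a - b) ** 2 for a, b in zip(h, new)) / (sum(a * a for a in h) + 1e-12)
        h = new
        if sd < sd_limit:
            break
    return h


def emd(x, max_imfs=12):
    """Return (imfs, residue) with x == sum(imfs) + residue up to rounding."""
    imfs, r = [], list(x)
    while len(imfs) < max_imfs:
        maxima, minima = extrema(r)
        if len(maxima) + len(minima) < 2:  # monotonic or one extremum: stop
            break
        imf = sift(r)
        imfs.append(imf)
        r = [a - b for a, b in zip(r, imf)]
    return imfs, r


def resample(r, m=100):
    """Linearly resample r (at least 2 points) to m >= 2 points."""
    out = []
    for k in range(m):
        pos = k * (len(r) - 1) / (m - 1)
        i = min(int(pos), len(r) - 2)
        out.append(r[i] + (r[i + 1] - r[i]) * (pos - i))
    return out


def residue_distance(seq_a, seq_b, m=100):
    """Hypothetical metric: Euclidean distance between resampled EMD residues."""
    ra = resample(emd(protein_to_signal(seq_a))[1], m)
    rb = resample(emd(protein_to_signal(seq_b))[1], m)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ra, rb)))
```

Resampling is one simple way to align residues of proteins with different lengths; `residue_distance(s, s)` is 0.0 for any valid sequence, and smaller values indicate more similar residue trends under these assumptions.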
Project supported by the Educational Commission of Liaoning Province of China (Grant No.
L2012167) and the National Natural Science Foundation of China (Grant Nos. 61273022, 11271060,
U0935004, U1135003, 11071031, 11290143).
Cite this article:
Jihong Zhang, Junsheng Zheng, Fenglan Bai, et al. Similarity Analysis of Protein Sequences Based on the EMD Method[J]. Journal of Fiber Bioengineering and Informatics, 2014, 7(3): 387-395.