Introduction: Manipulating protein stability is important for understanding the principles that govern protein thermostability, in both basic research and industrial applications. Various data mining techniques exist for predicting thermostable proteins. In particular, artificial neural network (ANN) methods have attracted significant attention for thermostability prediction, because they are well suited to mapping non-linear input-output relationships and support massively parallel computation.

Method: An Extreme Learning Machine (ELM) was applied to estimate the thermal behavior of 1289 proteins. In the proposed algorithm, the ELM parameters were optimized with a Genetic Algorithm (GA), which tuned the set of input variables, the hidden-layer biases, and the input weights to enhance prediction performance. A total of 613 protein features were calculated from amino acid data for 1289 protein samples from the UniProt database. Several feature selection algorithms were then used to build feature subsets, in order to identify the main features that contribute to enzyme thermostability.

Results: At the primary structure level, the Gln, Glu, and polar features contributed most to protein thermostability. At the secondary structure level, Helix_S, Coil, and charged_Coil were the most important features. These results suggest that protein thermostability is associated mainly with primary structural features, whose influence was greater than that of secondary structure.
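The Method describes a GA that tunes three things for an ELM: which input variables are used, the input weights, and the hidden-layer biases. A minimal sketch of that idea follows, under assumptions not stated in the abstract: sigmoid hidden units, output weights solved by the Moore-Penrose pseudoinverse (the standard ELM training step), one real-valued chromosome holding a feature mask plus all weights and biases, validation RMSE as fitness, tournament selection, uniform crossover, Gaussian mutation, and two-member elitism. All function names and default parameters are illustrative, not the authors' implementation.

```python
import numpy as np

def hidden_layer(X, W, b):
    """Sigmoid hidden-layer outputs H, shape (n_samples, n_hidden)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def elm_fit(X, y, W, b):
    """ELM training: with W and b fixed, solve the output weights by least squares."""
    return np.linalg.pinv(hidden_layer(X, W, b)) @ y

def elm_predict(X, W, b, beta):
    return hidden_layer(X, W, b) @ beta

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def decode(chrom, d, n_hidden):
    """Split a chromosome into a feature mask, input weights, and hidden biases."""
    mask = chrom[:d] > 0.5
    W = chrom[d:d + d * n_hidden].reshape(d, n_hidden)
    b = chrom[d + d * n_hidden:]
    return mask, W, b

def ga_elm(X_tr, y_tr, X_val, y_val, n_hidden=10, pop_size=30,
           n_gen=40, mut_rate=0.1, seed=0):
    """Hypothetical GA wrapper that evolves the ELM mask, weights, and biases."""
    rng = np.random.default_rng(seed)
    d = X_tr.shape[1]
    n_genes = d + d * n_hidden + n_hidden

    def fitness(chrom):
        """Validation RMSE of an ELM built from this chromosome (lower is better)."""
        mask, W, b = decode(chrom, d, n_hidden)
        if not mask.any():
            return np.inf  # no input variables selected
        beta = elm_fit(X_tr[:, mask], y_tr, W[mask], b)
        return rmse(y_val, elm_predict(X_val[:, mask], W[mask], b, beta))

    def tournament(pop, scores):
        i, j = rng.integers(len(pop), size=2)
        return pop[i] if scores[i] < scores[j] else pop[j]

    # Mask genes start in [0, 1]; weight and bias genes start in [-1, 1].
    pop = np.hstack([rng.random((pop_size, d)),
                     rng.uniform(-1.0, 1.0, (pop_size, n_genes - d))])
    for _ in range(n_gen):
        scores = np.array([fitness(c) for c in pop])
        elite = pop[np.argsort(scores)[:2]]  # the two best survive unchanged
        children = []
        while len(children) < pop_size - 2:
            p1, p2 = tournament(pop, scores), tournament(pop, scores)
            child = np.where(rng.random(n_genes) < 0.5, p1, p2)  # uniform crossover
            mutate = rng.random(n_genes) < mut_rate
            child = child + mutate * rng.normal(0.0, 0.3, n_genes)  # Gaussian mutation
            children.append(child)
        pop = np.vstack([elite, np.array(children)])

    scores = np.array([fitness(c) for c in pop])
    mask, W, b = decode(pop[int(np.argmin(scores))], d, n_hidden)
    beta = elm_fit(X_tr[:, mask], y_tr, W[mask], b)
    return mask, W, b, beta, float(scores.min())
```

Because ELM output weights have a closed-form solution, each fitness evaluation costs a single pseudoinverse, which is what makes wrapping the ELM in a GA affordable. Elitism keeps the best validation RMSE from getting worse between generations. With 613 features and, for example, 20 hidden neurons, each chromosome in this encoding would carry more than 12,000 genes.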
Optimizing the ELM with the GA markedly improved its prediction accuracy, giving a root mean square error (RMSE) of 0.004 and a mean absolute percentage error (MAPE) of 0.1003.

Conclusion: The proposed approach significantly improves the accuracy of ELM in predicting thermostable enzymes. Because ELM tends to require more hidden-layer neurons than conventional tuning-based learning algorithms, the proposed approach uses a GA to optimize both the structure and the parameters of the ELM. In summary, optimizing ELM with a GA yields an efficient prediction method, and numerical experiments showed that it achieves high prediction accuracy.

Keywords: Protein stability, Primary and secondary structures, Extreme learning machine, Neural networks, Genetic algorithm
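The two reported error measures can be made concrete. RMSE is the square root of the mean squared residual, and MAPE is the mean of the absolute residuals divided by the true values. The abstract does not say whether MAPE = 0.1003 is a fraction or a percentage; the sketch below assumes the fraction convention (multiply by 100 for a percentage). It uses only the standard library, and the function names are illustrative.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error: sqrt of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mape(y_true, y_pred):
    """Mean absolute percentage error, returned as a fraction (assumed convention).

    Undefined when any true value is zero (raises ZeroDivisionError).
    """
    n = len(y_true)
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n

print(rmse([2.0, 4.0], [1.0, 5.0]))  # 1.0
print(mape([2.0, 4.0], [1.0, 5.0]))  # 0.375
```

Note that MAPE divides by the true values, so it is sensitive to targets near zero, whereas RMSE is expressed in the units of the target itself.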