Research on Biomedical Engineering
Original Article

Diabetes classification using a redundancy reduction preprocessor

Ribeiro, Áurea Celeste; Barros, Allan Kardec; Santana, Ewaldo; Príncipe, José Carlos



Introduction: Patients with diabetes can benefit significantly from early diagnosis. Given how widespread the disease is, accurate automated screening is becoming increasingly important. The highest accuracy reported by previous studies of automated screening is 92.6%.

Methods: This work proposes a classification methodology based on efficient coding of the input data, in which redundancy in the input data is reduced using well-known ICA algorithms such as FastICA, JADE and INFOMAX. A one-class support vector machine is used to discriminate diabetics from non-diabetics. Classification tests were performed using both noninvasive and invasive indicators.

Results: The results suggest that redundancy reduction improves the performance of the one-class support vector machine in discriminating between diabetics and non-diabetics, reaching an accuracy of 98.47% when all indicators are used. Using only noninvasive indicators, an accuracy of 98.28% was obtained.

Conclusion: ICA feature extraction improves classifier performance on this data set because it reduces the statistical dependence of the collected data, which increases the classifier's ability to find accurate class boundaries.
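The redundancy-reduction step can be illustrated with a minimal sketch of one of the named algorithms, FastICA, in its symmetric form with the tanh (log-cosh) nonlinearity. This is not the authors' implementation: the function name `fastica_symmetric`, its parameters, and the synthetic two-source data are hypothetical; JADE and INFOMAX are not shown; and the one-class SVM stage (available, for example, as scikit-learn's `OneClassSVM`) is omitted. The sketch centers and whitens the indicators, then rotates them to maximize non-Gaussianity, so the output components are uncorrelated by construction and, when the ICA model holds, approximately independent.

```python
import numpy as np


def fastica_symmetric(X, max_iter=500, tol=1e-6, seed=0):
    """Hypothetical minimal symmetric FastICA with the tanh (log-cosh) nonlinearity.

    X: array of shape (n_samples, n_features), e.g. one row per patient and one
    column per clinical indicator (assumed layout, not taken from the paper).
    Returns the estimated independent components, shape (n_samples, n_features).
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)                      # center each indicator

    # Whitening: rotate and rescale so the sample covariance becomes the identity.
    d, E = np.linalg.eigh(Xc.T @ Xc / n)
    K = E @ np.diag(1.0 / np.sqrt(d)) @ E.T
    Z = Xc @ K

    def decorrelate(W):
        # Symmetric orthogonalization: W <- (W W^T)^(-1/2) W
        s, U = np.linalg.eigh(W @ W.T)
        return U @ np.diag(1.0 / np.sqrt(s)) @ U.T @ W

    rng = np.random.default_rng(seed)
    W = decorrelate(rng.standard_normal((p, p)))
    for _ in range(max_iter):
        Y = Z @ W.T                              # current component estimates
        G = np.tanh(Y)
        G_prime = 1.0 - G ** 2
        # Fixed-point update for all rows at once: w <- E[z g(w^T z)] - E[g'(w^T z)] w
        W_new = decorrelate(G.T @ Z / n - G_prime.mean(axis=0)[:, None] * W)
        # Converged when every new row is parallel (up to sign) to the old one.
        converged = np.max(np.abs(np.abs(np.einsum("ij,ij->i", W_new, W)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return Z @ W.T


# Hypothetical demo data (not the paper's data set): two independent non-Gaussian
# sources observed only through a linear mixture, i.e. redundant measurements.
rng = np.random.default_rng(42)
sources = np.column_stack([rng.uniform(-1, 1, 5000), rng.laplace(size=5000)])
mixing = np.array([[1.0, 0.6], [0.5, 1.0]])
X = sources @ mixing.T
S = fastica_symmetric(X)

print(np.round(np.corrcoef(X.T)[0, 1], 2))   # the mixed inputs are strongly correlated
print(np.round(np.cov(S.T, bias=True), 6))   # the components have identity covariance
```

A practical version would also return the unmixing matrix `W @ K` and the training mean, so that the indicators of new patients could be projected with the same transform before being passed to the classifier.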


Diabetes, Clustering, Efficient coding, Independent Component Analysis, Support Vector Machine.

