Lung sound classification using local binary pattern


Nandini Sengupta, Md Sahidullah, Goutam Saha. Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology Kharagpur, Kharagpur-721302, India. Speech and Image Processing Unit, School of Computing, University of Eastern Finland, Joensuu-80101, Finland.

Lung sounds contain vital information about pulmonary pathology. In this paper, we use short-term spectral characteristics of lung sounds to recognize associated diseases. Motivated by the success of auditory perception based techniques in speech signal classification, we represent the time-frequency information of lung sounds using mel-scale warped spectral coefficients, referred to here as mel-frequency spectral coefficients (MFSCs). Next, we employ local binary pattern (LBP) analysis to capture texture information of the MFSCs, and the feature vectors are subsequently derived using a histogram representation. The proposed features are used with three well-known classifiers in this field: k-nearest neighbor (kNN), artificial neural network (ANN), and support vector machine (SVM). The performance is also tested with multiple SVM kernels. We conduct extensive performance evaluation experiments using two databases which include normal and adventitious sounds. Results show that the proposed features with the SVM and kNN classifiers outperform commonly used wavelet-based features as well as our previously investigated mel-frequency cepstral coefficient (MFCC) based statistical features, specifically in abnormal sound detection. The proposed features also yield better results than morphological features and energy features computed from rational dilation wavelet coefficients, and the Bhattacharyya kernel performs considerably better than the other kernels. Further, we optimize the configuration of the proposed feature extraction algorithm. Finally, we apply the mRMR (minimum-redundancy maximum-relevancy) feature selection method to remove redundancy in the feature vector, which makes the proposed method computationally more efficient without any degradation in performance. Overall, the proposed system outperforms the standard wavelet feature based system.

Artificial Neural Network (ANN), Auscultation, k-Nearest Neighbor (kNN), Local Binary Pattern (LBP), Mel-frequency Spectral Coefficients (MFSCs), Support Vector Machine (SVM), Texture Features.

1 Introduction

Lung disease is the third largest cause of death in the world. According to the World Health Organization (WHO), in 2015, 3.17 million people died due to chronic obstructive pulmonary disease (COPD), 3.19 million people died because of lower respiratory infections, and 1.69 million people died due to trachea, bronchus and lung cancers. Various physiological and external reasons cause structural changes and abnormalities in the respiratory system. Those abnormalities can be detected through several techniques such as X-ray (wipf1999diagnosing, ), CT-scan, arterial blood gas analysis (gould1991lung, ), spirometry, peak-flow meter, stethoscope, etc.

Though these processes are good for obtaining information about lung status, they suffer from several practical problems. First of all, these techniques are not widely accessible. Beyond availability, the techniques themselves are not without limitations. Arterial blood gas analysis is invasive and expensive; CT-scan or X-ray radiation is harmful to the body; a spirogram (the graph generated by spirometry) depends on the cooperation and effort of the subject (schulz2007spirometry, ); and the peak expiratory flow rate also depends on the subject’s cooperation. The auscultation method with a stethoscope has several limitations too, such as (a) a lack of experience of the physician, leading to an inability to identify lung sound abnormalities and impairments, and (b) the low sensitivity of the human ear to the lower frequency band present in lung sound (Kandaswamy2004523, ). Lung sound has also been captured from the mouth using a microphone (lei2014content, ). In contrast, the low-cost, non-invasive and easily available stethoscope-based technique with computerized assistance removes manual interference and becomes automatic by employing machine learning techniques.

In (sengupta2016lung, ), there is a discussion of the common types of lung sounds - normal, continuous adventitious sounds (CASs) (e.g., wheeze) and discontinuous adventitious sounds (DASs) (e.g., crackle) - and their spectral characteristics (NEJMra1302901, ). In short, normal sounds are the lung sounds of a healthy subject. A normal sound becomes adventitious when it is superimposed with other sounds due to different types of abnormalities or diseases in the respiratory system. Among the several types of lung diseases, two basic types are airway-related diseases and lung tissue-related diseases. Airway-related diseases cause obstruction or blockage in the airways. A common symptom of obstructive diseases (e.g., asthma, COPD, etc.) is the wheeze sound. Other than asthma and COPD, infections such as croup, whooping cough, laryngitis, acute tracheobronchitis, tracheal stenosis, laryngeal stenosis and airway compression are also associated with wheezing (meslier1995wheezes, ). In the case of lung tissue-related diseases, scarring or inflammation of the tissue makes the lungs unable to expand fully; thus, it becomes hard for the lungs to take in oxygen and release carbon dioxide. Interstitial lung disease (ILD), synonymous with diffuse parenchymal lung disease, refers to a large group of lung diseases affecting the tissue and space around the air sacs or alveoli of the lungs, which cause progressive scarring of lung tissue through inflammation and fibrosis (zibrak2014interstitial, ). One of the physical signs of ILD is crackles (piirila1995crackles, ; charleston2011assessment, ). In this work, we focus on characterizing normal and abnormal sounds. We conduct experiments with two databases: the abnormal sounds of the first database include wheeze and crackle type lung sounds, whereas the abnormal lung sounds of the second database are recorded from ILD patients.

As a typical pattern recognition problem, automatic lung sound recognition involves three different steps: pre-processing, feature extraction and classification. In the pre-processing step, the signal recorded using a stethoscope is processed for heart sound effect reduction (sengupta2016lung, ). In the feature extraction step, the lung sound signal is represented by features defining a set of measured values with distinctive information. Finally, in the classification step, the extracted features are categorized into normal and different diseased classes. Various feature extraction and classification techniques have been used to investigate the distinctive characteristics of lung sound. A detailed study of features and classification methods is reported in (Palaniappan2013129, ). Different features are extracted after analyzing the sound characteristics with different analysis methods described in (reichert2008analysis, ). Both time and frequency domain features are used in lung sound detection. According to previous works in this research area, the most popular features are based on wavelets (serbes2011feature, ; pittner1999feature, ; lu2008integrated, ; lin2006wheeze, ). They capture useful time-frequency information from the non-stationary lung sounds. Along with wavelet-based features, spectral features are also widely used in lung sound characterization (Abbas2010Auto, ; riella2009method, ; Xie2012MSPCA, ; jin2011adventitious, ; rietveld1999classification, ; waitman2000representation, ; munakata1991spectral, ). Auto-regressive (AR) coefficients, related to linear prediction (LP) analysis, have also been used for this purpose (sankur1994comparison, ; alsmadi2008design, ; Chang2010141, ; martinez2006computerized, ; cohen1984analysis, ; charleston2011assessment, ).
Cepstral features such as mel-frequency cepstral coefficients (MFCCs) (sahidullah2012design, ) and perceptual linear prediction cepstral coefficients (PLPCCs), motivated by auditory perception, have been employed for this task (Bahoura2009824, ; orjuela2014artificial, ; chien2007wheeze, ; bahoura2004respiratory, ; mayorga2010acoustics, ; lei2014content, ; sengupta2016lung, ; sengupta2015optimization, ). Auditory perception based features, which are expected to capture important distinctive characteristics of different lung sounds in a manner similar to an expert physician, show considerable lung sound recognition accuracy (sengupta2016lung, ; Bahoura2009824, ; lei2014content, ). Besides, diseases like ILD, which actually represents a group of lung diseases, have also been detected by analyzing different features captured from lung sounds (charleston2011assessment, ; palaniappan2014pulmonary, ).

In our recent work, we proposed a mean-based representation of perceptual features for lung sound classification that captures statistical information of cepstral coefficients (sengupta2016lung, ). In this work, we investigate a new method based on a texture representation of perceptual features. In image processing, texture-based approaches are widely used for the representation and classification of natural images (tuceryan1993texture, ). Texture was also introduced by René Laennec, the inventor of the stethoscope, as a set of descriptors for the characterization of lung sounds (laennec1838treatise, ). Influenced by Laennec's idea, the lacunarity feature is utilized in (hadjileontiadis2009texture, ). Lacunarity is one kind of texture feature, and it yields good recognition accuracy (hadjileontiadis2009texture, ). Other than lacunarity, morphological features like skewness and sample entropy are also employed for lung sound detection (bhattacharyya2015novel, ; jin2014new, ). However, these texture features are generally extracted directly from the time domain lung sound signal. We hypothesize that lung sound may also exhibit texture characteristics in the frequency domain.

In this work, we investigate a new feature for lung sound characterization which captures texture-related information from its auditory perceptual representation. The features are extracted from the short-term spectral coefficients using the local binary pattern (LBP) (ojala2002multiresolution, ; huang2011local, ; pietikainen2011local, ; shan2009facial, ; shan2005robust, ; liao2009dominant, ; heikkila2009description, ). LBP is widely used in the image processing literature for texture analysis. In speech signal processing, LBP has been used to capture texture information from speech signals for spoofing detection (alegre2013new, ; wu2015spoofing, ) and speaker recognition (roy2012fast, ). We introduce it here, for the first time, for lung sound characterization: LBP is used to extract texture information from the auditory-scale warped lung sound spectrum. We utilize the kNN classifier for this purpose. We also use the extracted features with a conventional artificial neural network (ANN), as it is popularly used for lung sound characterization (Palaniappan2013129, ; Kandaswamy2004523, ). We further employ the support vector machine (SVM) as a classifier, since LBP-based features are known to work well with SVMs (chapelle1999support, ; wong2006application, ). Other than the simple linear inner product kernel, we have also used the Bhattacharyya (kondor2003kernel, ) and intersection kernels (maji2013efficient, ), which seem more appropriate for the newly investigated feature. We have found that the proposed feature performs better than the wavelet-based, morphological and our previously proposed mean-based MFCC features.

The rest of the paper is organized as follows. In Section 2, the extraction of a suitable feature for lung sound analysis and the formulation of the proposed feature are discussed. The basic mathematical background of the classifiers used is presented in Section 3. Experimental setups are described in Section 4, and results are discussed in Section 5. In Section 6, we provide a summary of the findings along with the limitations of the current work and potential future directions.

2 Local binary pattern (LBP) based feature for lung sound analysis

Lung sound signals are non-stationary, as the volume of the lung varies due to changes in air pressure and its amount (reichert2008analysis, ). The non-stationarity is severe in the case of abnormal subjects (Kandaswamy2004523, ). Hence, a time-frequency representation of lung sounds can capture useful information related to lung status. In our previous study (sengupta2016lung, ), by performing F-ratio analysis on the short-term power spectrum, we showed that the power spectrum contains discriminative information about different lung sounds, and we subsequently proposed using statistical information of short-term features for lung sound characterization. In the current work, we investigate an LBP-based approach to process the short-term time-frequency representation of lung sounds. LBP is a non-parametric descriptor which summarizes the local structure of an image by comparing each pixel with its neighboring pixels. Computational simplicity is one of the main strengths of LBP (huang2011local, ), and it is a powerful approach for describing local structures. Here, we first describe the calculation of LBP, followed by the feature vector formulation from LBP.

2.1 Mel-frequency spectral coefficients (MFSCs)

The proposed feature is formulated by considering several factors discussed earlier. First, given the non-stationary nature of lung sound, we use short-time processing (sengupta2016lung, ). Second, considering the success of auditory perceptual processing for lung sound classification (Bahoura2009824, ; sengupta2016lung, ), mel-scale warping is applied to the short-time power spectrum. The mel-scale warped short-term representation can then be viewed as a 2D image whose dimensions are time (number of frames) and frequency bins (number of filters). Finally, we use this image to extract relevant texture information with LBP. Note that we do not need to apply the discrete cosine transform (DCT) to extract cepstral features from the mel-scale warped power spectrum; rather, we compute LBP directly on the spectrum, i.e., on the mel-frequency spectral coefficients (MFSCs). MFSC is the same as the mel-frequency log-energy (MFLE) discussed in (sahidullah2012design, ).

In order to compute MFSCs, the short-term power spectrum is calculated from a lung sound frame, $x(n)$, using an $N$-point FFT as:

$$P(k) = \left| \sum_{n=0}^{N_w - 1} x(n)\, e^{-j2\pi nk/N} \right|^{2}$$

where $0 \le k \le N-1$ and $N_w$ is the window length. Next, a filterbank with $Q$ non-linearly spaced triangular mel-filters is imposed on the spectrum. The mel scale is defined as

$$f_{\mathrm{mel}} = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$$

where $f$ is the original frequency in Hz.

The outputs of the mel-filters can be calculated by a weighted summation between the respective filter response $\psi_i(k)$ and the energy spectrum as:

$$e(i) = \sum_{k=0}^{N/2} P(k)\, \psi_i(k), \qquad 1 \le i \le Q$$

Here, $Q$ is the number of filters in the filter bank. Finally, the logarithm of the filterbank energies is computed, and we get the mel-frequency spectral coefficients.
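The MFSC computation described above can be sketched as follows. This is an illustrative NumPy implementation, not the authors' exact code; the default parameter values (20 filters, 512-point FFT, 4 kHz sampling rate) are assumptions for the example:

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale warping: mel = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of the mel-scale warping
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    # Triangular filters with centers spaced uniformly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        for k in range(left, center):            # rising edge of triangle
            fbank[j - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):           # falling edge of triangle
            fbank[j - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfsc(frame, n_filters=20, n_fft=512, fs=4000):
    # Short-term power spectrum of a Hamming-windowed frame
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    # Log filterbank energies = mel-frequency spectral coefficients
    return np.log(mel_filterbank(n_filters, n_fft, fs) @ power + 1e-12)
```

Applying `mfsc` to every frame of a cycle and stacking the outputs yields the time-frequency "image" on which LBP is computed.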

2.2 Computation of LBP

The original LBP operator labels the image pixels with decimal numbers (ojala2002multiresolution, ). Each pixel is compared with its neighboring pixels, primarily within a $3\times3$ neighborhood, by subtracting the center pixel value (alegre2013new, ). The resulting strictly negative values are encoded with 0 and the others with 1. At first, the operator was defined only for the $3\times3$ neighborhood, which is small for large images; it was later generalized to neighborhoods of different sizes. Formally, given a center pixel at $(x_c, y_c)$, the LBP can be expressed in decimal form as

$$LBP_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} s(i_p - i_c)\, 2^{p}$$

where $i_p$ and $i_c$ are the intensities (gray values) of the $p$-th neighboring pixel and the center pixel, respectively. $(P, R)$ represents a neighborhood of $P$ sampling points on a circle of radius $R$. The function $s(x)$ can be defined as

$$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

The $LBP_{P,R}$ operator produces $2^{P}$ different binary patterns formed by the $P$ pixels in the neighborhood.
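The LBP code of one pixel in the basic 3×3 neighborhood can be sketched as below. This is an illustrative implementation; the clockwise neighbor ordering is one conventional choice, not necessarily the exact ordering used in the paper:

```python
import numpy as np

def lbp_code(img, r, c):
    # 3x3 LBP: compare the 8 neighbours with the centre pixel at (r, c)
    center = img[r, c]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]   # clockwise traversal
    code = 0
    for p, (dr, dc) in enumerate(offsets):
        # s(x) = 1 if x >= 0 else 0, applied to (neighbour - centre)
        if img[r + dr, c + dc] - center >= 0:
            code |= 1 << p
    return code
```

For a flat region (all neighbors equal to the center) every comparison yields 1, so the code is 255; this is one of the "uniform" patterns discussed in the next subsection.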

2.3 Feature vector formulation using LBP

The steps of feature extraction are summarized in Fig. 1. Here, MFSCs are computed from the short-time power spectrum of the frames using $Q$ filters. The frame-wise concatenated MFSCs are treated as an image, and LBP analysis is applied to it. The standard LBP operator (ojala2002multiresolution, ) is a non-parametric $3\times3$ kernel which assigns a binary code to each pixel in an image according to the comparison of its intensity value with those of its eight surrounding pixels. A binary value of ‘1’ is assigned when the intensity of a neighboring pixel (i.e., MFSC value) is higher than or equal to that of the center, whereas a value of ‘0’ is assigned when the neighboring pixel is of lower intensity. Each pixel is thus assigned one of $2^{8} = 256$ binary patterns. In this work, we reduce the number of possible patterns according to the standard uniform LBP approach reported in (ojala2002multiresolution, ). Uniform LBPs are the subset of patterns which incorporate at most two bitwise transitions from 0 to 1 or vice versa when the bit pattern is traversed in circular fashion.

Figure 1: An illustration of the processing steps for proposed feature extraction method: from the lung cycle to histogram.

Figure 2: Feature formulation steps showing spectrogram (top row), MFSCs and LBP (second from below) for normal (left), wheeze (center) and crackle (right). The last row shows the corresponding feature vector.

Figure 2 gives a pictorial representation of the main steps of the proposed feature formulation. The left column corresponds to normal sound, the middle column to wheeze sound, and the right column to crackle sound. The first (upper) row illustrates the spectrograms for normal, wheeze and crackle sounds, which make it clear that crackle has higher frequency components than normal or wheeze, and that normal lung sound has a narrower frequency range than the other two. In the middle row, we have plotted the MFSCs; the x-axis represents the frames and the y-axis the coefficients. The MFSC domain is a linear mapping from the spectrogram. The latter representation is more compact than the spectrogram plot, as the number of filters is less than the number of frequency points in the power spectrum computation. The third row is a mapping from MFSCs to the LBP domain. Finally, we have plotted the LBP histogram features for each of the classes.

As described in (ojala2002multiresolution, ; alegre2013new, ), most patterns are naturally uniform, and according to different pieces of evidence, many image recognition applications achieve better performance by using only the uniform patterns rather than the full set of uniform and non-uniform patterns. Thus, pixels corresponding to any of the 198 non-uniform patterns are ignored in our work as well. LBPs are computed for each pixel in the mel-frequency spectral coefficients, generating a new matrix, referred to as a textrogram, which has a reduced dynamic range. The LBP-based feature is created by concatenating the histograms computed from the pixel values across each filter in the textrogram. However, the rows of the textrogram corresponding to the first and last filters are discarded, as they do not have all the neighbors. The LBP feature vector is formed by normalizing the histograms individually.
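The uniform-pattern selection and histogram normalization described above can be sketched as follows, assuming 8-bit LBP codes. The function names are illustrative, not from the paper:

```python
import numpy as np

def is_uniform(code, bits=8):
    # A pattern is "uniform" if its circular bit string has at most two
    # 0/1 transitions; for 8 bits there are 58 such patterns (256 - 198)
    transitions = 0
    for p in range(bits):
        if ((code >> p) & 1) != ((code >> ((p + 1) % bits)) & 1):
            transitions += 1
    return transitions <= 2

UNIFORM_CODES = [c for c in range(256) if is_uniform(c)]
CODE_TO_BIN = {c: i for i, c in enumerate(UNIFORM_CODES)}

def lbp_histogram(codes):
    # Normalized histogram over the 58 uniform patterns; pixels carrying
    # any of the 198 non-uniform codes are ignored, as in the paper
    hist = np.zeros(len(UNIFORM_CODES))
    for c in codes:
        if c in CODE_TO_BIN:
            hist[CODE_TO_BIN[c]] += 1
    total = hist.sum()
    return hist / total if total > 0 else hist
```

One such normalized histogram is computed per retained filter row of the textrogram, and the histograms are concatenated to form the final feature vector.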

In our work, we have generally used 20 filters in the mel scale. Since the first and last filters are discarded and each of the remaining 18 filters contributes a 58-bin histogram of uniform patterns, the feature dimension, unless otherwise specified, is $18 \times 58 = 1044$.

Figure 3 shows scatter plots of feature vectors using the wavelet-based and LBP-based approaches for normal and abnormal lung sounds. The two-dimensional representation is produced by applying the t-distributed stochastic neighbor embedding (t-SNE) (maaten2008visualizing, ) algorithm to the feature vector of each lung sound cycle. The two classes of sounds are visually more separable when the LBP-based features are used. We have also performed a MANOVA analysis to quantify the separability between the classes. The Wilks' lambda values for the two databases are shown in Table 1. The null hypothesis is closer to rejection when Wilks' lambda is close to zero; thus, it can be inferred that the classes are more separable when the LBP-based feature is used.

Figure 3: Scatter plot showing the normal (green circle) and abnormal sounds (red triangle) for all the lung sound cycles in Database 1 (Details in Section 4.1).
Feature Database 1 Database 2
Wavelet 0.4923 0.1652
LBP 0.1507 0.0501
Table 1: MANOVA analysis: Wilks’ lambda values of the corresponding databases.

3 Classifiers for lung sound classification

3.1 k-nearest neighbour(kNN)

k-nearest neighbor (kNN) is a nonparametric classifier. To demonstrate k-nearest neighbor analysis, consider the task of classifying an unknown object among some known objects. The training stage of the kNN algorithm comprises storing the feature vectors and class labels of the training samples. In the classification stage, the same features are computed for the unknown test sample. Distances from the unknown input vector to all stored vectors are computed, and the k closest samples are selected. The unknown sample is then predicted to belong to the class that is most numerous within this set. The Euclidean distance is typically used to measure the distance or similarity between instances; however, other distance functions can also be employed. Figure 4 depicts a simple representative diagram of a nearest neighbor classifier.
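The kNN decision rule with Euclidean distance can be sketched in a few lines (an illustrative implementation, not the MATLAB routine used in the experiments):

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    # Euclidean distances from the test vector to all stored training vectors
    dists = np.linalg.norm(train_X - x, axis=1)
    # Labels of the k closest training samples
    nearest = [train_y[i] for i in np.argsort(dists)[:k]]
    # Majority vote among the k neighbours
    return Counter(nearest).most_common(1)[0][0]
```
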

Figure 4: A representative diagram of kNN classifier.

3.2 Artificial neural network (ANN)

The ANN used here is a multi-layer perceptron (MLP), which is a number of neurons (nodes) arranged in layers in a feed-forward manner (SimonHaykin, ). It consists of three layers: an input, a hidden, and an output layer. Inputs pass through the first two layers and finally emerge from the output layer. Figure 5 shows an example of an MLP network with one input layer, one hidden layer, and one output layer.

Figure 5: A representative diagram of ANN classifier.

Each node in the hidden layer receives the output of each node of the input layer through a weighted connection, and the produced response is forwarded to the output layer. Each hidden node computes a weighted sum which is passed through a nonlinear function:

$$z_j = \varphi\left( \sum_{i=1}^{N_I} w_{ji}\, x_i + b_j \right)$$

where $z_j$ is the produced response of the $j$-th node of the hidden layer, $\varphi(\cdot)$ is the non-linear function at the hidden layer node, $N_I$ is the number of nodes in the input layer, $w_{ji}$ is the weight connecting the $i$-th input node and the $j$-th hidden layer node, $x_i$ is the $i$-th component of the input feature vector, and $b_j$ is the bias term.

In the same way, the response of the hidden layer passes through another non-linear function after being multiplied by the weights of the output layer:

$$y_k = \phi\left( \sum_{j=1}^{N_H} v_{kj}\, z_j + c_k \right)$$

where $y_k$ is the produced response of the $k$-th node of the output layer, $\phi(\cdot)$ is the non-linear function at the output layer node, $N_H$ is the number of nodes in the hidden layer, $v_{kj}$ is the weight connecting the $j$-th hidden node and the $k$-th output node, and $c_k$ is the bias term. ANN is a supervised classifier, and the weights are determined in the training phase, generally with the back propagation (BP) algorithm. Let the $t$-th training pattern be $(\mathbf{x}^{(t)}, \mathbf{d}^{(t)})$, where $\mathbf{x}^{(t)}$ is the $t$-th input pattern and $\mathbf{d}^{(t)}$ is the desired output or class label. The total error in training is defined as


$$E = \frac{1}{2} \sum_{t} \sum_{k \in K} \left( d_k^{(t)} - y_k^{(t)} \right)^{2}$$

where $d_k^{(t)}$ and $y_k^{(t)}$ are the desired and actual outputs of the $t$-th pattern at the $k$-th output node, and $K$ represents the set of output nodes for a given training pattern. The error is minimized by updating the weights using the gradient descent rule:


$$\Delta w = -\eta\, \frac{\partial E}{\partial w}$$

where $\eta$ is the learning rate. A small value of $\eta$ can guarantee convergence but makes learning slow. On the other hand, a large value of $\eta$ gives rapid learning but can lead to oscillation or even divergence. To overcome this limitation, many variations of this algorithm have been introduced for training neural networks. Other than gradient descent based BP, adaptive learning rate BP, resilient BP, Levenberg–Marquardt, and scaled conjugate gradient BP algorithms are also used for this purpose (Kandaswamy2004523, ). In our work, we have used the resilient back propagation algorithm for training the neural network.
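As a toy illustration of the gradient descent update above, consider a single sigmoid node trained on the squared error; all names and values here are for illustration only, not the paper's network:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gd_step(w, b, x, d, eta=0.5):
    # One gradient-descent update for a single sigmoid node trained
    # with the squared error E = 0.5 * (d - y)^2
    y = sigmoid(np.dot(w, x) + b)
    # dE/dw = -(d - y) * y * (1 - y) * x  (chain rule through the sigmoid)
    grad_w = -(d - y) * y * (1.0 - y) * x
    grad_b = -(d - y) * y * (1.0 - y)
    return w - eta * grad_w, b - eta * grad_b
```

Each step moves the weights against the error gradient, so the output error shrinks for the presented pattern; resilient backpropagation replaces the fixed step $\eta \cdot |\partial E/\partial w|$ with an adaptive per-weight step size that depends only on the sign of the gradient.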

3.3 Support vector machine (SVM)

The SVM is a supervised machine learning technique based on guaranteed risk bounds of statistical learning theory, known as the structural risk minimization (SRM) principle (ari2010detection, ); it is used for both classification (ari2010detection, ) and regression (smola2004tutorial, ).

Let $\{(\mathbf{x}_i, y_i)\}_{i=1}^{L}$ be the training examples, where $\mathbf{x}_i$ is the $i$-th feature vector of dimension $d$ and $y_i \in \{-1, +1\}$ is the label (target output) of $\mathbf{x}_i$. The hyperplane that separates the classes is represented as

$$\mathbf{w}^{T}\mathbf{x} + b = 0$$

where $\mathbf{w}$ is the weight vector perpendicular to the separating hyperplane and $b$ is a scalar bias term which determines the position of the hyperplane in the $d$-dimensional space. This equation is used as an inequality in the following way to separate the two classes:

$$\mathbf{w}^{T}\mathbf{x}_i + b \ge +1 \quad \text{and} \quad \mathbf{w}^{T}\mathbf{x}_i + b \le -1$$
Figure 6: An illustration of SVM classifier.

Now, after including the class information, the above inequalities can be combined as:

$$y_i\left(\mathbf{w}^{T}\mathbf{x}_i + b\right) \ge 1, \qquad i = 1, \ldots, L$$
This holds for the linearly separable condition. However, if a separating hyperplane does not exist in the feature space, slack variables $\xi_i \ge 0$ are introduced such that (ari2010detection, )

$$y_i\left(\mathbf{w}^{T}\mathbf{x}_i + b\right) \ge 1 - \xi_i.$$
Now, to find the optimal separating hyperplane, the risk bound is minimized according to the SRM principle through the following optimization problem:

$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \; \frac{1}{2}\|\mathbf{w}\|^{2} + C\sum_{i=1}^{L}\xi_i \quad \text{subject to} \quad y_i\left(\mathbf{w}^{T}\mathbf{x}_i + b\right) \ge 1 - \xi_i,\; \xi_i \ge 0$$
where the parameter $C$ is the regularization parameter that balances the importance between maximization of the margin width and minimization of the training error. The solution of this quadratic optimization problem is obtained by finding the saddle point of the Lagrange function (Ghorai2009510, )

$$L(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\|\mathbf{w}\|^{2} + C\sum_{i=1}^{L}\xi_i - \sum_{i=1}^{L}\alpha_i\left[ y_i\left(\mathbf{w}^{T}\mathbf{x}_i + b\right) - 1 + \xi_i \right]$$

where $\alpha_i$ is the Lagrange multiplier of the $i$-th data point. Note that the $\alpha_i$s are non-zero only for the support vectors, i.e., data points on the margin.

Now, the final form of the decision function is as follows:

$$f(\mathbf{x}) = \operatorname{sgn}\left( \sum_{i=1}^{L} \alpha_i\, y_i\, K(\mathbf{x}_i, \mathbf{x}) + b \right)$$

where $K(\cdot, \cdot)$ represents the kernel function. In our work, we first use a linear inner product kernel, defined by

$$K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^{T}\mathbf{x}_j.$$
In order to improve the separability of the classes, the input features can first be transformed into a different domain using linear or nonlinear transformation techniques; during SVM training, the maximum-margin hyperplane is then obtained in the transformed feature space. In this work, along with the simple linear kernel that does not require any feature mapping, we have used the Bhattacharyya (kondor2003kernel, ; lee2011using, ) and intersection (maji2013efficient, ; maji2008classification, ) kernels. We select these two kernels because they are useful when the input feature is a normalized histogram, like our proposed LBP-based feature; both kernels measure the similarity between two distributions.

Bhattacharyya kernel: When $\mathbf{x}$ is non-negative and represents a normalized histogram, the Bhattacharyya kernel uses the Bhattacharyya coefficient to measure similarity, and it is defined as (kondor2003kernel, )

$$K_{B}(\mathbf{x}_i, \mathbf{x}_j) = \sum_{m=1}^{d} \sqrt{x_{i,m}\, x_{j,m}}.$$
Intersection kernel: The histogram intersection kernel measures the intersection of two normalized histograms as similarity, and it is defined as (maji2013efficient, )

$$K_{\cap}(\mathbf{x}_i, \mathbf{x}_j) = \sum_{m=1}^{d} \min\left(x_{i,m},\, x_{j,m}\right).$$
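Both histogram kernels can be sketched in a few lines, assuming the inputs are normalized histograms such as the proposed LBP features:

```python
import numpy as np

def bhattacharyya_kernel(h1, h2):
    # Bhattacharyya coefficient between two normalized histograms:
    # K(h1, h2) = sum_m sqrt(h1[m] * h2[m]); equals 1 for identical histograms
    return np.sum(np.sqrt(h1 * h2))

def intersection_kernel(h1, h2):
    # Histogram intersection: K(h1, h2) = sum_m min(h1[m], h2[m])
    return np.sum(np.minimum(h1, h2))
```

In practice these values would be precomputed over all training pairs and passed to the SVM as a kernel matrix (e.g., via LIBSVM's precomputed-kernel mode).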
4 Experimental setup

Database Sound No. of Cycles
Database 1 Normal 24
Abnormal 48 (Wheeze and Crackle)
Database 2 Normal 40
Abnormal 40 (ILD)
Table 2: Description of the lung sound databases used in our experiments.

4.1 Database

Systematic collection of lung sound samples with reliable ground truth is an important requirement of this research. Our database consists of recorded lung sounds obtained from three different resources: the RALE database, the Audio and Bio-signal Processing Lab (IIT Kharagpur), and the Institute of Pulmocare and Research (Salt Lake, Kolkata). The sampling frequency is the same in all cases. The recordings were made at the anterior suprasternal notch position and the trachea of the subjects using a single-channel data acquisition system described in (mondal2011reduction, ). Subjects were in a sitting position and in a relaxed mood during recording, and to reduce man-made artifacts, the stethoscope was tied with a tape on the recording site. Subjects of various age groups were involved in the recording. The signals were verified by experts. In our study, we have used two databases. The first database is categorized into the three common types of lung sounds; the sounds of the other database are from subjects with ILD.

  • Database 1: Database 1 consists of three types of lung sounds from 30 subjects - normal, crackle and wheeze. It contains 72 cycles (24 from each class) for our experiments.

  • Database 2: This database includes two types of lung sounds 1) Normal and 2) ILD sound cycles. Five cycles are collected from each of eight normal subjects and eight ILD subjects.

4.2 Pre-processing and feature extraction

In the pre-processing step, after reducing the effect of heart sounds (mondal2011reduction, ), the lung sound cycles were extracted using a Hilbert envelope based algorithm (mondal2014detection, ) in the same manner as in (sengupta2016lung, ). Unlike the extracted cycles in (lozano2016automatic, ), all the normal cycles were recorded from normal subjects, whereas adventitious sounds were recorded from abnormal subjects.

Then, the sound cycles are down-sampled, since lung sounds carry most of the relevant information within 2000 Hz. Amplitude normalization is performed on a cycle-by-cycle basis. For short-term feature extraction, we have used a Hamming window and a frame length of 20 ms with 50% overlap between adjacent frames (sengupta2016lung, ). MFSCs are computed using 20 mel filters, and from the concatenated MFSCs, the textrogram is computed using the uniform LBP pattern. Note that MFCC features can be derived directly from MFSCs by applying the DCT.
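The framing step can be sketched as below; the 4 kHz sampling rate used in the default arguments is an assumption for illustration, chosen to be consistent with the 2000 Hz band of interest:

```python
import numpy as np

def frame_signal(x, fs=4000, frame_ms=20, overlap=0.5):
    # Split a normalized lung sound cycle into overlapping frames
    frame_len = int(fs * frame_ms / 1000)      # 80 samples at 4 kHz
    hop = int(frame_len * (1 - overlap))       # 50% overlap -> hop of 40
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])
```

Each row of the returned matrix is one analysis frame, ready for windowing and the MFSC computation of Section 2.1.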

We have also implemented four existing methods: wavelet-based (Kandaswamy2004523, ), MFCC-based (sengupta2016lung, ), morphological (mondal2014detection, ) and rational dilation wavelet based (ulukaya2016lung, ) techniques. Wavelet coefficients are calculated using Daubechies mirror filters of order 8 (db8) with six levels of decomposition. Feature vectors are computed from the coefficients of the decomposed subbands: the mean of the absolute values of the coefficients in each subband, the standard deviation of the coefficients in each subband, the average power of the coefficients in each subband, and the ratio of the means of adjacent subbands are calculated to formulate a 27-dimensional feature vector. On the other hand, the mean of conventional MFCC features, computed over all the frames of a lung sound cycle, is used to formulate a 20-dimensional MFCC-based feature. In the case of morphological features, four features, i.e., kurtosis, skewness, lacunarity, and sample entropy, are computed from the time domain lung sound cycles; the feature vector dimension is therefore four. Finally, in the case of rational dilation wavelet based features, the energy features that yielded the best results in (ulukaya2016lung, ) are computed from high Q-factor wavelets with a dilation factor of 1.17. The dimension of this feature vector is 31.

4.3 Classifier

k-Nearest Neighbor (kNN): We have used the MATLAB implementation of the kNN classifier. The distance metric used here is the Euclidean distance. The number of neighbors differs between features; for each feature, it is chosen as the value giving the best result.

Artificial Neural Network (ANN): The configuration of the ANN classifier is the same as in (Kandaswamy2004523, ; sengupta2016lung, ). It has one hidden layer consisting of 40 neurons. The activation functions of the hidden and output layers are tan-sigmoid and log-sigmoid, respectively. Resilient back propagation (RP) is used for training, as it was found efficient for lung sound classification (Kandaswamy2004523, ). Average accuracy is calculated by iterating the classification method 25 times.

Support Vector Machine (SVM): We have used the LIBSVM library for SVM implementation. Linear, Bhattacharyya and intersection kernels are utilized in our study. The intersection kernel was found to perform well and to be computationally efficient in (maji2008classification, ). The SVM penalty factor (related to the margin between hyperplanes) is kept the same in all the experiments. In the case of the morphological feature, the best results were obtained with the RBF kernel, as reported in (mondal2014detection, ); thus, for morphological features only, we have used the RBF kernel.

4.4 Performance evaluation

For performance evaluation, we have adopted the leave-one-out cross-validation method. In Database 1, we test one cycle at a time, and the remaining cycles are used for training. For Database 2, we test the cycles of one subject at a time, and the cycles of the other subjects are used for training. We report the classification performance using three metrics: specificity (SPE), sensitivity (SEN), and overall accuracy (OAA). Specificity is the proportion of normal cycles that are correctly identified as normal, i.e., SPE = TN/(TN+FP). Sensitivity is the proportion of abnormal cycles that are correctly identified as abnormal, i.e., SEN = TP/(TP+FN). Finally, overall accuracy measures the number of correctly classified normal and abnormal cycles with respect to the total number of test samples, i.e., OAA = (TP+TN)/(TP+TN+FP+FN). Here TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively, with abnormal treated as the positive class.
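In counting terms, with abnormal treated as the positive class, the three metrics can be computed as in this minimal sketch:

```python
def evaluate(y_true, y_pred):
    """SPE, SEN, OAA with abnormal (label 1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    spe = tn / (tn + fp)           # normal cycles correctly kept normal
    sen = tp / (tp + fn)           # abnormal cycles correctly flagged
    oaa = (tp + tn) / len(y_true)  # overall accuracy
    return spe, sen, oaa

# Tiny illustrative example: one normal and one abnormal cycle misclassified
spe, sen, oaa = evaluate([0, 0, 1, 1, 1], [0, 1, 1, 1, 0])
print(spe, sen, oaa)   # 0.5, 2/3, 0.6
```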

5 Results and discussions

5.1 Performance comparison of baseline and proposed features

In this section, we evaluate the proposed features for lung sound classification against the wavelet-based (Kandaswamy2004523, ), MFCC-based (sengupta2016lung, ), morphology-based (mondal2014detection, ), and rational dilation wavelet energy (ulukaya2016lung, ) features. In addition, we evaluate MFSC-based mean features. Table 3 reports the classification accuracy with kNN as the classifier, using the Euclidean distance as the similarity measure. On Database 2, the MFCC and MFSC based features perform better than LBP, but on Database 1, LBP performs better than the other features. Lung sound classification performance on the two databases using the MLP-based ANN classifier is shown in Table 4. In most cases, the MFCC-based mean features outperform all other methods, and LBP gives poorer accuracy than the MFCC and MFSC features. A possible reason is that the LBP histogram based features are not well suited to the ANN classifier; we have also used the same ANN configuration as in (Kandaswamy2004523, ), and performance may improve if the configuration parameters of the classifier are further optimized. The morphological features and the energy values of the rational dilation wavelet coefficients do not perform as well as the MFCC, MFSC, or wavelet based features with the ANN classifier in the specified configuration.

                                          ----- Database 1 -----    ----- Database 2 -----
Feature                                    SPE     SEN     OAA       SPE     SEN     OAA
Wavelet (Kandaswamy2004523, )             79.16   95.83   90.27     92.50   95.00   93.75
MFCC (sengupta2016lung, )                 95.83   95.83   94.44     95.00  100.00   97.50
MFSC                                      95.83   95.83   94.44     95.00  100.00   97.50
Morphological (mondal2014detection, )     87.50   79.17   77.78     85.00   67.50   76.25
Energy (ulukaya2016lung, )                91.67   95.83   93.05     82.50  100.00   91.25
LBP                                      100.00   97.91   98.61     90.00  100.00   95.00

Table 3: Lung sound classification accuracy (in %) using existing and proposed features with the kNN classifier.

                                          ----- Database 1 -----    ----- Database 2 -----
Feature                                    SPE     SEN     OAA       SPE     SEN     OAA
Wavelet (Kandaswamy2004523, )             82.83   95.50   91.16     94.30   97.00   95.65
MFCC (sengupta2016lung, )                 97.50   97.41   97.22     92.70   99.70   96.20
MFSC                                      99.50   97.41   97.16     88.40   99.60   94.00
Morphological (mondal2014detection, )     82.17   90.67   86.33     83.10   75.10   79.10
Energy (ulukaya2016lung, )                87.50   89.58   88.89     85.00   95.00   90.00
LBP                                       85.00   93.00   89.27     74.00   81.30   77.65

Table 4: Lung sound classification accuracy (in %) using existing and proposed features with ANN classifier.

                                                         ----- Database 1 -----    ----- Database 2 -----
Feature                                 Kernel type       SPE     SEN     OAA       SPE     SEN     OAA
Wavelet (Kandaswamy2004523, )           Inner Product    83.33   77.08   79.16     92.50   92.50   92.50
MFCC (sengupta2016lung, )               Inner Product    58.33   97.91   75.00     87.50  100.00   93.75
MFSC                                    Inner Product    91.67   97.91   94.44     87.50  100.00   93.75
Morphological (mondal2014detection, )   RBF              91.67   95.83   91.67     77.50   67.50   72.50
Energy (ulukaya2016lung, )              Inner Product    89.67   97.50   92.33     87.70   99.80   93.75
LBP                                     Inner Product    95.83   97.91   97.22     85.00  100.00   92.50
LBP                                     Bhattacharyya    95.83   97.91   97.22     90.00  100.00   95.00
LBP                                     Intersection    100.00   97.91   98.61     85.00  100.00   92.50

Table 5: Lung sound classification accuracy (in %) using existing and proposed features with SVM classifier using different kernels.

Table 5 reports the lung sound classification accuracy with SVM as the classifier. We have used the linear inner product kernel as the default system, along with two other kernels suited to the proposed feature. The results in Table 5 indicate that the LBP-based features are more suitable with the SVM classifier than with the ANN-based system of Table 4. We find that the MFSC-based mean features are better than, or as good as, the MFCC-based mean features. The LBP-based feature with the linear inner product kernel yields considerable performance in terms of sensitivity; however, its specificity falls short of the MFSC-based mean feature. With the Bhattacharyya kernel, its specificity improves, particularly for Database 2. The intersection kernel also shows better specificity, but not consistently across both databases. LBP features with both the Bhattacharyya and intersection kernels show better performance than the popular wavelet-based features on the first database, and they also outperform the morphological features on both databases. Thus, the proposed features perform better than the others not only with the kNN classifier but also with the SVM classifier. However, for Database 2, the wavelet-based feature still gives the best specificity, even with ANN as the back-end. From a pathology perspective, sensitivity is more important than specificity, in order to ensure that a case is not “missed”; the proposed features are more suitable from this viewpoint.

5.2 Further optimization of different parameters

In the previous section, we showed the recognition performance with an arbitrarily chosen configuration of spectral coefficients that is popular in the speech processing context; the same configuration was used in our previous work on lung sound characterization (sengupta2016lung, ). In this section, we further optimize the configuration parameters of the proposed feature for lung sound detection. The evaluations in the previous section indicate that the proposed feature outperforms the others not only with the kNN classifier but also with the Bhattacharyya kernel based SVM classifier; we therefore choose LBP with the Bhattacharyya kernel for the rest of the analyses. Moreover, since the first database (i.e., Database 1) includes the most common sounds that may be present in healthy and diseased subjects, all optimization experiments are first conducted on this data. The optimized parameters are then used to evaluate the accuracies on the other database.

Figure 7: Effect of frame size on lung sound recognition accuracy (%) for proposed LBP feature with Bhattacharyya kernel and SVM classifier on Database 1.

5.2.1 Optimization of frame length

For speech signals, short-term spectral features are generally extracted with a frame length of 20 ms. Here, we analyze how varying the frame length during the computation of the MFSCs influences the lung sound classification accuracy. We conduct experiments varying the frame length between 20 ms and 200 ms; the results are shown in Fig. 7. We get the best overall accuracy of 98.61% at frame lengths of 30 and 40 ms. As mentioned before, the default frame size of 20 ms is motivated by speech processing applications and is not necessarily the optimum choice for lung sound analysis. We have further checked the accuracy of the proposed feature using the other two kernels, and we also get optimum performance at ms. Besides, from Fig. 7, we infer that the accuracy of abnormal sound detection is little affected by the variation of frame length, whereas the accuracy of normal sound detection depends considerably on the frame size; interestingly, the specificity decreases as the frame length increases. We have chosen a 40 ms frame length for the subsequent experiments on the other database. Despite the change in frame length, the accuracies on Database 2 remain the same as with the 20 ms frame length (Table 5): even though the abnormal sounds are detected with 100% sensitivity, the normal sounds are not detected as reliably. This variation in specificity between the databases could be explained by the several factors that affect the characteristics of normal lung sounds (reichert2008analysis, ).
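Varying the frame length (and the overlap, discussed next) simply re-slices each cycle before the MFSCs are computed. A minimal framing sketch is given below; the sampling rate and parameter values are illustrative.

```python
import numpy as np

def frame_signal(x, fs, frame_ms, overlap_pct):
    """Split a lung sound cycle into overlapping frames."""
    flen = int(fs * frame_ms / 1000)                       # samples per frame
    hop = max(1, round(flen * (1 - overlap_pct / 100)))    # frame shift
    n = 1 + max(0, (len(x) - flen) // hop)                 # number of frames
    return np.stack([x[i * hop: i * hop + flen] for i in range(n)])

fs = 8000                                                  # illustrative rate
cycle = np.random.default_rng(4).standard_normal(2 * fs)   # 2 s "cycle"
frames_20 = frame_signal(cycle, fs, 20, 50)   # speech default: 20 ms, 50%
frames_40 = frame_signal(cycle, fs, 40, 90)   # optimized here: 40 ms, 90%
print(frames_20.shape, frames_40.shape)
```

Longer frames with more overlap yield many more, wider frames per cycle, which is what drives the accuracy differences studied in this section.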

5.2.2 Optimization of frame overlap

In the next analysis, we vary the frame overlap and observe its effect on the recognition performance. So far, we have used 50% overlap; here, we vary it from 10% to 90%. In Fig. 8, we observe that the sensitivity does not change over the considered overlaps, and the variation in overall accuracy is mostly due to the detection accuracy of normal sounds. The specificity reaches 100%, but not consistently. When the overlap is set to 80%, the accuracies of both normal and abnormal sounds become stable, and the best overall accuracy (i.e., 98.61%) is achieved. The accuracies of LBP for all three kernels saturate at 90% overlap; therefore, instead of taking 80% as the optimum, we consider 90% as the optimized value. For Database 2, we observe a 2.5% improvement in overall accuracy due to higher specificity with the optimized frame overlap.

Figure 8: Effect of percentage of frame-overlap on lung sound recognition accuracy (%) for proposed LBP feature with Bhattacharyya kernel and SVM classifier on Database 1.

5.2.3 Optimization of filter numbers

In the previous experiments, we considered 20 filters in the filterbank, as is common in speech processing. Here, we vary the number of filters from 10 to 90 and observe its impact on the performance, illustrated in Fig. 9. The performance is poor when the number of filters is less than 10. Surprisingly, the accuracy of abnormal sound detection, i.e., sensitivity, remains largely unchanged with the number of filters, whereas the normal accuracy is more dependent on it; the specificity, however, does not exhibit a consistent trend. We notice that normal sound detection starts to degrade when more filters are used. Since the best accuracy is obtained with 20 filters (i.e., the default value), the performance with the optimum number of filters is the same as the performance with the default configuration in Table 5.
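Changing the number of filters amounts to rebuilding the triangular mel filterbank used for the MFSCs. A from-scratch sketch is shown below, using the common HTK-style mel formula; the FFT size and sampling rate are illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs, f_low=0.0, f_high=None):
    """Triangular filters equally spaced on the mel scale."""
    f_high = f_high or fs / 2
    mel_pts = np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):          # rising edge of triangle i
            fb[i, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):          # falling edge of triangle i
            fb[i, k] = (r - k) / max(r - c, 1)
    return fb

fb = mel_filterbank(n_filters=20, n_fft=512, fs=8000)
print(fb.shape)   # (20, 257): one row per filter, one column per FFT bin
```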

Figure 9: Effect of the number of filters in the filterbank on lung sound recognition accuracy (%) for the proposed LBP feature with Bhattacharyya kernel and SVM classifier on Database 1.

Figure 10: Plot of average LBP features computed over all the cycles of the first database. Figure shows the features separately (i.e., without concatenation) corresponding to 18 filters.

Figure 11: Lung sound recognition performance on (a) Database 1 and (b) Database 2 for different number of selected features.

Figure 12: Number of selected features corresponding to each filter for the discrimination of different lung sounds in the two databases. The figures are shown for (a) Database 1 and (b) Database 2, where the total numbers of selected features are and , respectively, corresponding to the maximum overall accuracy.

5.3 Feature selection

The LBP feature with the Bhattacharyya and intersection kernels performs considerably better than the standard wavelet-based features; however, its dimension is much larger (1044 vs. 27). Figure 10 shows the surface plot of the average LBP features, which capture the (perceptual spectral) energy differences between neighboring frequency bands of a frame. The plot indicates that some features are visually more discriminative than others. In this section, we reduce the feature dimension by selecting the features that are useful for discrimination; our final objective is to identify the frequency range effective for classification. We apply a mutual information based feature selection method known as minimum redundancy maximum relevance (mRMR) (peng2005feature, ). By analyzing the filters corresponding to the selected features, we then find the specific frequency bands (and their temporal and spectral neighborhoods) of lung sounds that have distinctive properties for different lung diseases.
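The texture coding itself works as follows: each time-frequency cell of the MFSC matrix is compared against its eight neighbors to form an 8-bit pattern, and per-band histograms of these codes are normalized and concatenated. The sketch below uses the basic 256-code LBP for clarity; the paper's 1044-dimensional feature implies a reduced code set (e.g., uniform patterns) per band, which is not reproduced here.

```python
import numpy as np

def lbp_codes(S):
    """Basic 8-neighbor LBP over a 2-D array (bands x frames)."""
    c = S[1:-1, 1:-1]                      # interior "center" cells
    nbrs = [S[:-2, :-2], S[:-2, 1:-1], S[:-2, 2:], S[1:-1, 2:],
            S[2:, 2:],  S[2:, 1:-1],  S[2:, :-2], S[1:-1, :-2]]
    codes = np.zeros_like(c, dtype=np.int32)
    for bit, n in enumerate(nbrs):         # threshold neighbors against center
        codes |= (n >= c).astype(np.int32) << bit
    return codes                           # shape: (bands - 2, frames - 2)

def lbp_histogram_feature(mfsc):
    """Normalized per-band LBP histograms, stacked into one vector."""
    codes = lbp_codes(mfsc)
    hists = [np.bincount(row, minlength=256) / row.size for row in codes]
    return np.concatenate(hists)

mfsc = np.random.default_rng(5).random((20, 100))   # 20 bands x 100 frames
feat = lbp_histogram_feature(mfsc)
print(feat.shape)   # (4608,) = 18 interior bands x 256 codes
```

Note that 20 mel filters leave 18 interior bands after the 8-neighborhood is applied, consistent with the 18 filters shown in Fig. 10.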

Various methods use simple correlation coefficients (Fisher’s discriminant criterion, etc.), mutual information, or statistical tests (guyon2003introduction, ), but these feature selection methods do not consider the dependency among features. For an efficient feature selection algorithm, the selected features should have minimum redundancy. The specialty of the mRMR method is that it combines a maximum relevance criterion, i.e., selecting the feature subset with the highest relevance to the target class, with a minimum redundancy criterion, i.e., reducing the redundancy among the selected features (peng2005feature, ).

Applying the mRMR technique, we select features for the two databases separately and conduct experiments with the selected features. First, we transform the features from continuous to categorical values (ding2005minimum, ; peng2005feature, ); then mRMR feature selection is applied. The features are selected from the LBP histogram feature, and the accuracies are measured using the intersection kernel. Fig. 11 shows the accuracies for the two databases with respect to the number of selected features. From Fig. 11-(a), we observe that only 35 features give the maximum overall accuracy for Database 1, implying that the sounds in the first database are well discriminable and can be recognized with a small number of features. For the second database, the ILD sounds are characterized well with only ten selected features, indicating that ILD sounds have noticeably distinguishable characteristics, as shown in Fig. 11-(b). In fact, the crackle sound, which has high discrimination ability, is commonly present in some diseases of the ILD group (sengupta2016lung, ). Here, the best overall accuracy is obtained by selecting 145 features. We also observe that the detection of normal sounds needs more features in Database 2, possibly because of the different factors that affect normal sounds (reichert2008analysis, ).
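A greedy mRMR pass on discretized features can be sketched as below: at each step, pick the feature maximizing its relevance (mutual information with the class label) minus its mean redundancy (mutual information with the already-selected features). The mutual information estimate and the toy data are illustrative.

```python
import numpy as np

def mutual_info(a, b):
    """MI (in nats) between two discrete sequences via their joint histogram."""
    joint = np.histogram2d(a, b, bins=(len(set(a)), len(set(b))))[0]
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1, keepdims=True), pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def mrmr(X, y, k):
    """Greedy minimum-redundancy maximum-relevance selection of k columns."""
    relevance = [mutual_info(X[:, j], y) for j in range(X.shape[1])]
    selected = [int(np.argmax(relevance))]        # start with most relevant
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info(X[:, j], X[:, s])
                                  for s in selected])
            if relevance[j] - redundancy > best_score:
                best, best_score = j, relevance[j] - redundancy
        selected.append(best)
    return selected

# Toy data: column 0 is a strong (noisy) predictor, column 1 duplicates it,
# column 2 is noise; mRMR should skip the redundant duplicate.
rng = np.random.default_rng(6)
y = rng.integers(0, 2, 300)
col0 = np.where(rng.random(300) < 0.1, 1 - y, y)   # 10% label noise
X = np.column_stack([col0, col0, rng.integers(0, 2, 300)])
sel = mrmr(X, y, 2)
print(sel)   # the duplicate column 1 is passed over
```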

Finally, we investigate the details of the selected features to understand why they are more discriminative than the others. For this, we count the number of selected features corresponding to each MFSC band, i.e., the spectral region from which the eight neighborhood pixels are used for LBP analysis. Figure 12 shows the counts for each filter. Interestingly, some particular frequency bands and their neighborhoods contribute more to the classification process. From Fig. 12, we observe that even though the entire frequency range is needed for the classification of sounds in Database 1 and Database 2, some particular bands are more prominent than others.

Looking at the selected frequency bands, coefficient 10 and its neighborhood form the most discriminating frequency band for the first database, whereas for Database 2, frequency bands 11 and 12 and their neighborhoods are the most distinguishing. Frequency band 3 and its neighborhood play an important role for Database 2 but have no role for Database 1; likewise, frequency bands 18 and 19 have influence for Database 1 but not for Database 2. Considering the selected bands overall, the full range (up to 2000 Hz) is required to discriminate the classes.

From the F-ratio analysis in (sengupta2016lung, ), we had seen that the lower frequency bands, in the 200 Hz range, contribute to distinguishing different lung sounds in Database 1. Using mRMR feature selection, however, we observe that this portion of the frequency range does not contribute as much. The reason may be the inability of the F-ratio analysis to remove redundancy among features: the features extracted from this region may be redundant, and thus they do not influence lung sound detection when mRMR feature selection is used; only coefficient 2 and its neighborhood are enough to distinguish the different lung sounds. With mRMR, a minimal number of features is selected that is capable of producing the best accuracy. Moreover, the F-ratio measures the discrimination capability of each feature individually, and features that are most discriminative individually do not necessarily perform best as a group. The frequency bands (and their neighborhoods) associated with the mRMR-selected features are those that perform best together and yield the best accuracy.

6 Conclusions

In this paper, we have studied LBP analysis of the time-frequency representation of lung sounds for classification. LBP histograms are computed for each frequency band on the perceptual mel scale, then normalized and stacked to form the feature vector. The proposed feature captures the texture information of lung sounds in the time-frequency domain. We have evaluated the features with kNN, ANN, and SVM classifiers as back-ends on two lung sound databases. Our study with different classifiers and various SVM kernels reveals that the LBP-based feature with the Bhattacharyya kernel, and also with the kNN classifier, is superior to the other methods, especially for the detection of abnormal sounds. The configuration parameters of the proposed features were further optimized experimentally, and their performance measured. During this process, it was observed that the optimum analysis window length for short-term feature extraction of lung sounds is moderately longer than the window length frequently used in speech analysis; besides, performance is better when the overlap is larger. We also noticed that, in most cases, detection of normal sounds is more challenging, and specificity is quite sensitive to changes in the configuration parameters; detailed information is needed to detect normal sounds more accurately. To reduce the computational load, we used a well-known feature selection technique that reduces the feature vector dimension without degrading the results. Finally, while vital information is captured below 1000 Hz, some frequency bands above 1000 Hz also influence the results.

The current study was conducted with databases of limited size owing to the unavailability of a large amount of data with reliable ground truth; the newly investigated method should be validated on a larger database. The proposed feature captures relative information across the subbands, whereas their absolute values are apparently not used; system fusion methods can be explored to combine these different kinds of information.

Conflict of interest statement

The authors declare that they have no conflict of interest.


The work was supported by the Ministry of Human Resource Development, Government of India. We are thankful to Dr. Ashok Mondal for his help in the database preparation. We would also like to thank the reviewers for their careful reading of the paper and helpful comments.



  • (1) J. E. Wipf, B. A. Lipsky, J. V. Hirschmann, E. J. Boyko, J. Takasugi, R. L. Peugeot, C. L. Davis, Diagnosing pneumonia by physical examination: relevant or relic?, Archives of Internal Medicine 159 (10) (1999) 1082–1087.
  • (2) G. Gould, A. Redpath, M. Ryan, P. Warren, J. Best, D. Flenley, W. MacNee, Lung CT density correlates with measurements of airflow limitation and the diffusing capacity, European Respiratory Journal 4 (2) (1991) 141–146.
  • (3) N. J. Schulz, Spirometry essentials for medical assistants part iii: standards for the test procedure, Journal of Continuing Education Topics & Issues 9 (1) (2007) 22–22.
  • (4) A. Kandaswamy, C. Kumar, R. Ramanathan, S. Jayaraman, N. Malmurugan, Neural classification of lung sounds using wavelet coefficients, Computers in Biology and Medicine 34 (6) (2004) 523 – 537.
  • (5) B. Lei, S. A. Rahman, I. Song, Content-based classification of breath sound with enhanced features, Neurocomputing 141 (2014) 139–147.
  • (6) N. Sengupta, M. Sahidullah, G. Saha, Lung sound classification using cepstral-based statistical features, Computers in Biology and Medicine 75 (2016) 118–129.
  • (7) A. Bohadana, G. Izbicki, S. Kraman, Fundamentals of lung auscultation, New England Journal of Medicine 370 (8) (2014) 744–751.
  • (8) N. Meslier, G. Charbonneau, J. Racineux, Wheezes, European Respiratory Journal 8 (11) (1995) 1942–1948.
  • (9) J. Zibrak, D. Price, Interstitial lung disease: raising the index of suspicion in primary care, NPJ Primary Care Respiratory Medicine 24 (2014) Article number: 14054.
  • (10) P. Piirila, A. Sovijarvi, Crackles: recording, analysis and clinical significance, European Respiratory Journal 8 (12) (1995) 2139–2148.
  • (11) S. Charleston-Villalobos, G. Martinez-Hernandez, R. Gonzalez-Camarena, G. Chi-Lem, J. G. Carrillo, T. Aljama-Corrales, Assessment of multichannel lung sounds parameterization for two-class classification in interstitial lung disease patients, Computers in Biology and Medicine 41 (7) (2011) 473–482.
  • (12) R. Palaniappan, K. Sundaraj, N. Ahamed, Machine learning in lung sound analysis: a systematic review, Biocybernetics and Biomedical Engineering 33 (3) (2013) 129 – 135.
  • (13) S. Reichert, R. Gass, C. Brandt, E. Andrès, Analysis of respiratory sounds: state of the art, Clinical Medicine: Circulatory, Respiratory and Pulmonary Medicine 2 (2008) 45–58.
  • (14) G. Serbes, C. Sakar, Y. Kahya, N. Aydin, Feature extraction using time-frequency/scale analysis and ensemble of feature sets for crackle detection, in: Proceedings of the 33rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE, 2011, pp. 3314–3317.
  • (15) S. Pittner, S. Kamarthi, Feature extraction from wavelet coefficients for pattern recognition tasks, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (1) (1999) 83–88.
  • (16) X. Lu, M. Bahoura, An integrated automated system for crackles extraction and classification, Biomedical Signal Processing and Control 3 (3) (2008) 244–254.
  • (17) B.-S. Lin, B.-S. Lin, H.-D. Wu, F.-C. Chong, S.-J. Chen, Wheeze recognition based on 2D bilateral filtering of spectrogram, Biomedical Engineering: Applications, Basis and Communications 18 (03) (2006) 128–137.
  • (18) A. Abbas, A. Fahim, An automated computerized auscultation and diagnostic system for pulmonary diseases, Journal of Medical Systems 34 (6) (2010) 1149–1155.
  • (19) R. Riella, P. Nohama, J. Maia, Method for automatic detection of wheezing in lung sounds, Brazilian Journal of Medical and Biological Research 42 (7) (2009) 674–684.
  • (20) S. Xie, F. Jin, S. Krishnan, F. Sattar, Signal feature extraction by multi-scale PCA and its application to respiratory sound classification, Medical & Biological Engineering & Computing 50 (7) (2012) 759–768.
  • (21) F. Jin, S. Krishnan, F. Sattar, Adventitious sounds identification and extraction using temporal–spectral dominance-based features, IEEE Transactions on Biomedical Engineering 58 (11) (2011) 3078–3087.
  • (22) S. Rietveld, M. Oud, E. Dooijes, Classification of asthmatic breath sounds: preliminary results of the classifying capacity of human examiners versus artificial neural networks, Computers and Biomedical Research 32 (5) (1999) 440–448.
  • (23) L. Waitman, K. Clarkson, J. Barwise, P. King, Representation and classification of breath sounds recorded in an intensive care setting using neural networks, Journal of Clinical Monitoring and Computing 16 (2) (2000) 95–105.
  • (24) M. Munakata, H. Ukita, I. Doi, Y. Ohtsuka, Y. Masaki, Y. Homma, Y. Kawakami, Spectral and waveform characteristics of fine and coarse crackles., Thorax 46 (9) (1991) 651–657.
  • (25) B. Sankur, Y. Kahya, E. Güler, T. Engin, Comparison of AR-based algorithms for respiratory sounds classification, Computers in Biology and Medicine 24 (1) (1994) 67–76.
  • (26) S. Alsmadi, Y. Kahya, Design of a DSP-based instrument for real-time classification of pulmonary sounds, Computers in Biology and Medicine 38 (1) (2008) 53–61.
  • (27) G.-C. Chang, Y.-F. Lai, Performance evaluation and enhancement of lung sound recognition system in two real noisy environments, Computer Methods and Programs in Biomedicine 97 (2) (2010) 141 – 150.
  • (28) H. Martinez-Hernandez, C. Aljama-Corrales, R. Gonzalez-Camarena, V. Charleston-Villalobos, G. Chi-Lem, Computerized classification of normal and abnormal lung sounds by multivariate linear autoregressive model, in: Proceedings of 27th Annual International Conference of the Engineering in Medicine and Biology Society, IEEE, 2006, pp. 5999–6002.
  • (29) A. Cohen, D. Landsberg, Analysis and automatic classification of breath sounds, IEEE Transactions on Biomedical Engineering BME-31 (9) (1984) 585–590.
  • (30) M. Sahidullah, G. Saha, Design, analysis and experimental evaluation of block based transformation in MFCC computation for speaker recognition, Speech Communication 54 (4) (2012) 543–565.
  • (31) M. Bahoura, Pattern recognition methods applied to respiratory sounds classification into normal and wheeze classes, Computers in Biology and Medicine 39 (9) (2009) 824 – 843.
  • (32) A. D. Orjuela-Cañón, D. Gómez-Cajas, R. Jiménez-Moreno, Artificial neural networks for acoustic lung signals classification, in: Proceedings of Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, Springer, 2014, pp. 214–221.
  • (33) J.-C. Chien, H.-D. Wu, F.-C. Chong, C.-I. Li, Wheeze detection using cepstral analysis in Gaussian mixture models, in: Proceedings of the 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE, 2007, pp. 3168–3171.
  • (34) M. Bahoura, C. Pelletier, Respiratory sounds classification using cepstral analysis and Gaussian mixture models, in: Proceedings of the 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Vol. 1, 2004, pp. 9–12.
  • (35) P. Mayorga, C. Druzgalski, R. Morelos, O. Gonzalez, J. Vidales, Acoustics based assessment of respiratory diseases using GMM classification, in: Proceedings of the 32nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE, 2010, pp. 6312–6316.
  • (36) N. Sengupta, M. Sahidullah, G. Saha, Optimization of cepstral features for robust lung sound classification, in: India Conference (INDICON), 2015 Annual IEEE, IEEE, 2015, pp. 1–6.
  • (37) R. Palaniappan, K. Sundaraj, S. Sundaraj, N. Huliraj, S. Revadi, B. Archana, Pulmonary acoustic signal classification using autoregressive coefficients and k-nearest neighbor, in: Applied Mechanics and Materials, Vol. 591, Trans Tech Publ, 2014, pp. 211–214.
  • (38) M. Tuceryan, A. K. Jain, Texture analysis, in: C. Chen, L. Pau, P. Wang (Eds.), Handbook of Pattern Recognition and Computer Vision, World Scientific, Singapore, 1993, Ch. 2.1, pp. 235–276.
  • (39) R. T. H. Laennec, J. Forbes, A Treatise on the Diseases of the Chest, and on Mediate Auscultation, Samuel Wood & Sons, 1838.
  • (40) L. Hadjileontiadis, A texture-based classification of crackles and squawks using lacunarity, IEEE Transactions on Biomedical Engineering 56 (3) (2009) 718–732.
  • (41) P. Bhattacharyya, A. Mondal, R. Dey, D. Saha, G. Saha, Novel algorithm to identify and differentiate specific digital signature of breath sound in patients with diffuse parenchymal lung disease, Respirology 20 (4) (2015) 633–639.
  • (42) F. Jin, F. Sattar, D. Y. Goh, New approaches for spectro-temporal feature extraction with applications to respiratory sound classification, Neurocomputing 123 (2014) 362–371.
  • (43) T. Ojala, M. Pietikainen, T. Maenpaa, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Transactions on pattern analysis and machine intelligence 24 (7) (2002) 971–987.
  • (44) D. Huang, C. Shan, M. Ardabilian, Y. Wang, L. Chen, Local binary patterns and its application to facial image analysis: a survey, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 41 (6) (2011) 765–781.
  • (45) M. Pietikäinen, A. Hadid, G. Zhao, T. Ahonen, Local binary patterns for still images, in: Computer vision using local binary patterns, Springer, 2011, pp. 13–47.
  • (46) C. Shan, S. Gong, P. W. McOwan, Facial expression recognition based on local binary patterns: a comprehensive study, Image and Vision Computing 27 (6) (2009) 803–816.
  • (47) C. Shan, S. Gong, P. W. McOwan, Robust facial expression recognition using local binary patterns, in: IEEE International Conference on Image Processing 2005, Vol. 2, IEEE, 2005, pp. II–370.
  • (48) S. Liao, M. Law, A. Chung, Dominant local binary patterns for texture classification, IEEE Transactions on Image Processing 18 (5) (2009) 1107–1118.
  • (49) M. Heikkilä, M. Pietikäinen, C. Schmid, Description of interest regions with local binary patterns, Pattern Recognition 42 (3) (2009) 425–436.
  • (50) F. Alegre, R. Vipperla, A. Amehraye, N. Evans, A new speaker verification spoofing countermeasure based on local binary patterns, in: Proceedings of the INTERSPEECH, 2013, pp. 940–944.
  • (51) Z. Wu, N. Evans, T. Kinnunen, J. Yamagishi, F. Alegre, H. Li, Spoofing and countermeasures for speaker verification: a survey, Speech Communication 66 (2015) 130–153.
  • (52) A. Roy, M. M. Doss, S. Marcel, A fast parts-based approach to speaker verification using boosted slice classifiers, IEEE Transactions on Information Forensics and Security 7 (1) (2012) 241–254.
  • (53) O. Chapelle, P. Haffner, V. N. Vapnik, Support vector machines for histogram-based image classification, IEEE Transactions on Neural Networks 10 (5) (1999) 1055–1064.
  • (54) W.-T. Wong, S.-H. Hsu, Application of svm and ann for image retrieval, European Journal of Operational Research 173 (3) (2006) 938–950.
  • (55) R. Kondor, T. Jebara, A kernel between sets of vectors, in: ICML, Vol. 20, 2003, p. 361.
  • (56) S. Maji, A. Berg, J. Malik, Efficient classification for additive kernel svms, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 66–77.
  • (57) L. v. d. Maaten, G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9 (Nov) (2008) 2579–2605.
  • (58) S. Haykin, Neural Networks and Learning Machines, 3rd Edition, Pearson, 2008.
  • (59) S. Ari, K. Hembram, G. Saha, Detection of cardiac abnormality from PCG signal using LMS based least square SVM classifier, Expert Systems with Applications 37 (12) (2010) 8019–8026.
  • (60) A. J. Smola, B. Schölkopf, A tutorial on support vector regression, Statistics and Computing 14 (3) (2004) 199–222.
  • (61) S. Ghorai, A. Mukherjee, P. Dutta, Nonparallel plane proximal classifier, Signal Processing 89 (4) (2009) 510–522.
  • (62) K. A. Lee, C. H. You, H. Li, T. Kinnunen, K. C. Sim, Using discrete probabilities with bhattacharyya measure for SVM-based speaker verification, IEEE Transactions on Audio, Speech, and Language Processing 19 (4) (2011) 861–870.
  • (63) S. Maji, A. C. Berg, J. Malik, Classification using intersection kernel support vector machines is efficient, in: Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, IEEE, 2008, pp. 1–8.
  • (64) A. Mondal, P. Bhattacharya, G. Saha, Reduction of heart sound interference from lung sound signals using empirical mode decomposition technique, Journal of Medical Engineering & Technology 35 (6-7) (2011) 344–353.
  • (65) A. Mondal, P. Bhattacharya, G. Saha, Detection of lungs status using morphological complexities of respiratory sounds, The Scientific World Journal 2014.
  • (66) M. Lozano, J. A. Fiz, R. Jané, Automatic differentiation of normal and continuous adventitious respiratory sounds using ensemble empirical mode decomposition and instantaneous frequency, IEEE Journal of Biomedical and Health Informatics 20 (2) (2016) 486–497.
  • (67) S. Ulukaya, G. Serbes, I. Sen, Y. P. Kahya, A lung sound classification system based on the rational dilation wavelet transform, in: 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, 2016, pp. 3745–3748.
  • (68) H. Peng, F. Long, C. Ding, Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (8) (2005) 1226–1238.
  • (69) I. Guyon, A. Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research 3 (Mar) (2003) 1157–1182.
  • (70) C. Ding, H. Peng, Minimum redundancy feature selection from microarray gene expression data, Journal of Bioinformatics and Computational Biology 3 (02) (2005) 185–205.