A Comparative Approach to Clustering Maternal Data Using K-means and K-medoid


Department of Computer Science, Amrita School of Arts and Sciences, Amrita Vishwa Vidyapeetham, Mysuru, Karnataka, India

Abstract

In our work, we cluster data collected from pregnant women through hospitals into three categories: elective LSCS, emergency LSCS and normal delivery. We aim to identify outliers so that we can predict complications that can occur during pregnancy; such complications can even lead to maternal death. Clustering is performed using the K-means and K-medoid algorithms, implemented as a Windows Forms application in Microsoft Visual Studio. We compare the performance and efficiency of the two algorithms, and the outlier, i.e. the normal delivery cases, is detected and depicted. From our work, we can say that K-means works better when the dataset size is in the range of 50–100, whereas K-medoid works more efficiently when the dataset is larger than 500. In our work, we consider various maternity cases and detect the normal delivery cases as outliers.

Keywords

K-means, K-medoid, Elective LSCS, Emergency LSCS

Introduction

Data clustering is a demanding task in many areas, such as pattern matching and biomedical data analysis. Clustering is a data-mining technique that groups data based on similarity: the aim is to organize the data items into clusters such that each cluster contains data items that are similar to one another. In our proposed work, we use the K-means and K-medoid clustering algorithms to cluster various types of pregnancy cases based on maternity data.

Clustering is a method of organizing abstract objects into meaningful subclasses. There are six types of clustering methods; one of them is the partitioning method, under which K-means and K-medoid fall. K-means is a simple unsupervised learning algorithm that classifies the data items into various groups. The data items are grouped by minimizing the sum of squared distances between each data item and the centroid of its cluster; that is, K-means attempts to minimize the total squared error.
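As a small illustration of this objective, the within-cluster sum of squared distances can be computed directly. The weights, labels and centroids below are made-up one-dimensional values for illustration, not the paper's data:

```python
# Minimal sketch of the K-means objective: the within-cluster sum of
# squared distances (WCSS) that the algorithm tries to minimize.
# All values below are hypothetical illustrations.

def wcss(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum((p - centroids[l]) ** 2 for p, l in zip(points, labels))

baby_weights = [2.5, 2.7, 3.9, 1.8]   # hypothetical 1-D feature
labels       = [0, 0, 1, 0]           # hypothetical assignment to 2 clusters
centroids    = [2.33, 3.9]            # hypothetical cluster centroids

print(wcss(baby_weights, labels, centroids))
```

A lower WCSS means the points sit closer to their centroids, which is exactly the quantity the iterative K-means loop drives down.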

K-medoid is a classical partitioning method: clustering is performed on an information set of n items divided into k clusters, where k is defined before the algorithm executes. It minimizes the sum of dissimilarities between the points within a cluster and the point designated as the centre of that cluster (the medoid).

A Caesarean operation, also called a Caesarean section, is a medical procedure for delivering the baby by making an incision in the abdomen and uterus of the pregnant woman.

Elective LSCS

If a doctor is planning a Caesarean, he/she is responsible for giving you all the required information about the procedure so that you can make an informed decision. If a Caesarean is decided upon at an early stage, it is performed only after 38 weeks of pregnancy.

Reasons for a planned Caesarean include:

  • Natural (vaginal) birth may not be possible because the placenta is covering the cervix (placenta praevia).

  • A Caesarean may be needed if the baby is very large or not in the correct position.

  • A Caesarean becomes necessary if the cervix does not open properly.

Emergency LSCS

An emergency Caesarean is performed in situations where it is not safe for the baby to wait for a vaginal birth, or when there are concerns about the mother's or the baby's health.

Literature review

(Batra, 2011) analysed the K-means and K-medoid algorithms on the basis of their approach. They generated input data items through normal and uniform distributions and measured the performance of both algorithms. (Velmurugan & Dharmarajan, 2014) carried out a research study on the K-means and K-medoid algorithms by analysing the clustering quality of both. They implemented the algorithms in Java, gave randomly distributed data items as input, calculated the performance of each algorithm, and presented the experimental results in both tabular and graphical format. (Arora, Deepali, & Varshney, 2016) used the Keel tool for analysis; they took 10,000 random numbers, generated points from them and clustered the points. By comparing execution time, space complexity, noise reduction and the level of dissimilarity, they found that K-medoid was much better than K-means. (Soni & Patel, 2017) used data from the UCI Machine Learning Repository as input and compared both algorithms. They considered the execution time, sensitivity to outlier data, scalability and noise reduction of both K-means and K-medoid and concluded that K-medoid is more efficient than K-means. (Reynolds, Richards, & Rayward-Smith, 2004) studied two clustering methods, and their research work led to an all-rules algorithm that combines PAM and the K-medoid algorithm; in their comparison with K-medoid, they observed that PAM produced the more effective result. (Santhanam, 2016) used three datasets for analysis and showed that K-means is much more efficient than K-medoid; when comparing the distances between objects, they observed that K-medoid consumes more time than K-means, the most time-consuming part of the K-medoid algorithm being the calculation of the distances between objects.
(Dharmarajan & Velmurugan, 2016) implemented their work in Java and also analysed the computational complexity of the algorithms: K-means has a low computational cost, whereas K-medoid has a high one. Several researchers have shown that the K-means algorithm is well suited to medical datasets compared with other clustering algorithms. (Sufcw, 2001) proposed an idea for profitably modifying the K-means clustering algorithm: improvements in clustering accuracy were observed when artificial constraints were applied to six datasets, and the method was implemented in real-time GPS applications for detecting road lanes from satellite data. (Pattabiraman, Parvathi, Nedunchezian, & Palaniammal, 2009) proposed a new spatial clustering method for edge detection. The paper discusses the obstacles related to spatial clustering in detail; they used the K-medoid method to cluster the spatial data and compared various clustering algorithms in terms of execution time and number of clusters. (Shah & Singh, 2012) proposed a modified K-means algorithm and evaluated the modified K-means, K-means and K-medoid using real-time data. The paper concludes that the proposed modified K-means is better, as it takes less execution time and produces better clusters than the K-means and K-medoid algorithms. (Nair, Arjun, & Kundapur, 2015) proposed identifying the dense regions of a particular city or geographical area with the help of census data; the DBSCAN clustering algorithm was used to cluster the data, demonstrating a data-mining approach on census data. (Nair, Arjun, & Reghunath, 2018) carried out a research study to find the protein functional regions in human embryo cells using an ANN classifier. They drew a box plot to show the experimental results and evaluated the performance of the classifier by comparing the energy levels of the coding region.

Proposed work

In our proposed work, we collect pregnant women's data manually from the hospital, then digitize it and convert it into Excel format. The collected data is arranged in tabular form, as shown in Table 1.

Table 1: Dataset used for clustering

Name | Age | Indication | Surgery Mode | Baby Weight (kg)
Tejaswini | 25 | Fetal Distrus | Emergency LSCS | 2.5
Ranjitha | 19 | Ecephdic Presentation in Labour | Emergency LSCS | 2.5
Apoorva | 27 | Fetal Distress in Labour | Emergency LSCS | 3.9
Ashwini | 25 | MSAF in labour | Emergency LSCS | 2.7
Tanuja | 25 | Fetal Distress | Emergency LSCS | 2.65
Chnnajamma | 31 | Post JVI | Emergency LSCS | 2.6
Latha | 25 | Fetal Distress | Emergency LSCS | 2.75
Sheethal | 28 | Failed Induction | Emergency LSCS | 3.25
Thanvi | 23 | None | Normal Delivery | 2.5
Anitha | 18 | Fetal Distress | Emergency LSCS | 2.75
Ramya | 28 | Prev Cd in Labour | Emergency LSCS | 2.65
Anitha | 19 | Complete Bleach in II stape | breech vaginal delivery | 2.5
shushma | 21 | Prev Cd in Labour | Emergency LSCS | 3
shruthi | 22 | MSAF in labour | Emergency LSCS | 3.4
sheela | 31 | cpd in labour | Emergency LSCS | 3
radha | 20 | petve in labour | Emergency LSCS | 2.4
Meena | 29 | Prev Cd in Labour | Emergency LSCS | 2.7
Aswini | 32 | MSAF in labour | Emergency LSCS | 2.8
ramya | 21 | fetal distress | Emergency LSCS | 2.5
jyothi | 22 | severe dehydration in labour | Emergency LSCS | 2.5
Anjali | 22 | MSAF in labour | Emergency LSCS | 2
Bharathi | 22 | Prev Cd in Labour | Emergency LSCS | 2.7
Saraswathi | 23 | G3,P1,L1 in labour | Emergency LSCS | 2.25
Varalakshmi | 24 | fetal distress | Emergency LSCS | 2.4
Latha | 26 | fetal distress | Emergency LSCS | 3
sheela | 25 | Prime labour | Emergency LSCS | 2.5
Anjali k | 21 | Oligohydramnios | Emergency LSCS | 3
Vibha | 23 | Prime cpd in labour | Emergency LSCS | 2.5
Ningamma | 30 | Failed Induction | Emergency LSCS | 3.25
Chaitra | 26 | G2,a1 growth incuction | Emergency LSCS | 3.3
Vinutha | 26 | ovelation induction | Emergency LSCS | 3.3
Geetha | 21 | Oligohydramnios | Emergency LSCS | 2.5
Redha | 21 | fetal distress | Emergency LSCS | 2.8
Rashmita | 18 | cpd in labour | Emergency LSCS | 3.75
Kavana | 28 | Oligohydramnios | Emergency LSCS | 3.74
Dakshayini | 24 | polyhydranion in labour | Emergency LSCS | 3.7
Swetha | 24 | fetal distress | Emergency LSCS | 2.7
Asha | 23 | Cure gworth | Normal Delivery | 3
Manjula | 34 | fetal distress | Emergency LSCS | 3.4
Rohini | 22 | preved | Emergency LSCS | 3
Indira | 23 | preved | Emergency LSCS | 2.8
Nalani | 24 | preved | Emergency LSCS | 2.7
Jothi | 23 | MCDA GDM in labour | Emergency LSCS | 1.8
Soubhagya | 19 | Oligohydramnios | Emergency LSCS | 3
Manasa | 26 | Oligohydramnios | Emergency LSCS | 3
Vinisha | 19 | phomsh in labour | Emergency LSCS | 2.7
Manjua | 25 | cpd in labour | Emergency LSCS | 3.7
Bhavya | 29 | prime at breech | Emergency LSCS | 3
Poornima | 28 | cpd in labour | Emergency LSCS | 3
Nandini | 25 | pracd in labour | Emergency LSCS | 3.4

We apply two partitioning-based clustering algorithms, K-means and K-medoid, to group the data into categories such as elective LSCS, emergency LSCS and normal delivery, and we compare the efficiency of the two algorithms. Based on the algorithms, we clearly detect the outliers, namely the normal delivery cases. The data flow diagram for identifying outliers is given in Figure 4.

Materials and Methods

Step 1: Collect and prepare the pregnancy data.

Step 2: Apply the K-means algorithm.

Step 3: Apply the K-medoid algorithm.

Step 4: Detect the outliers in both cases.

Step 5: Compare the execution times.
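Step 1 might be sketched as follows. The file name maternity.csv and the column headings (taken from Table 1) are assumptions about how the collected Excel sheet is exported to CSV:

```python
# Sketch of Step 1: loading the manually collected maternity records.
# "maternity.csv" and the column order (Name, Age, Indication,
# Surgery Mode, Baby Weight) are assumed, following Table 1.
import csv

def load_records(path):
    """Read the exported CSV into a list of typed record dictionaries."""
    with open(path, newline="") as f:
        records = []
        for row in csv.DictReader(f):
            records.append({
                "name": row["Name"],
                "age": int(row["Age"]),
                "indication": row["Indication"],
                "mode": row["Surgery Mode"],
                "weight": float(row["Baby Weight"]),
            })
    return records
```

The numeric fields (age, baby weight) are converted to numbers here because the distance calculations in Steps 2 and 3 operate on numeric attributes.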

K-means

INPUT

A = set of e data points

K = number of clusters

I = maximum number of iterations

OUTPUT

C = set of K cluster centroids

L = set of assignments, one per data point, giving its nearest centroid and the distance to it

Initialise each centroid c in C to some arbitrary position.

For a in A:

Compute the distance from a to every centroid c.

Assign a to the centroid c with the minimum distance; store the assignment in L.

i = 0

While i < I and the assignments in L are still changing:

For c in C:

Recompute c as the average position of all points allocated to cluster c.

For a in A:

Compute the distance from a to every centroid c.

Reassign a to the centroid c with the minimum distance; update L.

i = i + 1

Return the assignments L.

Euclidean distance formula and term explanation

ED = sqrt((a2 - a1)^2 + (b2 - b1)^2)

where ED is the distance and (a1, b1) and (a2, b2) are the two points.
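The K-means procedure above, using this Euclidean distance, can be sketched in Python for one-dimensional data such as baby weight (where the distance reduces to an absolute difference). This is an illustrative reimplementation under those assumptions, not the paper's Windows Forms code:

```python
# Runnable sketch of the K-means loop for 1-D points (e.g. baby weight).
# k, the iteration cap and the random seed are illustrative parameters.
import random

def euclidean(a, b):
    # For 1-D points, Euclidean distance reduces to |a2 - a1|.
    return abs(a - b)

def k_means(points, k, iterations=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # arbitrary initial centroids
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: euclidean(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:           # no movement: converged
            break
        centroids = new_centroids
    return centroids, clusters
```

On well-separated data the loop converges in a few iterations; for example, `k_means([2.5, 2.7, 3.9, 2.6], k=2)` groups the lighter weights apart from 3.9.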

The K-medoid algorithm takes as input the parameter L, the number of clusters into which a set of Z objects is to be partitioned. A typical K-medoid algorithm partitions the data around medoids, i.e. central items.

Input

L: The number of clusters

T: A data set containing Z objects

Output: a set of L clusters that minimizes the sum of the dissimilarities of all the objects to their nearest medoid.

Method: Arbitrarily choose L objects in T as the initial representative objects (medoids).

Repeat: Assign each remaining object to the cluster with the nearest medoid; randomly select a non-medoid object S_random; calculate the total cost M of exchanging a medoid S_j with S_random.

If M < 0, exchange S_j with S_random to form the new set of L medoids. Repeat until there is no change.
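The method above can be sketched as follows. This is a simplified one-dimensional illustration, not the paper's implementation: instead of sampling one random swap per pass it tries each candidate swap in turn, keeping a swap exactly when it lowers the total dissimilarity (the M < 0 condition):

```python
# Sketch of the k-medoids idea: medoids are actual data points, and a
# swap with a non-medoid is kept only when it reduces the total
# within-cluster dissimilarity (M < 0 in the text's notation).
import random

def total_cost(points, medoids):
    # Sum of distances from every point to its nearest medoid.
    return sum(min(abs(p - m) for m in medoids) for p in points)

def k_medoids(points, k, iterations=100, seed=0):
    rng = random.Random(seed)
    medoids = rng.sample(points, k)              # arbitrary initial medoids
    for _ in range(iterations):
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                if total_cost(points, trial) < total_cost(points, medoids):
                    medoids, improved = trial, True
        if not improved:                         # no swap helps: converged
            break
    return sorted(medoids)
```

Because medoids must be actual data points, the result is less sensitive to extreme values than a K-means centroid, which is why outliers such as the normal delivery cases separate cleanly.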

Results and Discussion

The pregnancy data is taken based on age, indication, surgery mode and baby weight.

Using the K-means algorithm, we obtained two clusters (cluster 1 and cluster 2) from the various kinds of deliveries. We plot the number of clusters on the x-axis and the baby weight on the y-axis. In this clustering approach, we found the normal and breech vaginal delivery cases to be outliers. The implementation results of K-means are given in Figure 1.

https://typeset-prod-media-server.s3.amazonaws.com/article_uploads/1228b897-4dd7-4394-a943-5774a37a4db7/image/202da358-503b-4ea9-b920-ef0564571121-upicture1.png
Figure 1: Clusters obtained for K-means

In K-medoid, we use the same dataset of various types of pregnancies and obtain three clusters based on baby weight and surgery mode; here the outlier detection is very clear, with the normal delivery and breech vaginal delivery cases identified as outliers. The clusters formed by implementing the K-medoid algorithm are shown in Figure 2.

https://typeset-prod-media-server.s3.amazonaws.com/article_uploads/1228b897-4dd7-4394-a943-5774a37a4db7/image/88478598-bcf2-4ec2-8c19-990754919adc-upicture2.png
Figure 2: Clusters obtained for K-medoid

When we compare K-means and K-medoid in terms of execution time, K-medoid performs well for a dataset of 100 records, and outlier detection is also clearer in K-medoid than in K-means. The comparison graph of K-means and K-medoid is given in Figure 3.

https://typeset-prod-media-server.s3.amazonaws.com/article_uploads/1228b897-4dd7-4394-a943-5774a37a4db7/image/8097a1c8-7946-4d64-bad2-9684d672190f-upicture3.png
Figure 3: Comparison graph of K-means and K-medoid

https://typeset-prod-media-server.s3.amazonaws.com/article_uploads/1228b897-4dd7-4394-a943-5774a37a4db7/image/097419be-e5fd-4a4d-9398-d2b2965aa1de-upicture4.png
Figure 4: Data flow diagram for detecting outliers

Conclusion

From our work, we can say that K-means works better when the dataset size is in the range of 50–100, whereas K-medoid works more efficiently when the dataset is larger than 500. In our work, we considered various maternity cases and detected the normal delivery cases as outliers.

In future studies, the dataset size can be increased, which will change the relative efficiency of the partitioning algorithms; also, instead of two groups, multiple maternity complications can be considered for clustering to improve the performance of the clustering algorithms.