Flower Species Grouping to Find Out Outliers Using DBSCAN Clustering on Google Colab
Abstract
This study aims to group iris flower species and detect outliers using Density-Based Spatial Clustering of Applications with Noise (DBSCAN) on the Google Colab platform. The data are the Iris dataset from the UCI Machine Learning Repository, which covers three species (Setosa, Versicolor, and Virginica) described by sepal length, sepal width, petal length, and petal width. The DBSCAN workflow comprises data pre-processing, parameter determination, model building, and visualization of the clustering results. DBSCAN was chosen because it can detect outliers and does not require the number of clusters to be specified in advance, which makes it effective for irregularly distributed data. The results show that DBSCAN assigned the data to three groups and clearly identified outliers. Cluster 0 contains all Setosa data, while cluster 1 consists of Versicolor and Virginica data. The group labeled -1 contains the points that DBSCAN treats as noise, that is, outliers, which suggests that some specimens have unusual characteristics. In conclusion, DBSCAN is effective at grouping iris flower data by density and at detecting points that deviate from the rest.
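To illustrate how DBSCAN can produce labels such as 0, 1, and -1 without a preset number of clusters, here is a minimal from-scratch sketch in plain Python. It is not the study's code: the abstract does not give the implementation or parameter values, so the function names (`region_query`, `dbscan`), the values `eps=0.3` and `min_samples=3`, and the nine 2-D points are all hypothetical and are not iris measurements. On Colab, a library implementation such as scikit-learn's `DBSCAN` would be the more likely choice.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1        # label DBSCAN conventionally gives to outliers
UNVISITED = None  # marker for points not yet processed


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Assign each point a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            # Not a core point; stays noise unless a cluster reaches it later.
            labels[i] = NOISE
            continue
        # Core point found: start a new cluster and grow it by density.
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, does not expand
                continue
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point
    return labels


# Hypothetical toy data: two dense groups and one isolated point.
points = [
    (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (9.0, 1.0),
]
labels = dbscan(points, eps=0.3, min_samples=3)  # assumed parameter values
print(labels)
```

The number of clusters is never passed in; it follows from the density parameters alone, and any point that no dense region reaches keeps the label -1.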
Copyright (c) 2025 Adzkia Nur Nasution, Ardilla Syafitri Lubis, Keysa Shifa Adwitia Sitepu, Rezkya Nadilla Putri, Arnita Arnita

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.