Machine Learning — Iris Dataset for the K-Nearest Neighbors (KNN) Algorithm

Veysel Guzelsoz
Feb 16, 2021 · 3 min read


Introduction

The K-NN (K-Nearest Neighbor) algorithm is one of the simplest yet most widely used classification algorithms. K-NN is a non-parametric, lazy learning algorithm: it does not build a model from the training data, but instead “memorizes” the training set. When we want to make a prediction, it looks for the closest neighbors in the entire data set.

To run the algorithm, a value of K is chosen; K is the number of neighbors to consider. When a new sample arrives, its distance to every training sample is computed and the K nearest samples are selected. The Euclidean distance is most often used for this calculation; the Manhattan, Minkowski, and Hamming distances can be used as alternatives. After the distances are computed, the new sample is assigned to the class that is most common among its K nearest neighbors.
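As a rough sketch of the procedure just described (the function names below, such as knn_predict, are my own and not taken from the article's code):

```python
import numpy as np
from collections import Counter

def euclidean(a, b):
    # Straight-line distance between two feature vectors
    return np.sqrt(np.sum((a - b) ** 2))

def knn_predict(X_train, y_train, x_new, k=3):
    # Distance from the new sample to every training sample
    distances = [euclidean(x, x_new) for x in X_train]
    # Indices of the K closest training samples
    nearest = np.argsort(distances)[:k]
    # Majority vote among the labels of the K nearest neighbors
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]
```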

Iris Dataset

The Iris flower data set or Fisher’s Iris data set is a multivariate data set introduced by the British statistician, eugenicist, and biologist Ronald Fisher in his 1936 paper The use of multiple measurements in taxonomic problems as an example of linear discriminant analysis.

The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor).

Each sample also has four features, all measured in centimeters:

• Sepal length in cm
• Sepal width in cm
• Petal length in cm
• Petal width in cm
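For reference, the data set ships with scikit-learn and can be loaded in a couple of lines (a minimal sketch, assuming scikit-learn is installed):

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target   # 150 samples, 4 features each
print(iris.feature_names)       # sepal/petal length and width in cm
print(iris.target_names)        # ['setosa' 'versicolor' 'virginica']
print(X.shape)                  # (150, 4)
```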

Iris flower data set, clustered using k-means (left) and true species in the data set (right). Note that k-means is non-deterministic, so results vary. Cluster means are visualized using larger, semi-transparent markers. The visualization was generated using ELKI.

Implementation

We used the Euclidean distance as the distance and similarity measure for this example.
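The article's original code is not reproduced here, but an equivalent setup with scikit-learn's KNeighborsClassifier could look like the sketch below; the Euclidean metric is made explicit, and the 70/30 train/test split is an assumption on my part:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
# Hold out a test set; the 70/30 split here is an assumed value
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# metric="euclidean" makes the distance measure explicit
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
```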

Calculation of Success:

Confusion Matrix:

TP (True Positive) and TN (True Negative) count the predictions that are correct.

FP (False Positive) and FN (False Negative) count the predictions that are incorrect.

The ratio of correct predictions to the total number of predictions gives the accuracy: (TP + TN) / (TP + TN + FP + FN).

The ratio of incorrect predictions to the total number of predictions gives the error rate.
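As a hedged illustration of those ratios with scikit-learn (the split and K value are assumed, since the article's exact setup is not shown):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)  # assumed 70/30 split

y_pred = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train).predict(X_test)

cm = confusion_matrix(y_test, y_pred)
correct = np.trace(cm)   # correct predictions lie on the diagonal
total = cm.sum()
print(cm)
print("Accuracy:", correct / total)
print("Error rate:", 1 - correct / total)
```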

Confusion matrix results for sample predictions on the Iris data set

Result

On this data set, the best accuracy was obtained with K = 3:

For K = 3: 95.2380952380952%

For K = 5: 94.2857142857143%

For K = 10: 91.4285714285714%

Accuracy continued to decrease as K increased further.
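A sketch of how such a sweep over K could be reproduced (the article's exact split and evaluation procedure are not shown, so the cross-validation below is an assumption):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (3, 5, 10, 20, 40):
    knn = KNeighborsClassifier(n_neighbors=k, metric="euclidean")
    # Mean accuracy over 5 cross-validation folds
    score = cross_val_score(knn, X, y, cv=5).mean()
    print(f"K = {k:2d}: {score:.4%}")
```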

Veysel Guzelsoz.
