Learning Model Building in Scikit-learn: A Python Machine Learning Library

- June 25, 2023

Scikit-learn is a free and open-source library for Python that provides tools for various aspects of machine learning. It offers a wide range of functions and algorithms that help with tasks like analyzing and processing data, creating and training machine learning models, evaluating model performance, and visualizing results.

Important features of scikit-learn:

1. It is released under a permissive open-source (BSD) license, so it can be freely used, modified, and distributed, even for commercial purposes.

2. This makes scikit-learn a flexible and cost-effective choice for data analysis and machine learning tasks.

3. Scikit-learn is built on top of well-established and widely used Python libraries such as NumPy, SciPy, and matplotlib.

Now we will see how we can easily build a machine learning model using scikit-learn.

Before installing scikit-learn, check that NumPy and SciPy are set up, since scikit-learn depends on both. (pip normally installs any missing dependencies automatically.) You can then install or upgrade scikit-learn with:

pip install -U scikit-learn
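To confirm that the installation worked, a quick check like the one below can help. It is only a minimal sketch: it imports the three packages and prints whatever versions happen to be installed in your environment.

```python
import numpy
import scipy
import sklearn

# Collect the installed versions; the actual values depend on your environment.
versions = {
    "numpy": numpy.__version__,
    "scipy": scipy.__version__,
    "scikit-learn": sklearn.__version__,
}

for name, version in versions.items():
    print(f"{name}: {version}")
```

If any of the imports fails, that package is missing from the active Python environment.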

Loading an example dataset: scikit-learn ships with a few small example datasets, such as the iris dataset for classification.

Load a dataset

To work with a dataset in scikit-learn, the first step is to load the data into your Python environment. This can be done using various methods, such as reading data from a file, accessing an online dataset, or generating synthetic data. Once the dataset is loaded, you will have the feature matrix 'X' and the response vector 'y' available for further analysis and modeling.
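As a small illustration of the two most common options in this tutorial's setting, the sketch below loads the bundled iris dataset and also generates a synthetic dataset with make_classification. The synthetic settings (200 samples, 4 features) and the names X_syn and y_syn are arbitrary choices made for this example.

```python
from sklearn.datasets import load_iris, make_classification

# Option 1: a bundled example dataset.
iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (150, 4): 150 samples, 4 features
print(y.shape)             # (150,): one label per sample
print(iris.feature_names)  # names of the 4 feature columns
print(iris.target_names)   # names of the 3 classes

# Option 2: synthetic data (sizes chosen only for illustration).
X_syn, y_syn = make_classification(
    n_samples=200, n_features=4, n_informative=2, random_state=0
)
print(X_syn.shape, y_syn.shape)  # (200, 4) (200,)
```

In both cases the result has the same layout: a 2-D feature matrix with one row per sample and a 1-D response vector with one label per row.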

Step-by-Step Explanation:

1. Import the necessary libraries and modules.

from sklearn.datasets import load_iris

from sklearn.model_selection import train_test_split

from sklearn.neural_network import MLPClassifier

from sklearn.metrics import accuracy_score

We import the required libraries for loading the dataset, splitting the data into training (60%) and testing (40%) sets, creating the MLP classifier model, and calculating the accuracy of the model.

2. Load the iris dataset.

iris = load_iris()

X = iris.data

y = iris.target

We load the iris dataset, which is a popular dataset for classification tasks. 'X' represents the feature matrix (input data) and 'y' represents the target vector (output labels).

3. Split the dataset into training and testing sets.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

Using the 'train_test_split' function, we split the dataset into training and testing sets. Here, 'test_size=0.4' holds out 40% of the data for testing and leaves the remaining 60% for training, and the 'random_state' parameter ensures reproducibility of the split.
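To see what the split produces, the sketch below prints the resulting shapes. With 150 iris samples and test_size=0.4, 60 samples go to the testing set and 90 to the training set. The second call, with stratify=y, is an optional variation that is not part of the walkthrough; it keeps the class proportions equal in both sets.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same call as in the walkthrough: 40% of the samples are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1
)
print(X_train.shape, X_test.shape)  # (90, 4) (60, 4)

# Optional variation (an assumption, not used in the walkthrough):
# stratify=y preserves the class balance in both subsets.
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.4, random_state=1, stratify=y
)
print(np.bincount(y_test_s))  # [20 20 20]
```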

4. Create the MLP classifier model.

model = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=1)

We create an instance of the MLPClassifier, which represents a multi-layer perceptron neural network. The 'hidden_layer_sizes' parameter specifies the number of neurons in each hidden layer. In this case, we have two hidden layers with 64 and 32 neurons respectively. The 'max_iter' parameter determines the maximum number of training iterations.

5. Train the model.

model.fit(X_train, y_train)

We train the MLP classifier model using the training data. The model learns to map the input features 'X_train' to the corresponding target labels 'y_train'.

6. Make predictions on the testing set.

y_pred = model.predict(X_test)

Using the trained model, we make predictions on the testing set ('X_test'). The model predicts the target labels for the input features.
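Besides hard labels, MLPClassifier can also return class probabilities through predict_proba. The sketch below repeats the earlier steps so that it runs on its own, then shows that the predicted label is the class with the highest probability. The name labels_from_proba is introduced here only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Repeat the earlier steps so this example is self-contained.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1
)
model = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=1)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)       # one class label per test sample
proba = model.predict_proba(X_test)  # one probability per class and sample

print(proba.shape)  # (60, 3)
print(y_pred[:5])
print(proba[:5].round(3))

# The predicted label is the class with the largest probability.
labels_from_proba = model.classes_[np.argmax(proba, axis=1)]
print(np.array_equal(labels_from_proba, y_pred))
```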

7. Calculate the accuracy of the model.

accuracy = accuracy_score(y_test, y_pred)

print("Model accuracy:", accuracy)

We calculate the accuracy of the model by comparing the predicted labels ('y_pred') with the true labels ('y_test'). The 'accuracy_score' function computes the accuracy as the ratio of correct predictions to the total number of samples in the testing set.
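The ratio described above can be checked by hand. The sketch below uses two small made-up label arrays (not the iris results) in which 4 of 5 predictions are correct, and compares a manual computation with accuracy_score.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up labels for illustration only: 4 of the 5 predictions match.
y_true_demo = np.array([0, 1, 2, 2, 1])
y_pred_demo = np.array([0, 2, 2, 2, 1])

# Manual accuracy: number of correct predictions / total number of samples.
manual_accuracy = np.mean(y_true_demo == y_pred_demo)
library_accuracy = accuracy_score(y_true_demo, y_pred_demo)

print(manual_accuracy)   # 0.8
print(library_accuracy)  # 0.8
```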

Note that the model above does not use the default architecture of the MLPClassifier in scikit-learn. By default ('hidden_layer_sizes=(100,)'), the network consists of a single hidden layer with 100 neurons.

Additional:

If you would like to include additional layers or customize the architecture of the neural network, you can modify the 'hidden_layer_sizes' parameter when creating the MLPClassifier. Each entry in the tuple adds one hidden layer with that many neurons. For example, two hidden layers with 64 and 32 neurons respectively, as used in step 4, are specified as follows:

model = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=1)

This will create an MLP classifier model with two hidden layers, the first layer consisting of 64 neurons and the second layer consisting of 32 neurons.
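One way to confirm the resulting architecture is to look at the shapes of the weight matrices stored in the fitted model's coefs_ attribute. The sketch below is a rough comparison of the default and the customized network; for brevity it fits both on the full iris dataset, and max_iter=1000 for the default model is a choice made here, not a library default.

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

# Default architecture: one hidden layer with 100 neurons.
default_model = MLPClassifier(max_iter=1000, random_state=1).fit(X, y)

# Customized architecture: two hidden layers with 64 and 32 neurons.
custom_model = MLPClassifier(
    hidden_layer_sizes=(64, 32), max_iter=1000, random_state=1
).fit(X, y)

# Each weight matrix has shape (inputs to the layer, outputs of the layer).
default_shapes = [w.shape for w in default_model.coefs_]
custom_shapes = [w.shape for w in custom_model.coefs_]

print(default_shapes)  # [(4, 100), (100, 3)]
print(custom_shapes)   # [(4, 64), (64, 32), (32, 3)]
```

The first dimension of the first matrix is the number of input features (4 for iris), and the last dimension of the last matrix is the number of classes (3).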

Musing with Mukesh: Unraveling the Wonders of Computer Science