### 4 - Machine learning: code for identifying outliers in a Jupyter notebook

We first load the dataset into a pandas DataFrame with the `pd.read_csv()` function.
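To try the code without a CSV file on disk, note that `pd.read_csv()` also accepts a file-like object. The sketch below wraps a small made-up dataset in `io.StringIO`. The column names and values are invented for illustration only.

```python
import io

import pandas as pd

# A tiny, made-up CSV held in memory so the example runs without a file.
# In the notebook you would pass a real path instead, e.g. pd.read_csv('your_file.csv').
csv_text = """height,weight,city
170,65,Oslo
165,59,Lima
180,80,Pune
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.dtypes)  # height and weight are parsed as integers, city as text
print(df.shape)   # (3, 3): three rows, three columns
```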

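We then define a function called `detect_outliers` that takes in a column of data and uses the z-score method to detect outliers. The function first calculates the mean and standard deviation of the data. A point is treated as an outlier if it lies more than three standard deviations from the mean. Because a z-score already measures the distance from the mean in units of standard deviations, this is the same as flagging every point whose absolute z-score is greater than 3.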
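As a worked example of the z-score arithmetic, the sketch below standardizes a small made-up sample whose mean (5) and population standard deviation (2) are whole numbers, so each z-score can be checked by hand.

```python
import numpy as np

# A small made-up sample, chosen so the mean and standard deviation are whole numbers
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

mean = data.mean()  # 40 / 8 = 5.0
std = data.std()    # population std (ddof=0): sqrt(32 / 8) = 2.0

# z-score: distance from the mean, measured in standard deviations
z_scores = (data - mean) / std
print(z_scores.tolist())  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]

# "more than three standard deviations from the mean" is the test |z| > 3
print(np.abs(z_scores) > 3)  # every entry is False: no point in this sample is that far out
```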

``````
import pandas as pd
import numpy as np

# load the dataset ('data.csv' is a placeholder path; replace it with your own file)
df = pd.read_csv('data.csv')

# define a function to detect outliers in one column using the z-score method
def detect_outliers(data, threshold=3):
    # calculate the mean and (population) standard deviation of the data
    mean = data.mean()
    std = data.std(ddof=0)
    # z-score: how many standard deviations each point lies from the mean
    z_scores = (data - mean) / std
    # outliers lie more than `threshold` standard deviations from the mean,
    # so |z| is compared with the threshold itself (not with threshold * std)
    return data[np.abs(z_scores) > threshold]

# apply the detect_outliers function to each numeric column of the dataset
outliers = {}
for column in df.select_dtypes(include='number').columns:
    outliers[column] = detect_outliers(df[column])

# print the outliers (row index and value) for each column
for column, column_outliers in outliers.items():
    print('Outliers in column {}:\n{}'.format(column, column_outliers))
``````

The function then calculates the z-score of each data point, `z = (x - mean) / std`, and identifies as an outlier any data point whose absolute z-score is greater than the threshold.
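One property of this rule is worth knowing. With the population standard deviation (`ddof=0`, the NumPy default), no point can have an absolute z-score larger than sqrt(n - 1), where n is the number of values. A column with 10 or fewer values can therefore never produce an outlier under the |z| > 3 rule. The sketch below illustrates this with made-up numbers; `zscore_outliers` is a hypothetical helper name, not part of pandas or NumPy.

```python
import pandas as pd

def zscore_outliers(values, threshold=3):
    """Hypothetical helper: return the values whose absolute z-score exceeds `threshold`."""
    s = pd.Series(values, dtype=float)
    z = (s - s.mean()) / s.std(ddof=0)  # population standard deviation
    return s[z.abs() > threshold]

small = [10] * 8 + [100]   # 9 values: the extreme value has z = sqrt(8), about 2.83
large = [10] * 19 + [100]  # 20 values: the same value has z = sqrt(19), about 4.36

print(zscore_outliers(small).tolist())  # []
print(zscore_outliers(large).tolist())  # [100.0]
```

The same value of 100 is flagged in the larger sample but not in the smaller one, so on very small datasets a lower threshold, or a different method, may be needed.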

Next, we apply the `detect_outliers` function to each column of the dataset using a `for` loop, and store the outliers for each column in a dictionary called `outliers`.
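The per-column loop can be tried end to end on an in-memory DataFrame. The sketch below uses made-up data with one planted extreme value in each numeric column and a text column that is skipped via `select_dtypes`. The helper `find_outliers` is a hypothetical stand-in that applies the same |z| > 3 rule.

```python
import pandas as pd

# Made-up data: 20 rows, one planted extreme value per numeric column
df = pd.DataFrame({
    'temp': [20.0] * 19 + [95.0],
    'load': [5.0] * 10 + [-60.0] + [5.0] * 9,
    'site': ['A', 'B'] * 10,  # text column: z-scores are not defined for it
})

def find_outliers(series, threshold=3):
    """Hypothetical helper: rows of `series` whose absolute z-score exceeds `threshold`."""
    z = (series - series.mean()) / series.std(ddof=0)
    return series[z.abs() > threshold]

# One dictionary entry per numeric column, keyed by column name
outliers = {}
for column in df.select_dtypes(include='number').columns:
    outliers[column] = find_outliers(df[column])

for column, found in outliers.items():
    print(column, found.to_dict())  # {row index label: outlying value}
```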

Finally, we print the outliers for each column by iterating through the `outliers` dictionary using another `for` loop.
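Printing one result per column works well for a handful of columns. For a compact summary, a loop-free variant can compute every column's z-scores at once, because pandas aligns `df.mean()` and `df.std()` with the DataFrame's columns. The sketch below assumes an all-numeric DataFrame built from made-up values; for mixed data you could call `df.select_dtypes(include='number')` first.

```python
import pandas as pd

# Made-up numeric data (20 rows) with one planted extreme value in each column
df = pd.DataFrame({
    'temp': [20.0] * 19 + [95.0],
    'load': [5.0] * 10 + [-60.0] + [5.0] * 9,
})

# z-scores for every column in one step: the per-column mean and std are aligned by column name
z = (df - df.mean()) / df.std(ddof=0)
is_outlier = z.abs() > 3  # boolean DataFrame with the same shape as df

# Number of outliers found in each column
print(is_outlier.sum())

# Rows that contain at least one outlier, in any column
print(df[is_outlier.any(axis=1)])
```

The boolean mask keeps the row and column positions of every flagged value, which makes it easy to count, filter, or drop outlying rows in a single step.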