Association Analysis

The dataset has many binary features, each indicating whether a car is fitted with a particular piece of optional equipment. This makes the equipment fields well suited to association analysis.

Association analysis (also known as market basket analysis) is a data mining technique for finding items or events that frequently occur together within the same transaction, such as products that customers tend to buy in a single purchase.

Here, each car listing plays the role of a transaction and each equipment feature plays the role of an item. The results therefore show which combinations of equipment are most often found together. Three measures are commonly used to assess the strength of an association rule $X \to Y$:

Support:
- $supp(X) = {\text{number of listings in which }X \text{ appears} \over \text{total number of listings}}$
- The support of an itemset $X$ is the proportion of listings (transactions) in the dataset that contain $X$.
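Confidence:
- $conf(X \to Y) = {supp(X \cup Y)\over supp(X)}$
- Confidence is the conditional probability that a listing contains $Y$, given that it contains $X$. Here $supp(X \cup Y)$ denotes the support of the combined itemset, i.e. the fraction of listings that contain both $X$ and $Y$.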
Lift:
- $lift(X \to Y) = {supp(X \cup Y)\over supp(X) \times supp(Y)}$
- Lift is the ratio of the observed support of $X \cup Y$ to the support that would be expected if $X$ and $Y$ were independent.
  - If $lift(X \to Y) = 1$, $X$ and $Y$ occur independently of each other, so the rule carries no information beyond chance.
  - If $lift(X \to Y) > 1$, $X$ and $Y$ occur together more often than independence would predict. The larger the lift, the stronger the positive association.
  - If $lift(X \to Y) < 1$, $X$ and $Y$ occur together less often than expected. The presence of one makes the other less likely, so the items may act as substitutes for each other.
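The following minimal pandas sketch computes the three measures directly from a one-hot table, which makes the formulas concrete. The five-listing DataFrame and the helper names `support`, `confidence` and `lift` are hypothetical illustrations. They are not part of the original notebook or of mlxtend's API.

```python
import pandas as pd

# Hypothetical toy data: one row per listing, one boolean column per equipment item
listings = pd.DataFrame({
    "ABS":           [True, True, True, True, False],
    "Side airbag":   [True, True, False, True, False],
    "Power windows": [True, False, True, True, True],
})


def support(df, items):
    """Fraction of listings that contain every item in `items`."""
    return df[list(items)].all(axis=1).mean()


def confidence(df, x, y):
    """conf(X -> Y) = supp(X u Y) / supp(X)."""
    return support(df, set(x) | set(y)) / support(df, x)


def lift(df, x, y):
    """lift(X -> Y) = supp(X u Y) / (supp(X) * supp(Y))."""
    return support(df, set(x) | set(y)) / (support(df, x) * support(df, y))


x, y = {"Side airbag"}, {"ABS"}
print("supp(X)    =", support(listings, x))
print("supp(Y)    =", support(listings, y))
print("conf(X->Y) =", round(confidence(listings, x, y), 3))
print("lift(X->Y) =", round(lift(listings, x, y), 3))
```

In this toy data every listing with a side airbag also has ABS, so the confidence is 1.0. The lift is only 1.25, because ABS is already present in 80% of the listings.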

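We sort the rules by lift. Most equipment items in this dataset are present in the majority of listings (ABS, for example, appears in about 89% of them). As a result, almost any rule whose consequent is a common item reaches a high confidence. Lift divides out this baseline frequency, which makes it the more informative measure here.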
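Substituting the confidence formula into the lift formula shows how the two measures relate. Lift is confidence rescaled by how common the consequent already is:

$$
lift(X \to Y) = {supp(X \cup Y) \over supp(X) \times supp(Y)} = {conf(X \to Y) \over supp(Y)}
$$

For example, the rule (Electronic stability control) $\to$ (ABS) in the results has $conf = 0.968578$ and $supp(\text{ABS}) = 0.891539$. This gives $lift = 0.968578 / 0.891539 \approx 1.0864$.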

import numpy as np
import pandas as pd
from pymongo import MongoClient 
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
# Non-boolean values that the cleaning step below treats as "equipment present":
# the string '1' and duration strings such as '12 months'
tbr = ['1','10 months','11 months','112 months','12 months','13 months','14 months','15 months','16 months','17 months','18 months',
       '19 months','2 months','20 months','21 months','22 months','23 months','24 months','25 months','26 months','27 months',
       '28 months','29 months','3 months','30 months','31 months','32 months','33 months','34 months','35 months','36 months',
       '38 months','4 months','40 months','41 months','42 months','43 months','44 months','45 months','46 months','47 months',
       '48 months','5 months','50 months','52 months','53 months','54 months','55 months','56 months','58 months','59 months',
       '6 months','60 months','7 months','72 months','8 months','84 months','88 months','9 months','0 months','1 months']


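def readData():
    # Connection string; replace <User> and <Pass> with real credentials before running
    client = MongoClient('mongodb+srv://<User>:<Pass>@dwprojectcluster.lpqbf.mongodb.net/cars_database?retryWrites=true&w=majority')

    # Load every document of the cars collection and drop MongoDB's internal _id field
    df_cars = pd.DataFrame(list(client.cars_database.cars.find({})))
    df_cars.drop('_id', axis=1, inplace=True)
    # Keep only the listings whose Loaded_in_DW flag is False
    df_cars = df_cars[df_cars['Loaded_in_DW'].eq(False)]

    return df_cars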


df_cars = readData()

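# Equipment flags occupy every column from index 15 onwards
equipment = df_cars.iloc[:, 15:]
# A missing value means the equipment is absent; 1, '1' and the strings in tbr mean it is present
equipment = equipment.fillna(False)
equipment = equipment.replace([1, '1'] + tbr, True)
# apriori expects a purely boolean (one-hot encoded) DataFrame
equipment = equipment.astype(bool)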

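# Frequent itemsets: keep only equipment combinations present in at least 70% of listings
ap = apriori(equipment, min_support=0.7, use_colnames=True)
# Generate rules from those itemsets; a lift threshold of 0 keeps every rule
rules_ap = association_rules(ap, metric="lift", min_threshold=0)
# Top 20 rules by lift
rules_ap.sort_values(by='lift', ascending=False).head(20)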

	antecedents	consequents	antecedent support	consequent support	support	confidence	lift	leverage	conviction
94	(Side airbag)	(Passenger-side airbag, ABS)	0.783843	0.786129	0.701440	0.894873	1.138328	0.085238	2.034404
91	(Passenger-side airbag, ABS)	(Side airbag)	0.786129	0.783843	0.701440	0.892271	1.138328	0.085238	2.006483
93	(Passenger-side airbag)	(Side airbag, ABS)	0.821904	0.752470	0.701440	0.853433	1.134175	0.082982	1.688848
92	(Side airbag, ABS)	(Passenger-side airbag)	0.752470	0.821904	0.701440	0.932183	1.134175	0.082982	2.626135
41	(Side airbag)	(Passenger-side airbag)	0.783843	0.821904	0.723756	0.923343	1.123419	0.079512	2.323283
40	(Passenger-side airbag)	(Side airbag)	0.821904	0.783843	0.723756	0.880584	1.123419	0.079512	1.810123
86	(ABS, Power windows)	(Side airbag)	0.800806	0.783843	0.701644	0.876171	1.117789	0.073937	1.745611
87	(Side airbag)	(ABS, Power windows)	0.783843	0.800806	0.701644	0.895133	1.117789	0.073937	1.899482
84	(Side airbag, ABS)	(Power windows)	0.752470	0.841320	0.701644	0.932454	1.108323	0.068576	2.349220
89	(Power windows)	(Side airbag, ABS)	0.841320	0.752470	0.701644	0.833980	1.108323	0.068576	1.490963
63	(Side airbag)	(ABS, Power steering)	0.783843	0.819335	0.707683	0.902837	1.101914	0.065452	1.859405
62	(ABS, Power steering)	(Side airbag)	0.819335	0.783843	0.707683	0.863728	1.101914	0.065452	1.586215
80	(ABS, Power windows)	(Passenger-side airbag)	0.800806	0.821904	0.725023	0.905367	1.101548	0.066837	1.881955
81	(Passenger-side airbag)	(ABS, Power windows)	0.821904	0.800806	0.725023	0.882126	1.101548	0.066837	1.689891
66	(Air conditioning, ABS)	(Power windows)	0.798370	0.841320	0.737936	0.924303	1.098635	0.066252	2.096262
71	(Power windows)	(Air conditioning, ABS)	0.841320	0.798370	0.737936	0.877117	1.098635	0.066252	1.640830
38	(Side airbag)	(Power windows)	0.783843	0.841320	0.724028	0.923690	1.097906	0.064565	2.079406
39	(Power windows)	(Side airbag)	0.841320	0.783843	0.724028	0.860586	1.097906	0.064565	1.550464
100	(Power windows)	(Air conditioning, Power steering)	0.841320	0.768087	0.708709	0.842378	1.096722	0.062502	1.471322
97	(Air conditioning, Power steering)	(Power windows)	0.768087	0.841320	0.708709	0.922694	1.096722	0.062502	2.052623
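Lift is symmetric: $lift(X \to Y) = lift(Y \to X)$. Because of this, the lift-sorted table lists most rules twice, once in each direction (for example rules 94 and 91, or 41 and 40). The sketch below keeps one row per unordered pair. It uses four rows copied from the table above, stored as frozensets the way mlxtend stores antecedents and consequents. The helper `drop_mirrored` is a hypothetical name, not an mlxtend function. The two directions differ in confidence and conviction, so dropping one loses that directional information. The row kept is simply whichever comes first in the current sort order.

```python
import pandas as pd

# Four rules copied from the lift-sorted table (index = original rule number)
rules = pd.DataFrame({
    "antecedents": [frozenset({"Side airbag"}),
                    frozenset({"Passenger-side airbag", "ABS"}),
                    frozenset({"Passenger-side airbag"}),
                    frozenset({"Side airbag", "ABS"})],
    "consequents": [frozenset({"Passenger-side airbag", "ABS"}),
                    frozenset({"Side airbag"}),
                    frozenset({"Side airbag", "ABS"}),
                    frozenset({"Passenger-side airbag"})],
    "confidence": [0.894873, 0.892271, 0.853433, 0.932183],
    "lift": [1.138328, 1.138328, 1.134175, 1.134175],
}, index=[94, 91, 93, 92])


def drop_mirrored(rules):
    """Keep the first rule of each unordered {antecedents, consequents} pair."""
    pair_key = pd.Series(
        [frozenset([a, c]) for a, c in zip(rules["antecedents"], rules["consequents"])],
        index=rules.index,
    )
    return rules[~pair_key.duplicated()]


print(drop_mirrored(rules))
```

On the full result, the same helper could be applied as `drop_mirrored(rules_ap.sort_values(by='lift', ascending=False))`.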

The same table sorted by confidence:

rules_ap.sort_values(by='confidence', ascending=False).head(20)

	antecedents	consequents	antecedent support	consequent support	support	confidence	lift	leverage	conviction
90	(Passenger-side airbag, Side airbag)	(ABS)	0.723756	0.891539	0.701440	0.969166	1.087071	0.056183	3.517606
85	(Side airbag, Power windows)	(ABS)	0.724028	0.891539	0.701644	0.969084	1.086979	0.056145	3.508252
13	(Electronic stability control)	(ABS)	0.736619	0.891539	0.713473	0.968578	1.086412	0.056749	3.451762
79	(Passenger-side airbag, Power windows)	(ABS)	0.748709	0.891539	0.725023	0.968365	1.086173	0.057521	3.428512
61	(Side airbag, Power steering)	(ABS)	0.731519	0.891539	0.707683	0.967415	1.085107	0.055505	3.328592
55	(Passenger-side airbag, Power steering)	(ABS)	0.757113	0.891539	0.732066	0.966917	1.084549	0.057070	3.278506
72	(Passenger-side airbag, Air conditioning)	(ABS)	0.742349	0.891539	0.716947	0.965782	1.083275	0.055114	3.169701
50	(Power windows, Power steering)	(ABS)	0.773884	0.891539	0.745344	0.963121	1.080291	0.055396	2.940986
43	(Air conditioning, Power steering)	(ABS)	0.768087	0.891539	0.738366	0.961305	1.078254	0.053587	2.802995
67	(Air conditioning, Power windows)	(ABS)	0.768370	0.891539	0.737936	0.960392	1.077229	0.052904	2.738340
14	(Side airbag)	(ABS)	0.783843	0.891539	0.752470	0.959975	1.076762	0.053644	2.709852
10	(Passenger-side airbag)	(ABS)	0.821904	0.891539	0.786129	0.956473	1.072834	0.053370	2.491798
8	(Immobilizer)	(ABS)	0.735174	0.891539	0.701945	0.954801	1.070959	0.046509	2.399645
1	(Central door lock)	(ABS)	0.761251	0.891539	0.725921	0.953589	1.069600	0.047236	2.336992
7	(Power windows)	(ABS)	0.841320	0.891539	0.800806	0.951846	1.067644	0.050738	2.252371
3	(Power steering)	(ABS)	0.864409	0.891539	0.819335	0.947856	1.063169	0.048681	2.080037
4	(Air conditioning)	(ABS)	0.846205	0.891539	0.798370	0.943471	1.058250	0.043946	1.918691
60	(Side airbag, ABS)	(Power steering)	0.752470	0.864409	0.707683	0.940480	1.088003	0.057241	2.278070
28	(Side airbag)	(Power steering)	0.783843	0.864409	0.731519	0.933247	1.079636	0.053958	2.031233
84	(Side airbag, ABS)	(Power windows)	0.752470	0.841320	0.701644	0.932454	1.108323	0.068576	2.349220
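The output also contains two columns, leverage and conviction, that the text above does not define. To our understanding, mlxtend computes them as $leverage(X \to Y) = supp(X \cup Y) - supp(X) \times supp(Y)$ and $conviction(X \to Y) = {1 - supp(Y) \over 1 - conf(X \to Y)}$. The sketch below is a check under that assumption. It recomputes rule 13 from its three support values and reproduces the table's figures to within rounding. It also shows why the confidence ranking is dominated by rules with ABS as the consequent. With $supp(\text{ABS}) \approx 0.89$, even a lift of about 1.07 already corresponds to a confidence above 0.95.

```python
# Rule 13 from the confidence-sorted table: (Electronic stability control) -> (ABS)
supp_x = 0.736619   # antecedent support
supp_y = 0.891539   # consequent support
supp_xy = 0.713473  # support of the combined itemset X u Y

confidence = supp_xy / supp_x
lift = supp_xy / (supp_x * supp_y)
leverage = supp_xy - supp_x * supp_y          # observed minus expected support
conviction = (1 - supp_y) / (1 - confidence)  # expected / observed frequency of X without Y

print(f"confidence={confidence:.6f} lift={lift:.6f} "
      f"leverage={leverage:.6f} conviction={conviction:.4f}")
```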