Overview

Dataset statistics

Number of variables: 11
Number of observations: 7271
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 596.6 KiB
Average record size in memory: 84.0 B

Variable types

NUM: 6
CAT: 5

Warnings

Make has constant value "BMW" (Constant)
df_index has unique values (Unique)

Reproduction

Analysis started: 2021-01-26 09:56:17.888690
Analysis finished: 2021-01-26 09:56:25.568335
Duration: 7.68 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 7271
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 180241.9017
Minimum: 168665
Maximum: 259684
Zeros: 0
Zeros (%): 0.0%
Memory size: 56.8 KiB

Quantile statistics

Minimum: 168665
5-th percentile: 169077.5
Q1: 170645.5
median: 176773
Q3: 179260.5
95-th percentile: 255359
Maximum: 259684
Range: 91019
Interquartile range (IQR): 8615

Descriptive statistics

Standard deviation: 20368.35937
Coefficient of variation (CV): 0.1130056839
Kurtosis: 9.132326272
Mean: 180241.9017
Median Absolute Deviation (MAD): 5066
Skewness: 3.173050123
Sum: 1310538867
Variance: 414870063.5
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
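
Several of the derived figures in this block follow directly from the others. The short sketch below recomputes range, IQR, coefficient of variation and variance from the df_index values reported in this section, using the standard definitions; the variable names are illustrative and not part of the report.

```python
# Values copied from the df_index statistics in this report.
minimum, maximum = 168665, 259684
q1, q3 = 170645.5, 179260.5
mean, std = 180241.9017, 20368.35937

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
cv = std / mean                  # Coefficient of variation = std / mean
variance = std ** 2              # Variance = std squared

print(value_range, iqr, round(cv, 6), round(variance, 1))
```

The same relations can be used as a quick sanity check on any of the numeric variables in this report.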
Value                 Count  Frequency (%)
169981                1      < 0.1%
179581                1      < 0.1%
177476                1      < 0.1%
179525                1      < 0.1%
169288                1      < 0.1%
171337                1      < 0.1%
179533                1      < 0.1%
169296                1      < 0.1%
171345                1      < 0.1%
177492                1      < 0.1%
Other values (7261)   7261   99.9%

Value        Count  Frequency (%)
168665       1      < 0.1%
168668       1      < 0.1%
168669       1      < 0.1%
168670       1      < 0.1%
168671       1      < 0.1%

Value        Count  Frequency (%)
259684       1      < 0.1%
259683       1      < 0.1%
259681       1      < 0.1%
259680       1      < 0.1%
259678       1      < 0.1%

Make
Categorical

CONSTANT
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 56.8 KiB

Value  Count  Frequency (%)
BMW    7271   100.0%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Model
Categorical

Distinct: 18
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 56.8 KiB

Value               Count
320                 930
335                 893
120                 857
118                 850
325                 846
Other values (13)   2895

Value               Count  Frequency (%)
320                 930    12.8%
335                 893    12.3%
120                 857    11.8%
118                 850    11.7%
325                 846    11.6%
114                 675    9.3%
135                 457    6.3%
125                 416    5.7%
140                 287    3.9%
328                 262    3.6%
Other values (8)    798    11.0%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

First Registration
Real number (ℝ≥0)

Distinct: 158
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2011.866983
Minimum: 2005.083333
Maximum: 2020.916667
Zeros: 0
Zeros (%): 0.0%
Memory size: 56.8 KiB

Quantile statistics

Minimum: 2005.083333
5-th percentile: 2006.083333
Q1: 2008.083333
median: 2012.083333
Q3: 2015.083333
95-th percentile: 2019.083333
Maximum: 2020.916667
Range: 15.83333333
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 4.135433195
Coefficient of variation (CV): 0.002055520186
Kurtosis: -1.032717917
Mean: 2011.866983
Median Absolute Deviation (MAD): 4
Skewness: 0.1351936226
Sum: 14628284.83
Variance: 17.10180771
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
2013.083333           622    8.6%
2012.083333           618    8.5%
2011.083333           603    8.3%
2007.083333           601    8.3%
2008.083333           572    7.9%
2006.083333           519    7.1%
2014.083333           465    6.4%
2010.083333           386    5.3%
2017.083333           381    5.2%
2009.083333           340    4.7%
Other values (148)    2164   29.8%

Value        Count  Frequency (%)
2005.083333  318    4.4%
2005.166667  1      < 0.1%
2005.333333  1      < 0.1%
2005.416667  3      < 0.1%
2005.5       1      < 0.1%

Value        Count  Frequency (%)
2020.916667  1      < 0.1%
2020.25      1      < 0.1%
2020.083333  86     1.2%
2020         2      < 0.1%
2019.916667  1      < 0.1%

Fuel
Categorical

Distinct: 19
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 56.8 KiB

Value                          Count
Diesel (Particulate Filter)    2521
Gasoline                       1951
Diesel                         1304
"Super 95"                     1095
"Super 95 "                    150
Other values (14)              250

Value                          Count  Frequency (%)
Diesel (Particulate Filter)    2521   34.7%
Gasoline                       1951   26.8%
Diesel                         1304   17.9%
"Super 95"                     1095   15.1%
"Super 95 "                    150    2.1%
Gasoline (Particulate Filter)  58     0.8%
Super 95 (Particulate Filter)  55     0.8%
Super Plus 98                  40     0.6%
Regular                        32     0.4%
Super E10 95                   24     0.3%
Other values (9)               41     0.6%
Frequencies of value counts

Unique

Unique: 3
Unique (%): < 0.1%
Histogram of lengths of the category

Length

Max length: 33
Median length: 8
Mean length: 14.65424288
Min length: 3

Mileage
Real number (ℝ≥0)

Distinct: 3608
Distinct (%): 49.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 136168.7662
Minimum: 0
Maximum: 2309864
Zeros: 1
Zeros (%): < 0.1%
Memory size: 56.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 17414
Q1: 76200
median: 128329
Q3: 190000
95-th percentile: 271335
Maximum: 2309864
Range: 2309864
Interquartile range (IQR): 113800

Descriptive statistics

Standard deviation: 88815.86372
Coefficient of variation (CV): 0.6522484282
Kurtosis: 107.9315689
Mean: 136168.7662
Median Absolute Deviation (MAD): 56329
Skewness: 5.212965307
Sum: 990083099
Variance: 7888257649
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
150000                58     0.8%
200000                55     0.8%
170000                45     0.6%
180000                43     0.6%
130000                40     0.6%
190000                39     0.5%
100000                36     0.5%
220000                36     0.5%
230000                35     0.5%
140000                35     0.5%
Other values (3598)   6849   94.2%

Value        Count  Frequency (%)
0            1      < 0.1%
1            5      0.1%
5            3      < 0.1%
10           14     0.2%
11           2      < 0.1%

Value        Count  Frequency (%)
2309864      1      < 0.1%
2050000      1      < 0.1%
2017000      1      < 0.1%
1145850      1      < 0.1%
555000       1      < 0.1%

Power(hp)
Real number (ℝ≥0)

Distinct: 111
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 208.4381791
Minimum: 30
Maximum: 587
Zeros: 0
Zeros (%): 0.0%
Memory size: 56.8 KiB

Quantile statistics

Minimum: 30
5-th percentile: 102
Q1: 150
median: 190
Q3: 286
95-th percentile: 340
Maximum: 587
Range: 557
Interquartile range (IQR): 136

Descriptive statistics

Standard deviation: 75.61821862
Coefficient of variation (CV): 0.3627848745
Kurtosis: -0.7555755424
Mean: 208.4381791
Median Absolute Deviation (MAD): 47
Skewness: 0.4313988225
Sum: 1515554
Variance: 5718.114988
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
218                   778    10.7%
306                   677    9.3%
184                   497    6.8%
143                   408    5.6%
163                   399    5.5%
177                   383    5.3%
102                   340    4.7%
204                   309    4.2%
340                   285    3.9%
122                   254    3.5%
Other values (101)    2941   40.4%

Value        Count  Frequency (%)
30           1      < 0.1%
95           220    3.0%
97           109    1.5%
102          340    4.7%
103          1      < 0.1%

Value        Count  Frequency (%)
587          1      < 0.1%
570          1      < 0.1%
453          1      < 0.1%
450          1      < 0.1%
435          1      < 0.1%

Price
Real number (ℝ≥0)

Distinct: 1761
Distinct (%): 24.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15300.70375
Minimum: 199
Maximum: 104611
Zeros: 0
Zeros (%): 0.0%
Memory size: 56.8 KiB

Quantile statistics

Minimum: 199
5-th percentile: 3250
Q1: 7100
median: 12880
Q3: 19900
95-th percentile: 35950
Maximum: 104611
Range: 104412
Interquartile range (IQR): 12800

Descriptive statistics

Standard deviation: 11803.78077
Coefficient of variation (CV): 0.7714534548
Kurtosis: 8.96096765
Mean: 15300.70375
Median Absolute Deviation (MAD): 6130
Skewness: 2.272503496
Sum: 111251417
Variance: 139329240.5
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
3500                  92     1.3%
4500                  81     1.1%
9500                  68     0.9%
12500                 52     0.7%
4000                  49     0.7%
8990                  48     0.7%
14990                 44     0.6%
4900                  44     0.6%
17990                 43     0.6%
11500                 43     0.6%
Other values (1751)   6707   92.2%

Value        Count  Frequency (%)
199          1      < 0.1%
219          1      < 0.1%
229          2      < 0.1%
235          1      < 0.1%
239          2      < 0.1%

Value        Count  Frequency (%)
104611       1      < 0.1%
104318       1      < 0.1%
103313       1      < 0.1%
103240       1      < 0.1%
102621       1      < 0.1%

Body
Categorical

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 56.8 KiB

Value              Count
Sedans             4123
Station wagon      1464
Coupe              704
Convertible        530
Compact            424
Other values (3)   26

Value              Count  Frequency (%)
Sedans             4123   56.7%
Station wagon      1464   20.1%
Coupe              704    9.7%
Convertible        530    7.3%
Compact            424    5.8%
Other              24     0.3%
Van                1      < 0.1%
Off-Road/Pick-up   1      < 0.1%
Frequencies of value counts

Unique

Unique: 2
Unique (%): < 0.1%
Histogram of lengths of the category

Length

Max length: 16
Median length: 6
Mean length: 7.733049099
Min length: 3

Gearing Type
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 56.8 KiB

Value           Count  Frequency (%)
Manual          3891   53.5%
Automatic       3299   45.4%
Semi-automatic  81     1.1%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 14
Median length: 6
Mean length: 7.450281942
Min length: 6

Displacement
Real number (ℝ≥0)

Distinct: 35
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2280.394856
Minimum: 0
Maximum: 3999
Zeros: 6
Zeros (%): 0.1%
Memory size: 28.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1598
Q1: 1995
median: 1995
Q3: 2979
95-th percentile: 2998
Maximum: 3999
Range: 3999
Interquartile range (IQR): 984

Descriptive statistics

Standard deviation: 528.0380503
Coefficient of variation (CV): 0.2315555347
Kurtosis: -1.138090311
Mean: 2280.394856
Median Absolute Deviation (MAD): 3
Skewness: 0.3370945922
Sum: 16580751
Variance: 278824.1826
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=35)
Value                 Count  Frequency (%)
1995                  3210   44.1%
2979                  843    11.6%
2993                  671    9.2%
1598                  577    7.9%
2998                  478    6.6%
2996                  377    5.2%
1997                  312    4.3%
1998                  208    2.9%
1496                  184    2.5%
2497                  181    2.5%
Other values (25)     230    3.2%

Value        Count  Frequency (%)
0            6      0.1%
1195         1      < 0.1%
1496         184    2.5%
1499         129    1.8%
1568         1      < 0.1%

Value        Count  Frequency (%)
3999         1      < 0.1%
3001         2      < 0.1%
3000         14     0.2%
2999         2      < 0.1%
2998         478    6.6%

Interactions

[Figures: 36 pairwise interaction plots, one per ordered pair of the 6 numeric variables]

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that, for a linear relationship, the steepness of the line (its angle to the x-axis) does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
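
As a minimal sketch of that calculation, the snippet below computes r in plain Python as the sample covariance divided by the product of the sample standard deviations. The function name `pearson_r` and the mileage and price values are illustrative assumptions, not data from this report, and the report itself may compute the coefficient differently.

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sample covariance and sample standard deviations (both divide by n - 1).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (sd_x * sd_y)

# Made-up values for illustration only.
mileage = [20000, 60000, 90000, 150000, 210000]
price = [31000, 24500, 18000, 12500, 6000]

r = pearson_r(mileage, price)
# Shifting and rescaling one variable (with a positive factor) leaves r unchanged.
r_rescaled = pearson_r([m / 1000 + 5 for m in mileage], price)
print(round(r, 4), round(r_rescaled, 4))
```

The second call illustrates the invariance under changes in location and scale: converting mileage to thousands and adding an offset gives the same coefficient.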

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
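
A minimal sketch of this, assuming ties receive the average of the ranks they span: rank both variables, then apply the Pearson formula to the ranks. The helper names `average_ranks`, `pearson_r` and `spearman_rho` and the sample data are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Made-up monotonic but nonlinear relationship (y = x ** 3).
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(round(rho, 4), round(r, 4))
```

On this cubic relationship the rank-based coefficient is 1 (up to floating-point rounding), while Pearson's r on the raw values is below 1.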

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
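
The pair counting can be sketched directly. The snippet below implements the simple form described here (often called tau-a), which does not adjust for ties; statistical libraries commonly use tie-adjusted variants, so their results can differ when ties are present. The function name and data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # The sign of the product tells whether both variables move the same way.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

# Made-up data: only the pair (2, 3) vs (3, 2) is out of order.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(round(tau, 4))
```

With four observations there are six pairs; five are concordant and one is discordant, giving (5 - 1) / 6.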

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proven to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013, which can be found here.
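
As a rough sketch of a bias-corrected Cramér's V, the snippet below follows the commonly cited form of Bergsma's correction: the chi-squared statistic is divided by n, a bias term is subtracted, and the table dimensions are shrunk before normalising. This is an assumption about the formula, not a copy of the report's implementation; the function name and category values are illustrative.

```python
import math
from collections import Counter

def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of categories."""
    n = len(x)
    row_counts, col_counts = Counter(x), Counter(y)
    joint_counts = Counter(zip(x, y))

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_total in row_counts.items():
        for b, col_total in col_counts.items():
            expected = row_total * col_total / n
            observed = joint_counts.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_counts), len(col_counts)
    phi2 = chi2 / n
    # Bias correction: shrink phi squared and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

# Made-up categories: gearing fully determines body here, so V is 1.
gearing = ["Manual"] * 5 + ["Automatic"] * 5
body = ["Sedans"] * 5 + ["Coupe"] * 5
v_perfect = cramers_v_corrected(gearing, body)

# Made-up independent categories: V is 0.
v_independent = cramers_v_corrected(["A", "A", "B", "B"], ["X", "Y", "X", "Y"])
print(round(v_perfect, 4), v_independent)
```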

Missing values

[Figures: 2 missing-values plots]

Sample

First rows

      df_index  Make  Model  First Registration  Fuel                         Mileage   Power(hp)  Price    Body           Gearing Type  Displacement
0     168665    BMW   114    2012.083333         Gasoline                     136000.0  102.0      6900.0   Sedans         Manual        1598
1     168668    BMW   114    2012.083333         Diesel                       197354.0  97.0       6999.0   Compact        Manual        1598
2     168669    BMW   114    2013.083333         Gasoline                     90659.0   102.0      10750.0  Compact        Manual        1598
3     168670    BMW   114    2013.083333         Diesel                       109203.0  97.0       12990.0  Sedans         Manual        1598
4     168671    BMW   114    2013.083333         Diesel                       186000.0  95.0       6999.0   Sedans         Manual        1598
5     168672    BMW   114    2013.083333         Diesel (Particulate Filter)  212000.0  95.0       7500.0   Sedans         Manual        1598
6     168673    BMW   114    2013.083333         Diesel (Particulate Filter)  180000.0  95.0       7850.0   Sedans         Manual        1598
7     168674    BMW   114    2013.083333         Super 95                     139106.0  102.0      7980.0   Sedans         Manual        1598
8     168675    BMW   114    2013.083333         Diesel (Particulate Filter)  113580.0  95.0       8495.0   Sedans         Manual        1598
9     168676    BMW   114    2012.083333         Super 95                     70000.0   102.0      8790.0   Sedans         Manual        1598

Last rows

      df_index  Make  Model  First Registration  Fuel                         Mileage   Power(hp)  Price    Body           Gearing Type  Displacement
7261  259670    BMW   318    2017.500000         Super 95                     44630.0   136.0      17220.0  Station wagon  Automatic     1499
7262  259671    BMW   328    2013.500000         Super 95                     86000.0   245.0      17490.0  Station wagon  Automatic     1997
7263  259673    BMW   318    2018.333333         Diesel (Particulate Filter)  35497.0   150.0      17950.0  Station wagon  Manual        1995
7264  259675    BMW   320    2016.583333         Diesel (Particulate Filter)  68739.0   190.0      17990.0  Station wagon  Automatic     1995
7265  259677    BMW   318    2017.250000         Super 95                     27076.0   136.0      18920.0  Sedans         Automatic     1499
7266  259678    BMW   320    2017.750000         Diesel (Particulate Filter)  58691.0   190.0      19780.0  Sedans         Manual        1995
7267  259680    BMW   320    2017.750000         Diesel (Particulate Filter)  149800.0  190.0      19850.0  Station wagon  Automatic     1995
7268  259681    BMW   330    2018.000000         Super 95                     83638.0   252.0      19990.0  Sedans         Automatic     1998
7269  259683    BMW   330    2016.000000         Diesel                       89800.0   258.0      21490.0  Station wagon  Automatic     1998
7270  259684    BMW   330    2016.333333         Diesel (Particulate Filter)  103488.0  258.0      21900.0  Sedans         Automatic     2993