Overview

Dataset statistics

Number of variables: 9
Number of observations: 4653
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 712
Duplicate rows (%): 15.3%
Total size in memory: 327.3 KiB
Average record size in memory: 72.0 B
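
The percentage and per-record figures follow directly from the raw counts. The sketch below recomputes them from the reported values; it assumes that the 327.3 KiB total uses 1 KiB = 1024 bytes, which the report does not state explicitly.

```python
# Values copied from the dataset statistics.
n_rows = 4653
n_duplicate_rows = 712
total_kib = 327.3

# Share of duplicate rows, as a percentage of all observations.
duplicate_pct = 100 * n_duplicate_rows / n_rows

# Average record size in bytes (assumption: 1 KiB = 1024 bytes).
avg_record_bytes = total_kib * 1024 / n_rows

print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both results agree with the reported 15.3% and 72.0 B after rounding to one decimal place.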

Variable types

Categorical: 5
Numeric: 3
Boolean: 1

Alerts

Duplicates: Dataset has 712 (15.3%) duplicate rows
Imbalance: EverBenched is highly imbalanced (52.2%)
Zeros: ExperienceInCurrentDomain has 355 (7.6%) zeros

Reproduction

Analysis started: 2023-10-04 17:49:58.598698
Analysis finished: 2023-10-04 17:50:00.684643
Duration: 2.09 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

Education
Categorical

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 36.5 KiB
Bachelors: 3601
Masters: 873
PHD: 179

Length

Max length: 9
Median length: 9
Mean length: 8.3939394
Min length: 3

Characters and Unicode

Total characters: 39057
Distinct characters: 14
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
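
As an illustration of how such character properties can be tallied, the sketch below recomputes the character totals for Education from the value counts in this section, using Python's standard `unicodedata` module. It is a minimal reconstruction, not necessarily the procedure the profiling tool follows internally.

```python
import unicodedata
from collections import Counter

# Value counts copied from the Education summary.
value_counts = {"Bachelors": 3601, "Masters": 873, "PHD": 179}

category_totals = Counter()
for value, count in value_counts.items():
    for ch in value:
        # unicodedata.category returns a two-letter general category code,
        # e.g. "Ll" (lowercase letter) or "Lu" (uppercase letter).
        category_totals[unicodedata.category(ch)] += count

total_chars = sum(category_totals.values())
distinct_chars = len({ch for value in value_counts for ch in value})

print(total_chars, distinct_chars, dict(category_totals))
```

The totals agree with the figures reported for this variable: 39057 characters, 14 distinct characters, 34046 lowercase letters and 5011 uppercase letters.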

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Bachelors
2nd row: Bachelors
3rd row: Bachelors
4th row: Masters
5th row: Masters

Common Values

Value      Count  Frequency (%)
Bachelors  3601   77.4%
Masters    873    18.8%
PHD        179    3.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value      Count  Frequency (%)
bachelors  3601   77.4%
masters    873    18.8%
phd        179    3.8%

Most occurring characters

Value             Count  Frequency (%)
s                 5347   13.7%
a                 4474   11.5%
e                 4474   11.5%
r                 4474   11.5%
B                 3601   9.2%
c                 3601   9.2%
h                 3601   9.2%
l                 3601   9.2%
o                 3601   9.2%
M                 873    2.2%
Other values (4)  1410   3.6%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  34046  87.2%
Uppercase Letter  5011   12.8%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
s      5347   15.7%
a      4474   13.1%
e      4474   13.1%
r      4474   13.1%
c      3601   10.6%
h      3601   10.6%
l      3601   10.6%
o      3601   10.6%
t      873    2.6%
Uppercase Letter
Value  Count  Frequency (%)
B      3601   71.9%
M      873    17.4%
P      179    3.6%
H      179    3.6%
D      179    3.6%

Most occurring scripts

Value  Count  Frequency (%)
Latin  39057  100.0%

Most frequent character per script

Latin
Value             Count  Frequency (%)
s                 5347   13.7%
a                 4474   11.5%
e                 4474   11.5%
r                 4474   11.5%
B                 3601   9.2%
c                 3601   9.2%
h                 3601   9.2%
l                 3601   9.2%
o                 3601   9.2%
M                 873    2.2%
Other values (4)  1410   3.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  39057  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
s                 5347   13.7%
a                 4474   11.5%
e                 4474   11.5%
r                 4474   11.5%
B                 3601   9.2%
c                 3601   9.2%
h                 3601   9.2%
l                 3601   9.2%
o                 3601   9.2%
M                 873    2.2%
Other values (4)  1410   3.6%

JoiningYear
Real number (ℝ)

Distinct: 7
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015.063
Minimum: 2012
Maximum: 2018
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 36.5 KiB

Quantile statistics

Minimum: 2012
5th percentile: 2012
Q1: 2013
Median: 2015
Q3: 2017
95th percentile: 2018
Maximum: 2018
Range: 6
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 1.8633768
Coefficient of variation (CV): 0.00092472387
Kurtosis: -1.2044253
Mean: 2015.063
Median Absolute Deviation (MAD): 2
Skewness: -0.11346207
Sum: 9376088
Variance: 3.4721732
Monotonicity: Not monotonic
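
Several of these statistics can be recomputed from the JoiningYear frequency counts alone. The sketch below does so in plain Python. The sample-variance convention (dividing by n - 1) is an assumption, chosen because it reproduces the reported variance.

```python
import math

# Frequency counts copied from the JoiningYear value table.
counts = {2012: 504, 2013: 669, 2014: 699, 2015: 781,
          2016: 525, 2017: 1108, 2018: 367}

n = sum(counts.values())                               # number of observations
total = sum(year * c for year, c in counts.items())    # Sum
mean = total / n                                       # Mean

# Sample variance with an (n - 1) denominator (assumed convention).
variance = sum(c * (year - mean) ** 2 for year, c in counts.items()) / (n - 1)
std = math.sqrt(variance)

# Coefficient of variation: standard deviation relative to the mean.
cv = std / mean

print(n, total, round(mean, 3), round(variance, 7), round(std, 7), cv)
```

The coefficient of variation is tiny here only because calendar years sit far from zero. It says little about how spread out the joining years are relative to their six-year range.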
Histogram with fixed size bins (bins=7)

Common values
Value  Count  Frequency (%)
2017   1108   23.8%
2015   781    16.8%
2014   699    15.0%
2013   669    14.4%
2016   525    11.3%
2012   504    10.8%
2018   367    7.9%

Lowest values
Value  Count  Frequency (%)
2012   504    10.8%
2013   669    14.4%
2014   699    15.0%
2015   781    16.8%
2016   525    11.3%
2017   1108   23.8%
2018   367    7.9%

Highest values
Value  Count  Frequency (%)
2018   367    7.9%
2017   1108   23.8%
2016   525    11.3%
2015   781    16.8%
2014   699    15.0%
2013   669    14.4%
2012   504    10.8%

City
Categorical

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 36.5 KiB
Bangalore: 2228
Pune: 1268
New Delhi: 1157

Length

Max length: 9
Median length: 9
Mean length: 7.6374382
Min length: 4

Characters and Unicode

Total characters: 35537
Distinct characters: 16
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Bangalore
2nd row: Pune
3rd row: New Delhi
4th row: Bangalore
5th row: Pune

Common Values

Value      Count  Frequency (%)
Bangalore  2228   47.9%
Pune       1268   27.3%
New Delhi  1157   24.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value      Count  Frequency (%)
bangalore  2228   38.3%
pune       1268   21.8%
new        1157   19.9%
delhi      1157   19.9%

Most occurring characters

Value             Count  Frequency (%)
e                 5810   16.3%
a                 4456   12.5%
n                 3496   9.8%
l                 3385   9.5%
B                 2228   6.3%
g                 2228   6.3%
o                 2228   6.3%
r                 2228   6.3%
P                 1268   3.6%
u                 1268   3.6%
Other values (6)  6942   19.5%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  28570  80.4%
Uppercase Letter  5810   16.3%
Space Separator   1157   3.3%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
e      5810   20.3%
a      4456   15.6%
n      3496   12.2%
l      3385   11.8%
g      2228   7.8%
o      2228   7.8%
r      2228   7.8%
u      1268   4.4%
w      1157   4.0%
h      1157   4.0%
Uppercase Letter
Value  Count  Frequency (%)
B      2228   38.3%
P      1268   21.8%
N      1157   19.9%
D      1157   19.9%
Space Separator
Value    Count  Frequency (%)
(space)  1157   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   34380  96.7%
Common  1157   3.3%

Most frequent character per script

Latin
Value             Count  Frequency (%)
e                 5810   16.9%
a                 4456   13.0%
n                 3496   10.2%
l                 3385   9.8%
B                 2228   6.5%
g                 2228   6.5%
o                 2228   6.5%
r                 2228   6.5%
P                 1268   3.7%
u                 1268   3.7%
Other values (5)  5785   16.8%
Common
Value    Count  Frequency (%)
(space)  1157   100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  35537  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
e                 5810   16.3%
a                 4456   12.5%
n                 3496   9.8%
l                 3385   9.5%
B                 2228   6.3%
g                 2228   6.3%
o                 2228   6.3%
r                 2228   6.3%
P                 1268   3.6%
u                 1268   3.6%
Other values (6)  6942   19.5%

PaymentTier
Categorical

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 36.5 KiB
3: 3492
2: 918
1: 243

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4653
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 1
3rd row: 3
4th row: 3
5th row: 3

Common Values

Value  Count  Frequency (%)
3      3492   75.0%
2      918    19.7%
1      243    5.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
3      3492   75.0%
2      918    19.7%
1      243    5.2%

Most occurring characters

Value  Count  Frequency (%)
3      3492   75.0%
2      918    19.7%
1      243    5.2%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  4653   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
3      3492   75.0%
2      918    19.7%
1      243    5.2%

Most occurring scripts

Value   Count  Frequency (%)
Common  4653   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
3      3492   75.0%
2      918    19.7%
1      243    5.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4653   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
3      3492   75.0%
2      918    19.7%
1      243    5.2%

Age
Real number (ℝ)

Distinct: 20
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.393295
Minimum: 22
Maximum: 41
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 36.5 KiB

Quantile statistics

Minimum: 22
5th percentile: 24
Q1: 26
Median: 28
Q3: 32
95th percentile: 39
Maximum: 41
Range: 19
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 4.826087
Coefficient of variation (CV): 0.16419007
Kurtosis: -0.29982315
Mean: 29.393295
Median Absolute Deviation (MAD): 2
Skewness: 0.90519516
Sum: 136767
Variance: 23.291116
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=20)

Common values
Value              Count  Frequency (%)
26                 645    13.9%
28                 630    13.5%
27                 625    13.4%
25                 418    9.0%
24                 385    8.3%
29                 230    4.9%
30                 220    4.7%
37                 141    3.0%
36                 139    3.0%
34                 136    2.9%
Other values (10)  1084   23.3%

Lowest values
Value  Count  Frequency (%)
22     49     1.1%
23     48     1.0%
24     385    8.3%
25     418    9.0%
26     645    13.9%
27     625    13.4%
28     630    13.5%
29     230    4.9%
30     220    4.7%
31     125    2.7%

Highest values
Value  Count  Frequency (%)
41     82     1.8%
40     134    2.9%
39     131    2.8%
38     136    2.9%
37     141    3.0%
36     139    3.0%
35     123    2.6%
34     136    2.9%
33     124    2.7%
32     132    2.8%

Gender
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 36.5 KiB
Male: 2778
Female: 1875

Length

Max length: 6
Median length: 4
Mean length: 4.8059317
Min length: 4

Characters and Unicode

Total characters: 22362
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male
2nd row: Female
3rd row: Female
4th row: Male
5th row: Male

Common Values

Value   Count  Frequency (%)
Male    2778   59.7%
Female  1875   40.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value   Count  Frequency (%)
male    2778   59.7%
female  1875   40.3%

Most occurring characters

Value  Count  Frequency (%)
e      6528   29.2%
a      4653   20.8%
l      4653   20.8%
M      2778   12.4%
F      1875   8.4%
m      1875   8.4%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  17709  79.2%
Uppercase Letter  4653   20.8%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
e      6528   36.9%
a      4653   26.3%
l      4653   26.3%
m      1875   10.6%
Uppercase Letter
Value  Count  Frequency (%)
M      2778   59.7%
F      1875   40.3%

Most occurring scripts

Value  Count  Frequency (%)
Latin  22362  100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
e      6528   29.2%
a      4653   20.8%
l      4653   20.8%
M      2778   12.4%
F      1875   8.4%
m      1875   8.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  22362  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
e      6528   29.2%
a      4653   20.8%
l      4653   20.8%
M      2778   12.4%
F      1875   8.4%
m      1875   8.4%

EverBenched
Boolean

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.7 KiB
False: 4175
True: 478

Value  Count  Frequency (%)
False  4175   89.7%
True   478    10.3%
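
The 52.2% quoted in the imbalance alert is not the minority-class share (10.3%). It is consistent with an entropy-based score: one minus the Shannon entropy of the class distribution, normalized by the log of the number of classes. The sketch below applies that formula to the counts above. Treating it as the definition behind the alert is an assumption, made because it reproduces the reported figure.

```python
import math

# Class counts copied from the EverBenched value table.
counts = {"False": 4175, "True": 478}

n = sum(counts.values())
probs = [c / n for c in counts.values()]

# Shannon entropy in bits; a perfectly balanced binary variable has entropy 1.
entropy = -sum(p * math.log2(p) for p in probs if p > 0)

# Assumed imbalance score: 0 means perfectly balanced, 1 means a single class.
imbalance = 1 - entropy / math.log2(len(counts))

print(f"Imbalance score: {imbalance:.1%}")
```

Under this reading, a score of 0% would correspond to a 50/50 split and 100% to a column containing only one class.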

ExperienceInCurrentDomain
Real number (ℝ)

ZEROS 

Distinct: 8
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.9056523
Minimum: 0
Maximum: 7
Zeros: 355
Zeros (%): 7.6%
Negative: 0
Negative (%): 0.0%
Memory size: 36.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 2
Median: 3
Q3: 4
95th percentile: 5
Maximum: 7
Range: 7
Interquartile range (IQR): 2
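
These quantiles can be reproduced from the frequency counts for ExperienceInCurrentDomain. The sketch below expands the counts into a sorted list and uses linear interpolation between neighbouring order statistics. That interpolation convention and the helper `quantile` are assumptions for illustration. With integer data like this, most common conventions give the same answers.

```python
# Frequency counts copied from the ExperienceInCurrentDomain value table.
counts = {0: 355, 1: 558, 2: 1087, 3: 786, 4: 931, 5: 919, 6: 8, 7: 9}

# Expand the counts into a sorted list of all observations.
values = sorted(v for v, c in counts.items() for _ in range(c))


def quantile(sorted_values, q):
    """Quantile with linear interpolation between order statistics (assumed convention)."""
    pos = (len(sorted_values) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


q05, q1, median, q3, q95 = (quantile(values, q) for q in (0.05, 0.25, 0.5, 0.75, 0.95))
iqr = q3 - q1
value_range = values[-1] - values[0]
zeros_pct = 100 * counts[0] / len(values)

print(q05, q1, median, q3, q95, iqr, value_range, round(zeros_pct, 1))
```

The same counts also give the share of zeros flagged in the alert: 355 of 4653 observations, or 7.6%.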

Descriptive statistics

Standard deviation: 1.5582403
Coefficient of variation (CV): 0.53627901
Kurtosis: -0.96941346
Mean: 2.9056523
Median Absolute Deviation (MAD): 1
Skewness: -0.16255594
Sum: 13520
Variance: 2.4281129
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)

Common values
Value  Count  Frequency (%)
2      1087   23.4%
4      931    20.0%
5      919    19.8%
3      786    16.9%
1      558    12.0%
0      355    7.6%
7      9      0.2%
6      8      0.2%

Lowest values
Value  Count  Frequency (%)
0      355    7.6%
1      558    12.0%
2      1087   23.4%
3      786    16.9%
4      931    20.0%
5      919    19.8%
6      8      0.2%
7      9      0.2%

Highest values
Value  Count  Frequency (%)
7      9      0.2%
6      8      0.2%
5      919    19.8%
4      931    20.0%
3      786    16.9%
2      1087   23.4%
1      558    12.0%
0      355    7.6%

LeaveOrNot
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 36.5 KiB
0: 3053
1: 1600

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4653
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 0
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
0      3053   65.6%
1      1600   34.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
0      3053   65.6%
1      1600   34.4%

Most occurring characters

Value  Count  Frequency (%)
0      3053   65.6%
1      1600   34.4%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  4653   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      3053   65.6%
1      1600   34.4%

Most occurring scripts

Value   Count  Frequency (%)
Common  4653   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      3053   65.6%
1      1600   34.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4653   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      3053   65.6%
1      1600   34.4%

Interactions


Correlations

                           JoiningYear  Age     ExperienceInCurrentDomain  Education  City    PaymentTier  Gender  EverBenched  LeaveOrNot
JoiningYear                1.000        0.008   -0.038                     0.214      0.201   0.267        0.150   0.131        0.417
Age                        0.008        1.000   -0.142                     0.018      0.027   0.000        0.000   0.023        0.066
ExperienceInCurrentDomain  -0.038       -0.142  1.000                      0.118      0.052   0.026        0.000   0.000        0.039
Education                  0.214        0.018   0.118                      1.000      0.316   0.183        0.008   0.056        0.146
City                       0.201        0.027   0.052                      0.316      1.000   0.295        0.214   0.021        0.209
PaymentTier                0.267        0.000   0.026                      0.183      0.295   1.000        0.275   0.009        0.269
Gender                     0.150        0.000   0.000                      0.008      0.214   0.275        1.000   0.012        0.220
EverBenched                0.131        0.023   0.000                      0.056      0.021   0.009        0.012   1.000        0.076
LeaveOrNot                 0.417        0.066   0.039                      0.146      0.209   0.269        0.220   0.076        1.000
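
One practical way to read the matrix is to rank the other variables by the strength of their association with LeaveOrNot. The sketch below does this from the LeaveOrNot row. The coefficients are association measures only, and the report text here does not say which measure was used for each pair of variable types.

```python
# Column order and LeaveOrNot row copied from the correlation matrix.
names = ["JoiningYear", "Age", "ExperienceInCurrentDomain", "Education",
         "City", "PaymentTier", "Gender", "EverBenched", "LeaveOrNot"]
leave_row = [0.417, 0.066, 0.039, 0.146, 0.209, 0.269, 0.220, 0.076, 1.000]

# Drop the self-correlation and sort by absolute strength, strongest first.
ranked = sorted(
    ((name, value) for name, value in zip(names, leave_row) if name != "LeaveOrNot"),
    key=lambda pair: abs(pair[1]),
    reverse=True,
)

for name, value in ranked:
    print(f"{name:<27}{value:.3f}")
```

JoiningYear (0.417) stands out, followed by PaymentTier (0.269), Gender (0.220) and City (0.209). The remaining variables are all below 0.15.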

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows
      Education  JoiningYear  City       PaymentTier  Age  Gender  EverBenched  ExperienceInCurrentDomain  LeaveOrNot
0     Bachelors  2017         Bangalore  3            34   Male    No           0                          0
1     Bachelors  2013         Pune       1            28   Female  No           3                          1
2     Bachelors  2014         New Delhi  3            38   Female  No           2                          0
3     Masters    2016         Bangalore  3            27   Male    No           5                          1
4     Masters    2017         Pune       3            24   Male    Yes          2                          1
5     Bachelors  2016         Bangalore  3            22   Male    No           0                          0
6     Bachelors  2015         New Delhi  3            38   Male    No           0                          0
7     Bachelors  2016         Bangalore  3            34   Female  No           2                          1
8     Bachelors  2016         Pune       3            23   Male    No           1                          0
9     Masters    2017         New Delhi  2            37   Male    No           2                          0

Last rows
      Education  JoiningYear  City       PaymentTier  Age  Gender  EverBenched  ExperienceInCurrentDomain  LeaveOrNot
4643  Bachelors  2013         Bangalore  3            31   Female  No           5                          0
4644  Bachelors  2015         Pune       3            32   Female  Yes          1                          1
4645  Masters    2017         Pune       2            31   Female  No           2                          0
4646  Bachelors  2013         Bangalore  3            25   Female  No           3                          0
4647  Bachelors  2016         Pune       3            30   Male    No           2                          0
4648  Bachelors  2013         Bangalore  3            26   Female  No           4                          0
4649  Masters    2013         Pune       2            37   Male    No           2                          1
4650  Masters    2018         New Delhi  3            27   Male    No           5                          1
4651  Bachelors  2012         Bangalore  3            30   Male    Yes          2                          0
4652  Bachelors  2015         Bangalore  3            33   Male    Yes          4                          0

Duplicate rows

Most frequently occurring

     Education  JoiningYear  City       PaymentTier  Age  Gender  EverBenched  ExperienceInCurrentDomain  LeaveOrNot  # duplicates
78   Bachelors  2013         Bangalore  3            26   Male    No           4                          0           32
367  Bachelors  2016         Bangalore  3            26   Male    No           4                          0           28
11   Bachelors  2012         Bangalore  3            26   Male    No           4                          0           26
153  Bachelors  2014         Bangalore  3            25   Male    No           3                          0           24
157  Bachelors  2014         Bangalore  3            26   Male    No           4                          0           24
161  Bachelors  2014         Bangalore  3            27   Male    No           5                          0           24
440  Bachelors  2017         Bangalore  3            27   Male    No           5                          0           24
81   Bachelors  2013         Bangalore  3            27   Male    No           5                          0           21
150  Bachelors  2014         Bangalore  3            24   Male    No           2                          0           20
261  Bachelors  2015         Bangalore  3            26   Male    No           4                          0           20
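
A table like this can be built by grouping identical rows and counting how often each combination occurs. The sketch below shows the idea on a few hypothetical toy rows (not taken from the dataset) using `collections.Counter`. It also separates two quantities that are easy to confuse: the number of distinct row patterns that repeat, and the number of surplus copies. Which of these a given "duplicate rows" figure refers to depends on the tool, and this report does not say.

```python
from collections import Counter

# Hypothetical toy rows for illustration only; each tuple stands for one record.
rows = [
    ("Bachelors", 2013, "Bangalore"),
    ("Bachelors", 2013, "Bangalore"),
    ("Bachelors", 2013, "Bangalore"),
    ("Masters", 2017, "Pune"),
    ("Masters", 2017, "Pune"),
    ("PHD", 2015, "New Delhi"),
]

# Count occurrences of each distinct row.
group_sizes = Counter(rows)

# Keep only the row patterns that occur more than once.
duplicated_groups = {row: n for row, n in group_sizes.items() if n > 1}

n_groups = len(duplicated_groups)                                # distinct repeated patterns
n_extra_copies = sum(n - 1 for n in duplicated_groups.values())  # copies beyond the first

# Most frequently occurring row and its count, as in the table above.
most_frequent = group_sizes.most_common(1)[0]

print(n_groups, n_extra_copies, most_frequent)
```

On the toy data this yields two repeated patterns and three surplus copies, with the Bachelors/2013/Bangalore row occurring three times.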