Data observation and preliminary considerations. The first step is to observe the type and number of available features, and to formulate hypotheses about their possible connections with the nature of the problem to be addressed.
The Boston data frame has 506 rows and 14 columns (13 predictors plus the response). The variables are described as follows:
crim: per capita crime rate by town.
zn: proportion of residential land zoned for lots over 25,000 sq.ft.
indus: proportion of non-retail business acres per town.
chas: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
nox: nitrogen oxides concentration (parts per 10 million).
rm: average number of rooms per dwelling.
age: proportion of owner-occupied units built prior to 1940.
dis: weighted mean of distances to five Boston employment centres.
rad: index of accessibility to radial highways.
tax: full-value property-tax rate per $ 10,000.
ptratio: pupil-teacher ratio by town.
black: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town.
lstat: lower status of the population (percent).
medv: median value of owner-occupied homes in $1000s.
#install.packages("MASS") # only if you don't have the MASS lib installed
library(MASS)
attach(Boston) # after this command all the variables related to Boston dataset are set
# Thanks to the 'attach()' function, I have set one variable for each column of the dataset
length(age)
class(crim)
# Without 'attach()', I should create each single variable as follows...
crim = Boston[,1]
zn = Boston[,2]
# ecc...
To find out more about the dataset we can type '?Boston'.
?Boston
head(Boston)
First, we can summarise the main statistical indices of each variable.
summary(Boston)
A more intuitive visualization of the feature distributions is given by the histograms. We are going to visualize both the histogram of each feature and an estimation of its distribution.
To do so, we will combine two commands:
hist(): this function computes and visualizes the histogram of the data. In particular, we will plot densities rather than raw counts, so that the total area of the histogram equals 1 (note that this does not force the individual bar heights to lie between 0 and 1). This behaviour is obtained by setting the parameter 'freq' to FALSE (see the documentation for more details), and it puts the histogram on the same scale as a density estimate.
density(): this function performs a kernel density estimation of the data. The output is a smooth model of the data distribution, built by summing a kernel function centered at each observation. This is useful for continuous variables.
hist(nox, freq=FALSE)
lines(density(nox), col="red")
To manage multiple plots we can use the par() function.
par(mfrow=c(2,2)) # set the plotting area into a 2*2 array
# First plot
hist(chas, freq=FALSE)
lines(density(chas), col="red") # note: 'chas' is a 0/1 dummy variable, so a kernel density estimate is not very meaningful here
# Second plot
hist(crim, freq=FALSE)
lines(density(crim), col="red")
# Third plot
hist(zn, freq=FALSE)
lines(density(zn), col="red")
# Fourth plot
hist(indus, freq=FALSE)
lines(density(indus), col="red")
par(mfrow=c(4,2)) # set the plotting area into a 4*2 array
# First plot
hist(rm, freq=FALSE)
lines(density(rm), col="red")
# Second plot
hist(age, freq=FALSE)
lines(density(age), col="red")
# Third plot
hist(dis, freq=FALSE)
lines(density(dis), col="red")
# Fourth plot
hist(rad, freq=FALSE)
lines(density(rad), col="red")
hist(tax, freq=FALSE)
lines(density(tax), col="red")
hist(ptratio, freq=FALSE)
lines(density(ptratio), col="red")
hist(black, freq=FALSE)
lines(density(black), col="red")
hist(medv, freq=FALSE)
lines(density(medv), col="red")
If we are interested in knowing more details about a specific distribution, it is possible to print the summary of the density estimate.
?density
density(nox)
It is possible to perform several statistical inferences over such data. In particular, we may be interested in predicting the median house value (medv) using all, or a subset of, the 13 available features (i.e., predictors).
It is therefore worth taking a closer look at the medv variable.
density(medv)
With this specific dataset, we may be interested in modelling a predictor for house prices (medv). Therefore, we start by focusing on the distribution of the medv values.
par(mfrow=c(1,2))
hist(medv, freq=FALSE)
lines(density(medv), col="red")
boxplot(medv)
quantile(medv, 0.25)
quantile(medv, 0.75)
We see that the values of medv are approximately normally distributed, with a few high-value outliers mostly above roughly 37, up to the maximum of 50.
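The "few outliers" claim can be checked directly. As a small addition (not part of the original analysis), boxplot.stats() applies the same 1.5*IQR rule that boxplot() uses for its whiskers:

```r
library(MASS)  # for the Boston data
# flag observations beyond 1.5*IQR from the quartiles
out = boxplot.stats(Boston$medv)$out
length(out)  # number of observations flagged as outliers
range(out)   # most sit in the upper tail, up to the maximum of 50
```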
In order to select proper predictors, we can first compute the pair-wise correlations.
#install.packages('corrplot')
library(corrplot)
cov_matrix <- cov(Boston)
cov_matrix
cor_matrix <- cor(Boston)
corrplot(cor_matrix, type = "upper", order = "hclust",
tl.col = "black", tl.srt = 45)
The variable 'medv' has a high positive correlation with the variable 'rm' (average number of rooms per dwelling) and a strong negative correlation with 'lstat' (percentage of lower-status population). Since we are interested in predicting the variable 'medv', consider only the correlations in the 'medv' column of the correlation matrix.
cor_matrix[,'medv']
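To rank candidate predictors at a glance (an extra step, not in the original), we can sort those correlations by absolute value:

```r
library(MASS)
cor_matrix <- cor(Boston)
# rank variables by the absolute value of their correlation with medv
corr_rank = sort(abs(cor_matrix[, 'medv']), decreasing = TRUE)
corr_rank  # medv itself comes first (correlation 1), then lstat and rm
```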
The correlation values confirm a strong negative relationship between 'lstat' and 'medv' (about -0.74). Therefore, we will fit a linear regressor using 'lstat' as the predictor.
# Note: without the above 'attach()', this call would raise an error (alternatively, pass 'data = Boston' to lm()).
medv_lstat = lm(medv~lstat)
summary(medv_lstat)
# Extract the coefficient of the model
coef(medv_lstat)
# Extract the confidence interval for the coefficient estimates
# (lower and upper confidence limits for each parameter)
confint(medv_lstat)
Note that the 'lstat' confidence interval is narrow and does not include zero, so the estimated coefficient is significantly different from zero.
We can use 'names' to find out the information that can be extracted from the fitted model, and use the '$' operator to extract them.
names(medv_lstat)
medv_lstat$coefficients
The predict() function can be used to produce confidence intervals and prediction intervals for the prediction of the 'medv' value for a given 'lstat'.
In the following example, we input three values of lstat (5, 10 and 15) and compute the model confidence intervals and prediction intervals.
new_input_data = data.frame(
lstat = (c(5,10,15)) )
# confidence intervals
predict(medv_lstat, new_input_data, interval="confidence")
# prediction intervals
predict(medv_lstat, new_input_data, interval="prediction")
For instance, the 95% confidence interval associated with an lstat value of 10 is (24.47, 25.63), and the 95% prediction interval is (12.83, 37.28). As expected, the confidence and prediction intervals are centered around the same point (a predicted value of 25.05 for medv when lstat equals 10), but the prediction interval is substantially wider, since it also accounts for the variability of individual observations around the regression line.
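The gap between the two intervals can be inspected directly. As a small illustration (not in the original code), predict() with se.fit = TRUE returns the standard error of the estimated mean, which drives the confidence interval; the prediction interval additionally incorporates the residual standard deviation, which is much larger here:

```r
library(MASS)
medv_lstat = lm(medv ~ lstat, data = Boston)
pred = predict(medv_lstat, data.frame(lstat = 10), se.fit = TRUE)
pred$se.fit               # standard error of the fitted mean (small)
summary(medv_lstat)$sigma # residual standard deviation (much larger)
```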
We will now plot medv and lstat along with the least squares regression line using the plot() and abline() functions.
plot(lstat,medv)
abline(medv_lstat, col="red", lwd=3)
The Boston data set contains thirteen predictors; to perform a regression using all of them, we can use the following shorthand.
medv_all = lm(medv~., data = Boston)
summary(medv_all)
# It is possible to extract each single output from the summary
names(summary(medv_all))
summary(medv_all)$r.sq
summary(medv_all)$sigma
summary(medv_all)$fstatistic
# It is easier if we first assign it to a variable
summ_out= summary(medv_all)
summ_out$r.sq
summ_out$sigma
summ_out$fstatistic
What if we would like to perform a regression using all of the variables but one or two? For example, in the above regression output, the predictors 'age' and 'indus' have high p-values (i.e., we cannot reject the hypothesis that their true coefficients are zero).
So we may want to run a regression excluding these predictors. The following syntax fits a regression using all predictors except 'age' and 'indus'.
medv_exAgeIndus=lm(medv~.-age-indus, data= Boston)
summary(medv_exAgeIndus)
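An equivalent, often convenient alternative is update(), which refits an existing model with a modified formula (a sketch; the result is identical to the explicit call above):

```r
library(MASS)
medv_all = lm(medv ~ ., data = Boston)
# refit the full model, dropping age and indus from the formula
medv_exAgeIndus2 = update(medv_all, ~ . - age - indus)
length(coef(medv_exAgeIndus2))  # 12: intercept + 11 remaining predictors
```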
# Scale the data for PCA (standardize each column to zero mean and unit variance)
boston_scaled = scale(Boston)
# Apply PCA (equivalently: prcomp(Boston, scale. = TRUE))
pca_boston = prcomp(boston_scaled)
names(pca_boston)
# Components (all columns)
head(pca_boston$x)
# Components vs. variables (i.e., variables' coefficients)
pca_boston$rotation
?prcomp
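Before keeping only the first two components, it is worth checking how much variance they capture (an extra sanity check, not part of the original walkthrough). summary() reports the proportion of variance explained by each component:

```r
library(MASS)
pca_boston = prcomp(scale(Boston))
# importance matrix: std dev, proportion of variance, cumulative proportion
imp = summary(pca_boston)$importance
imp["Proportion of Variance", ]
imp["Cumulative Proportion", 2]  # variance retained by the first two components
```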
# Plot the first two components
pca_scores = data.frame(pca_boston$x[,1:2])
plot(pca_scores)
data = pca_boston$x[,1:2]
# K-Means Clustering with k clusters
k = 3
fit <- kmeans(data, k)
pairs(data, col=c(1:k)[fit$cluster])
k = 4
fit <- kmeans(data, k)
pairs(data, col=c(1:k)[fit$cluster])
k = 5
fit <- kmeans(data, k)
pairs(data, col=c(1:k)[fit$cluster])
The elbow method consists of plotting the "sparsity" of a clustering (usually measured as the within-cluster sum of squares, WCSS) against the number of clusters, and picking the "elbow" of the curve as the best number of clusters to use.
Try to implement the elbow method to determine the best choice for k, the number of clusters to be used by the kmeans model. Then, perform the clustering analysis using that k value.
Hint: if the analysis is correct, the elbow plot will be a non-increasing function.
names(fit)
max_k = 50
dist_sequence = c()
for(k in seq(max_k)){
  fit <- kmeans(data, k, nstart = 10) # multiple random starts give a more stable fit
  labels = fit$cluster
  centroids = fit$centers
  tot_distance = c()
  for(i in seq(k)){
    cent_x = centroids[i,1]
    cent_y = centroids[i,2]
    # within-cluster squared distances from each point to its centroid
    cluster_data = data[which(labels==i), , drop=FALSE] # drop=FALSE keeps the matrix shape even for singleton clusters
    for (j in seq_len(nrow(cluster_data))){
      sq_dist = (cluster_data[j,1] - cent_x)^2 + (cluster_data[j,2] - cent_y)^2
      tot_distance = c(tot_distance, sq_dist)
    }
  }
  dist_sequence = c(dist_sequence, sum(tot_distance)) # WCSS: total within-cluster sum of squares (equals fit$tot.withinss)
}
plot(seq(max_k),dist_sequence,
xlab = '# of clusters',
ylab = 'WCSS',
type = "l", lty = 1)
# TODO: add also the BCSS to this plot
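As a sketch for the TODO above, kmeans' built-in fields can replace the manual loop: fit$tot.withinss is the WCSS and fit$betweenss the BCSS. The two always sum to the constant total sum of squares, so the BCSS curve is non-decreasing wherever the WCSS curve is non-increasing.

```r
library(MASS)
set.seed(1)                       # kmeans uses random initial centroids
data = prcomp(scale(Boston))$x[, 1:2]
max_k = 50
wcss = numeric(max_k)
bcss = numeric(max_k)
for (k in seq(max_k)) {
  fit = kmeans(data, k, nstart = 10)
  wcss[k] = fit$tot.withinss      # within-cluster sum of squares
  bcss[k] = fit$betweenss         # between-cluster sum of squares
}
plot(seq(max_k), wcss, type = "l", xlab = "# of clusters", ylab = "Sum of squares")
lines(seq(max_k), bcss, col = "blue", lty = 2)
legend("right", legend = c("WCSS", "BCSS"), col = c("black", "blue"), lty = c(1, 2))
```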