Homework -5 Multiple Regression

need problem 3Homework #5Multiple RegressionPlease submit on Canvas a separate Excel file for each problem. The hard copy of the homeworkthat you are turning in should include the answers to the questions asked. You can refer to charts and outputs in your electronic files to support your answers, howeverall conclusions should be clearly stated in the hard copy of the homework. Problem 1 (20 points)When potential workers apply for a job that requires extensive manual assembly of smallintricate parts, they are initially given three different tests to measure their manual dexterity. The ones who are hired are then periodically given a performance rating on a 0 to 100 scale thatcombines their speed and accuracy in performing the required assembly operations. The file P11_49. xlsx lists the test scores and performance ratings for a randomly selected groupof 80 employees. It also lists their seniority (months with the company) at the time of theperformance rating. Employee127980Seniority19141711Test165645747Test245586861Test338476144Performance67585654a. Use the Correlation and Covariance option in StatTools or the Correlation option in theData Analysis tool in Excel (or the CORREL function) to calculate a table of correlations: SenioritySeniorityTest1Test2Test3PerformanceTest1Test2Test3Performance1. 0001. 0001. 0001. 0001. 000b. Based only on these correlations, can you say with certainty that the value of the coefficientof determination r 2 for the regression of performance rating on seniority and test scoreswill be at least 35%? Why or why not?c. Is there any evidence in the correlation table that collinearity will be a problem?Why or why not?-1-d. Run the regression of Performance Rating versus all four explanatory variables. List the regression equation and the values of r 2 and s e . e. Do all of the coefficients have the signs (negative or positive) you would expect?Briefly explain. f. If a worker (outside of the 80 in the sample) has 15 months of seniority and test scores of57, 71, and 63, find a prediction for this worker’s Performance rating score. g. One of the t-values for the regression coefficients is less than 1. Explain brieflywhy this occurred. Does it mean that this variable is not related to Performance Rating?h. Find the variance inflation factor VIF corresponding to this variable. What does the value of this VIF tell you?i. Arguably, the three test measures provide overlapping (or redundant) information. For the sake of parsimony (explaining “the most with the least”), it might be sensible to regressPerformance rating versus only two explanatory variables, Seniority and Average Test,where Average Test is the average of the three test scores: Average Test = (Test1 + Test2 + Test3)/3Run this regression and report the equation itself, r 2 and s e . j. Can you argue that this equation is just as good as the equation in part (d)? Explain briefly. -2-Problem 2 (20 points)The following market research study was conducted by a national chain of consumer electronicsstores. To promote sales, the chain relies heavily on local newspaper advertising to support itsmodest exposure in nationwide television commercials. A sample of 20 cities with similarpopulations and monthly sales totals were assigned different newspaper advertising budgets forone month. The file Advertising. xlsx summarizes the sales (in $ millions) and the newspaperadvertising budgets (in $1,000) observed during the study. City121920Sales6. 146. 046. 846. 99Advertising552525a. Construct a scatter plot for advertising and sales. What kind of relationship does it suggest?b. Fit a quadratic regression model. List the regression equation and the values of r 2 and s e . c. Interpret the meaning of the coefficient of determination. d. Predict the monthly sales for a city with newspaper advertising of $20,000. e. Construct a normal probability plot of residuals and a plot of residuals versus fitted values. Are the four assumptions for regression analysis satisfied? Explain. f. At the 0. 05 level of significance, is there a significant overall quadratic relationship betweenmonthly sales and newspaper advertising? Formulate and perform the appropriate hypothesis test. g. At the 0. 05 level of significance, determine whether the quadratic model is a better fit than thelinear model. Formulate and perform the appropriate hypothesis test. h. Run a linear regression. Report the regression equation, r 2 and s e . i. Do your results in (h) seem to support or contradict the conclusion you reached in (g)? Explain. -3-Problem 3 (50 points)The price of advertising is different from one consumer magazine to another. What causes thesedifferences in price? Probably something related to the value of the ad to the advertiser. Magazines that reach more readers (all else equal) should be able to charge more for an ad. Also, magazines that reach a better-paid reading audience should probably be able to charge more. Although there may be other important factors, let’s look at these two together with one more,gender difference, to see if magazines charge more based on the percentage of men or womenamong the readers. Multiple regression will provide some answers and can help explain theimpact of audience size, income, and gender on advertising prices. The file Magazine_Ads. xlsx contains the multivariate data set to be analyzed. The variables are: y– page costs for a “four-color, one-page ad run once”x1 – audience (readers) in thousandsx 2 – percent male among the readershipx3 – median household incomeMagazine124445AAA WestwaysAARP The MagazineWiredWoman’s DayPage Costs($)5331053260099475259960Audience(thousands)874035721278920325Male(%)47. 039. 775. 50. 0Income($)92600589909105658053a. Formulate the multiple regression model. b. Run the multiple regression and report the estimated regression equation. c. Interpret the intercept. Does it make sense in this context?d. Interpret the slope coefficients. e. Below is the correlation matrix. AudienceMaleIncomePage CostsAudienceMaleIncomePage Costs1–0. 215–0. 3770. 85010. 540–0. 1261–0. 1481The correlation between page costs and median income is –0. 148 indicating that highermedian income is actually associated with lower page costs. How can this be?Doesn’t it contradict the interpretation of the regression coefficient for income in (d)?-4-f. The magazine Martha Stewart Living is in the original data set: Magazine25 Martha Stewart LivingPage Costs($)157,700Audience(thousands)11,200Male(%)11. 0Income($ thousands)74,436What is the residual or prediction error for this magazine?How can Martha Stewart Living charge so much less than we would expect?g. Report and interpret the standard error of estimate. h. Interpret the coefficient of determination. i. Is there some predictive relationship between the x variables and the y in the population ofsimilar conceivable magazines? Formulate and perform the appropriate test. j. Could the regression coefficient for percent male reasonably be just randomness?In other words, could it be that the effect of percent male readership on page costs is just arandom number, rather than indicating a systematic relationship?Formulate and perform the appropriate test. k. Use the correlation matrix and scatterplots (one for each pair of variables) to describe thestructure of the relationships among x1 , x2 , x3 , y. l. What does the plot of residuals versus predicted page costs show?Are the regression assumptions challenged?m. Create histograms of the three explanatory variables. Does any of them show skewness?n. Use Excel’s LN function to transform the variable with the highest skewness. What does the histogram of the transformed variable show?o. Run a regression with this transformed variable and the other variables unchanged. Report r 2 and s e . Do you think the log transformation of the variable with highest skewnessimproved the regression results?p. Construct and discuss a plot of residuals versus predicted page costs for the model with thetransformed variable. Can you suggest a remedy for the problems (if any) you see in this plot?q. Transform all of the original variables that measure amounts (that is, page costs, audience,and median income) using natural logarithms. What is the equation of this model?r. Run a regression for this model. Report r 2 and s e . Can we use r 2 and s e to comparedirectly the results from this model with the original model? Why or why not?s. Construct a residual plot and discuss if the log transformations have helped. t. What is the estimated regression equation?Interpret the slope coefficient for log audience. u. Does audience have a significant impact on page costs, holding percent male and medianincome fixed?v. What is the predicted value of page costs for Martha Stewart Living?Compare this value with the value predicted in part (f). Which of the two models provides a better prediction for this magazine?-5-