# Economics 420 (sections 2 and 3) Fall Semester 2015 Problem Set #2

Problem Set #2 (Due: Tuesday, October 6)

Directions: Start each question on a new sheet of paper and staple all pages together. Use a
word processor for Parts I and II. For Part III, either use a word processor or write your answer
clearly and legibly. Assignments turned in unstapled will be returned with a grade of zero. (Only
stapling is acceptable — paper clips and other methods of binding are not acceptable.) Also, if
we cannot discern the meaning of your work, your response will be assumed wrong.
Part I: Hypothesis testing in Stata (7 points total)
This first part of the problem set introduces you to Stata for simple data analysis. For each
question, first copy and paste the Stata output into a word-processing document (use the “copy
This problem set again uses the Stata dataset WAGE2.dta, which you can download from D2L.
Recall that this dataset contains the following information on monthly earnings, employment
history, education, demographic characteristics, and two test scores for 935 men in year 1980:
wage      monthly earnings (in 1976 USD)
hours     average weekly hours of work
IQ        IQ (intelligence quotient) score
educ      years of education
age       age in years
married   =1 if the person is married; 0 otherwise
black     =1 if the person is black; 0 otherwise
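Before starting Part I, it can help to confirm that the file loaded correctly. The following is a minimal sketch, assuming WAGE2.dta has been saved in Stata's current working directory (the file location is an assumption; adjust it to wherever you saved the download):

```stata
* Assumes WAGE2.dta is in the current working directory
use "WAGE2.dta", clear

* Show the storage type and label of each variable listed above
describe wage hours IQ educ age married black

* Means, standard deviations, minimums, and maximums
summarize wage hours IQ educ age married black

* Number of observations in memory (the handout states 935)
count
```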
1. (1 point) Generate a new variable equal to the natural logarithm of variable wage, and call this
new variable lwage. What is the sample mean log-wage (lwage) for blacks? for non-blacks? Hint:
to generate variable lwage type the following command in Stata: gen lwage = log(wage)
2. (2 points) Test the hypothesis that the population mean log wage is equal for black and non-black
men. Use a significance level of 5 percent (0.05). What do you conclude and why? Hint:
Use the Stata command ttest var1, by(var2). In this case, var1 is lwage and var2 is black, so what you
need to type in Stata’s command window is: ttest lwage, by(black)
3. (2 points) Test the hypothesis that the population mean years of education (variable educ) is
equal for black and non-black men. Use a significance level of 5 percent (0.05). What do you
conclude and why?
4. (2 points) Test the hypothesis that the population mean years of education (variable educ) is
equal for married and non-married men (variable married). Again, use a significance level of 5
percent. What do you conclude and why?
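The Part I commands can be collected in a short do-file. This is only a sketch under stated assumptions: WAGE2.dta is assumed to be in the current working directory, `tabstat` is one of several ways to obtain group means (it is not named in the hints above), and `capture drop` is a precaution in case your copy of the file already contains a variable called lwage.

```stata
* Assumes WAGE2.dta is in the current working directory
use "WAGE2.dta", clear

* Precaution: remove any existing lwage so that generate does not fail
capture drop lwage

* log() is the natural logarithm in Stata
generate lwage = log(wage)

* Sample mean, standard deviation, and group size of lwage by race
tabstat lwage, by(black) statistics(mean sd n)

* Two-sample t test of equal population means
ttest lwage, by(black)

* ttest stores the t statistic and two-sided p-value in r()
display "t = " r(t) "   two-sided p-value = " r(p)

* The same pattern applies to the tests on education
ttest educ, by(black)
ttest educ, by(married)
```

By default, `ttest` with `by()` assumes equal variances in the two groups and reports the difference as the mean of group 0 minus the mean of group 1. At the 5 percent level, the null hypothesis of equal means is rejected when the two-sided p-value is below 0.05.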
Part II: Simple linear regression in Stata (12 points total)
This part of the problem set introduces you to running regressions in Stata.
1. (1 point) Consider a simple linear regression model relating the log of earnings (lwage) to
years of education (educ):
lwage = β0 + β1 educ + u
What is the meaning of the error term u?
2. (1 point) Give an example of a variable (or factor) that might be contained in u.
3. (2 points) What is the key condition for β1 to be interpreted as the causal effect of an
additional year of education on lwage? Does this condition hold?
4. (2 points) Use the data in WAGE2.dta to estimate this simple regression model and paste
your results below. To estimate the model in Stata, type: reg lwage educ
5. (2 points) What is the intercept? What is its interpretation?
6. (2 points) What is the slope coefficient? What is its interpretation?
7. (2 points) By how much is the log of earnings expected to change if education increases by 3
years?
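The estimation steps in Part II can be sketched as follows. As before, the file location is an assumption, and `lincom` is shown only as one convenient way to scale the slope; multiplying the coefficient by hand gives the same point estimate.

```stata
* Assumes WAGE2.dta is in the current working directory
use "WAGE2.dta", clear
capture drop lwage
generate lwage = log(wage)

* Simple regression of log earnings on years of education
regress lwage educ

* Estimated intercept and slope, taken from the stored coefficients
display "intercept = " _b[_cons]
display "slope     = " _b[educ]

* Predicted change in lwage for a 3-year increase in education
display "3-year change = " 3*_b[educ]
lincom 3*educ
```

Unlike the `display` line, `lincom` also reports a standard error and confidence interval for the three-year change.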
Part III: Regression line, fitted values and residuals (5 points total)
You are given the following five data points (observations) with (X; Y) values given in
parentheses: (2; 6), (7; 11), (9; 10), (6; 9) and (1; 4). You run a regression of Y on X and
estimate that the intercept b0 = 4 and the slope b1 = 0.8.
Note: For the questions that ask you to draw something on the graph on the next page, you can
draw by hand; there is no need to use drawing software.
1. (1 point) Graph the data points on the graph on the next page. Note: For all these
questions, you can draw your answer (neatly!) by hand; there is no need to use drawing
software.
2. (1 point) Draw the regression line y-hat = b0 + b1X on the graph on the next page.
3. (2 points) What are the fitted value and residual for the second observation (7; 11)?
4. (1 point) Indicate (label appropriately) on the graph the fitted value and residual for the
second observation (7; 11).
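As a reminder of the definitions used in Part III, the worked example below applies them to the first observation (2; 6) rather than the one asked about. The subscript notation and the symbol used for the residual are assumptions made for illustration; use whatever notation was adopted in lecture.

```latex
\begin{align*}
\hat{y}_i &= b_0 + b_1 x_i, & \hat{u}_i &= y_i - \hat{y}_i \\
\hat{y}_1 &= 4 + 0.8 \times 2 = 5.6, & \hat{u}_1 &= 6 - 5.6 = 0.4
\end{align*}
```

On the graph, the fitted value is the height of the regression line at the observation's X, and the residual is the vertical distance from the line to the data point (positive when the point lies above the line).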
[Figure: Graph to accompany Part III. Blank grid with axes labeled X and Y and tick marks from 0 to 10.]