Introductory Econometrics. Chapter 1

Chapter 1: The Nature of Econometrics and Economic Data

Econometrics is the use of statistical methods to analyze economic data. It is concerned with the systemic study of economic phenomena, estimating economic relationships, confronting economic theory by testing hypotheses, and forecasting the behavior of economic variables using observed data.

Empirical economics provides a numerous examples of attempts to estimate economic relationships. For example, an analyst from both the government and the private sector are interested in estimating the demand/supply of various products; a private firms are interested in estimating the effects on advertising on sales and profits and so on. Many macro economists try to measure the effectiveness of various government policies. Once variables have been identified and effects measures, we may use the estimated relationships to predict future values. For example, firms try to forecast sales, profits, cost of production, and inventory requirements. Utility companies need to predict the energy demand. Many financial economists try to forecast stock market indices and prices of specific stocks. By the end of the semester, you will know what data you need to answer questions like these and what econometric methods you should use to evaluate these questions.

While most biology, physics and other natural sciences use experimental data, econometricians typically do not have experimental data but must rely on observational (also called non-experimental, retrospective) data. In controlled experimental environment, all other variables are held constant and some one experiment can be run many times. The issue with observational data is that while we observe a lot of changes in the variable of interest, we have all other circumstances changing as well (as we do not have controlled environment). We also are not able to observe many key variables.

Common application of econometric techniques are forecasting of macroeconomic variables such as the interest rates, inflation rates, GDP, trade balance, debt and so on. Many economists and econometricians work on other less publicly visible issues such as, for example, effect of schooling on student performance.

Empirical analysis uses data to test a theory or estimate a relationship. Researcher should take the following steps:

A careful formulation of the question of interest.
Formulation of the economic model which describes some relationship.
Formulation of econometric model.
Collection of data.
Estimation of the models, testing of the hypotheses of interest using data.
Using the models for prediction and policy purposes.

A model is a simplified representation of a real-world process. Many scientists argue in favor of simplicity because simple models are easier to understand, communicate ans evaluate empirically. However, others criticize the oversimplification of the models and use of unrealistic assumptions. Friedman argued that assumptions should be sufficiently good to study the question at hand (we do not want to assume so much that the result can only be such that we expect).

Econometric models directly specify the relationship between the independent variables (regressors, covariates, predicting variables) on the right hand side and the dependent variable (regressand) on the left hand side. We choose the econometric model taking into account economic theory and data. In practice, in our model we include all variables that we think are relevant for our purpose. Too often, very important variables are not precisely observed or observed at all. The rest of the variables that we do not see as relevant, appear in the disturbance term (we will discuss what this is more in the next class).

Econometric analysis requires data. This data comes in a variety of types. Each type of data requires a specific type of econometric methods to be used. Inappropriate econometric methods for the data may lead to misleading results. There are several kinds of economic data sets:

Cross section data
Time series data
Pooled cross section data
Panel data (also known as longitudinal data)

Cross sectional data set consists of sample individuals, households, firms, cities, states, countries or other units taken at a given point in time. An important feature of cross sectional data is that we assume it is obtained by random sampling from the underlying population. However, sometimes the random sample assumption is violated if individuals refuse to participate or inappropriate survey method captures only a specific group of the population.

Time series data set consists of observations on variables over time. For example, stock price movement, changes in money supply, consumer price index, GDP, crime rate, automobile sales figures. Time series observations are typically serially correlated, that is they are not independent across time. This requires use of slightly more advanced econometric techniques. Researcher must also pay attention to data frequency at which data is collected. Data may be collected hourly, daily, weekly, monthly, quarterly, yearly and so on. Time series data typically includes trends and seasonality. <> Time series data requires slightly more complex tools because of highly persistent nature of this kind of data.

Pooled cross section is a data set that combines two or more-cross section in one data set. Cross sections must be drawn independently of each other. Cross sections are often used to evaluate policy changes. For example, a cross sectional survey may be done on a random sample of households in one year, and then repeated in a few years on a new random sample. Analysis of a pooled cross section is similar to standard cross section but with some additional controls. A researcher must account for secular differences in variables across time. A key relationship may change over time.

Panel or longitudinal data set consists of a time series for each cross sectional member in the data set. Panel data has both cross sectional and a time series dimension. For example, we observe John Smith’s wage, employment history, education and other variables for every of the ten years. Similarly, we can observe a country’s immigration flows, tax rates, wage rates, government expenditures and so on. The key feature is that the same individual, firm or a country is observed for a few consecutive periods. This allows us to account for time-invariant unobservables. For example, we may not observe individual’s talent but if we follow the same person for a number of years as the individual acquires more education, we may be able to better estimate the effect of education on wages holding the unobservable talent constant. With panel data, we may also introduce lagged variables. For example, if tax rates that individual is facing get changed this year, the person may change their behavior next year. However, panel data is expensive because it requires following up on the same individuals or firms over time. There are a few excellent panel data sets that many economists use. For example, see PSID (Panel Study of Income Dynamics), HRS (Health and Retirement Study), SHARE (Survey of Health, Ageing and Retirement in Europe).

Before any research project, a researcher must define the causal relationship of interest and try to describe how it is possible to design the study to infer the causal relationship.

In econometrics, the goal is to establish causality or causal effect. This, for example, would allow us to say that attending classes increases student’s performance. Most researchers, however, find associations. This is equivalent of saying that class attendance and student performance is associated or correlated without being able to say that one causes the other. To be able to derive causal relationship, we need the notion of ceteris paribus to hold. Ceteris paribus means all other being equal. For example, a student that attends class, ceteris paribus, performs better in exams and assignments. Ceteris paribus in this example means a student with the same ability, prior skills and in every other way identical. Other example: how does the price change affect the quantity demanded. We need to make sure that when the price changes, nothing else (for example, income, prices of related goods etc) changes. Then we can look at only the change in price and the change in quantity demanded.

When experiments are feasible, it is easier to infer causal relationship. For example, (1) choose several one-acre plots of land; (2) randomly assign different amounts of fertilizer to the different plots; (3) compare yields. Experiment works because amount of fertilizer applied is unrelated to other factors influencing crop yields. That is all other factors that influence crop yield such as quality of land, rainfall, presence of parasites, and so on are the same on all one-acre plots.

In most cases, however, experiments are either unfeasible and/or unethical to do. For example, if you are interested in the returns to education, you may ask: “If a person is chosen from the population and given another year of education, by how much will his or her wage increase?” To do an experiment, we would need to randomly assign different amounts of education to groups of people and compare wage outcomes later on. The issue is that the amount of education is related to other factors that influence wages, for example, natural ability or intelligence and we cannot observe. Similarly, we cannot do experiments with policing as different parts of the city are inherently different, and police presence and crime rates are contemporaneously determined.

Another topic that receives a lot of attention is the effect of minimum wage on unemployment. A researcher would ask: “By how much (if at all) will unemployment increase if the minimum wage is increased by a certain amount (holding other things fixed)?” An experiment is not feasible again, because randomly assigning minimum wages would be unethical. Looking at current minimum wages would not lead to causality because level of the minimum wage depends on political and economic factors that also influence unemployment. So if one finds high minimum wage and low unemployment, it may be caused by some other factor.

With econometrics, we can also test various economic and finance hypotheses. For example, in financial economics, the pure expectations hypothesis states that the expected long-term interest rate equals compounded expected short-term interest rates. In other words, it states that if you chose to invest into 2 consecutive three-month T-bills or 1 six-month T-bill (so that both of these investments are of the same length), the interest rate should be the same for both of these investments, holding everything else (like the risk, liquidity, etc) constant.

Homework Problems

Problem 1.
Suppose that you are asked to conduct a study to determine whether smaller class sizes lead to improved student performance of fourth graders.
1. If you could conduct any experiment you want, what would you do? Be specific. 2. More realistically, suppose you can collect observational data on several thousand fourth graders in a given state. You can obtain the size of their fourth-grade class and a standardized test score taken at the end of fourth grade. Why might you expect a negative correlation between class size and test score?
3. Would a negative correlation necessarily show that smaller class sizes cause better performance? Explain.

Computer Exercise C1.
Use the data in wage1 for this exercise.
1. Find the average education level in the sample. What are the lowest and highest years of education?
2. Find the average hourly wage in the sample. Does it seem high or low?
3. The wage data are reported in 1976 dollars. Using the Economic Report of the President (2011 or later), obtain and report the Consumer Price Index (CPI) for the years 1976 and 2010.
4. Use the CPI values from part (iii) to find the average hourly wage in 2010 dollars. Now does the average hourly wage seem reasonable?
5. How many women are in the sample? How many men?

Computer Exercise C7.
The data set in alcohol contains information on a sample of men in the United States. Two key variables are self-reported employment status and alcohol abuse (along with many other variables). The variables employ and abuse are both binary, or indicator, variables: they take on only the values of zero and one.
1. What percentage of the men in the sample report abusing alcohol? What is the employment rate?
2. Consider the group of men who abuse alcohol. What is the employment rate among them?
3. What is the employment rate for the group of men who do not abuse alcohol?
4. Discuss the difference in your answers to parts 2 and 3. Does this allow you to conclude that alcohol abuse cause unemployment?

References

Wooldridge, J. (2019). Introductory econometrics: a modern approach. Boston, MA: Cengage.

Maddala, G. S., & Lahiri, K. (1992). Introduction to econometrics (Vol. 2). New York: Macmillan.

Ramanathan, R. (2002). Introductory econometrics with applications. Fort Worth: Harcourt College Publishers.