**Project: Hazardous Waste**

Name _______________________ Name____________________________

RCRA data: 1991=L1, 1993=L2, 1995=L3, 1997=L4

Population data = L5

Compute "mean RCRA waste" for each county, and store the results in list L6. Transfer the mean values to the data sheet. What are the units of measure for the mean?
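
For anyone who wants to check the calculator work on a computer, the per-county mean can be sketched in Python. The county names and waste values below are made-up placeholders, not values from the data sheet, and the variable names simply mirror the calculator lists.

```python
# Hypothetical RCRA waste values for three made-up counties; NOT real data.
counties = ["County A", "County B", "County C"]
L1 = [120.0, 15.0, 0.5]   # 1991
L2 = [100.0, 25.0, 1.5]   # 1993
L3 = [140.0, 20.0, 1.0]   # 1995
L4 = [160.0, 20.0, 1.0]   # 1997

# Mean of the four biennial reports, county by county.
# This mirrors entering (L1+L2+L3+L4)/4 and storing the result in L6.
L6 = [(a + b + c + d) / 4 for a, b, c, d in zip(L1, L2, L3, L4)]

for name, value in zip(counties, L6):
    print(f"{name}: {value:.2f}")
```

The mean is taken across the four reports for one county at a time, so L6 has one entry per county.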

**2. Analyze the data**

Find the county with the biggest change in RCRA waste generation from one
biennium to the next. Which county is it? How much waste did it report in the
first biennium, and how much in the next? What is the percent change from one
biennium to the next?
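
As a reminder of the arithmetic, percent change is measured relative to the earlier value. The short Python sketch below uses a hypothetical helper name, `percent_change`, and invented numbers purely for illustration.

```python
def percent_change(old, new):
    """Percent change from one report to the next, relative to the earlier value."""
    if old == 0:
        raise ValueError("percent change is undefined when the earlier value is zero")
    return (new - old) / old * 100.0

# Invented example: 50 units reported in one biennium, 400 in the next.
print(percent_change(50.0, 400.0))
```

A county that reported zero waste in the earlier biennium has no defined percent change, which is why the sketch refuses that case instead of guessing.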

Such extreme changes in hazardous waste production do not seem reasonable, so
maybe the numbers are in error. But maybe not! Give one **reasonable explanation**
why RCRA waste generation might change so much in one biennium.

Use your TI-83+ to make a frequency histogram of the mean RCRA waste values. (For review information, consult Chapter 3 in your text.) Sketch the histogram on graph paper. Label axes appropriately.

Are the mean RCRA waste values normally distributed? How can you easily tell without doing any computations?

Compute the mean and standard deviation of the mean RCRA waste values. Use the TI-83+ for assistance.

Is the standard deviation less than, equal to, or greater than the mean?

In your opinion, is the standard deviation "small", "medium" or "large"? Explain briefly.

Compute the following 7 numbers: xbar - 3s, xbar - 2s, xbar - s, xbar, xbar + s, xbar + 2s, xbar + 3s.

Do any of the 7 numbers come out negative? ___________ If so, do these numbers have any physical meaning, can you have negative mean RCRA waste in reality? What do the negative numbers tell you?

Sometimes there are data that seem to be "way out of bounds." These
numbers can be accurate or they can be caused by error. In either case they
tend to dominate the calculations. Statisticians call these numbers **outliers**;
outliers are numbers that lie more than 3 standard deviations away from the
mean. Are there any outliers in your mean RCRA waste values? If so, what are
the names of the counties?
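
The 3-standard-deviation rule described above can be sketched in Python as follows. The county names and waste values are invented, and the use of the sample standard deviation (Sx on the TI-83+) rather than the population value is an assumption.

```python
from statistics import mean, stdev

# Hypothetical mean RCRA waste values for 15 made-up counties; NOT real data.
names = [f"County {i}" for i in range(1, 16)]
waste = [1, 2, 3, 2, 1, 2, 3, 2, 1, 2, 3, 2, 1, 2, 100]

xbar = mean(waste)
s = stdev(waste)  # sample standard deviation; assumed to match Sx on the TI-83+

# The worksheet's rule: an outlier lies more than 3 standard deviations from the mean.
outliers = [name for name, x in zip(names, waste) if abs(x - xbar) > 3 * s]
print(outliers)
```

Note that the extreme value itself inflates both the mean and the standard deviation, so a list needs a fair number of entries before any single value can sit more than 3 standard deviations out.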

**3. Per Capita Waste**

The EPA hires you as a consultant to impose fines on counties that are
"environmentally bad." Your supervisor suggests that counties that
generate the most RCRA waste should be fined the most. Discuss why this system
might not be fair.

Another method of fines is to punish the people, not the counties. In other
words, fine the counties that have the highest mean RCRA waste **per capita**
(per person). Compute the mean RCRA waste per capita for each county. Convert
the result so that the units are in **pounds per person**. (Note: 1 ton
= 2000 pounds) Store the final result in L7 and record on the data sheet.
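
The per capita conversion can be sketched in Python as below. The waste and population figures are invented, and the sketch assumes the mean waste in L6 is recorded in tons, as the conversion note suggests.

```python
POUNDS_PER_TON = 2000  # conversion given in the worksheet

# Hypothetical values for three made-up counties; NOT real data.
L6 = [130.0, 20.0, 1.0]       # mean RCRA waste, assumed to be in tons
L5 = [520000, 10000, 4000]    # population

# Convert tons to pounds, then divide by the number of people.
L7 = [tons * POUNDS_PER_TON / people for tons, people in zip(L6, L5)]
print(L7)
```

In this invented example the county with the most total waste does not have the highest waste per person, which is the kind of reordering the per capita measure is meant to reveal.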

Use your TI-83+ to make a frequency histogram of the per capita mean RCRA waste values. Sketch the histogram on separate graph paper. Label axes appropriately.

What is the mean of the mean per capita RCRA waste? What is the standard deviation? (Use correct symbols when writing values.)

Is the standard deviation large, medium or small compared to the mean?

Measuring spread in skewed data using standard deviation is problematic because
the standard deviation can be many times bigger than the mean. Has normalization
by population "improved" the standard deviation of the data? In other
words, is the per capita waste data less skewed than the unnormalized waste
values?

**4. Transform the data**

When data are skewed to the right, we can often make the distribution more
symmetrical by taking the logarithm of each value. Do this now: take the log of
the mean per capita RCRA value for each county, and store the results in list L8.
Record the logged values on your data sheet. Then sketch a frequency histogram
of the logged values. Include units and labels.
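
A minimal Python sketch of the log transform follows. It assumes base-10 logarithms, which is what the LOG key on the TI-83+ computes; the worksheet does not name a base, and the per capita values shown are invented.

```python
import math

# Hypothetical per capita values (pounds per person); NOT real data.
L7 = [0.01, 1.0, 100.0, 10.0]

# Base-10 log of each value; every entry must be positive for the log to exist.
L8 = [math.log10(x) for x in L7]
print(L8)
```

Each factor of 10 in the original data becomes a step of 1 in the logged data, which is how the transform pulls in a long right tail.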

How does the histogram of the transformed data (log of the per capita mean
RCRA values) compare to the two histograms that you sketched previously?

Compute the mean and standard deviation for the transformed data. Include units of measure.

Is the standard deviation less than, equal to, or greater than the mean?

Is the standard deviation "small", "medium" or "large", as compared to the mean? Explain briefly.

For the transformed data, calculate the 7 numbers: xbar - 3s, xbar - 2s, xbar - s, xbar, xbar + s, xbar + 2s, xbar + 3s.

Use these 7 numbers to determine if the transformed data are normally distributed. Show work.
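
One way to use the 7 numbers is to count what share of the data falls within 1, 2, and 3 standard deviations of the mean and compare those shares with the roughly 68%, 95%, and 99.7% expected for a normal distribution. The Python sketch below assumes that this is the intended check; the function names and the logged values are hypothetical.

```python
from statistics import mean, stdev

def seven_numbers(data):
    """Return xbar - 3s, xbar - 2s, ..., xbar + 3s for the data."""
    xbar, s = mean(data), stdev(data)
    return [xbar + k * s for k in range(-3, 4)]

def share_within(data, k):
    """Fraction of values lying within k standard deviations of the mean."""
    xbar, s = mean(data), stdev(data)
    return sum(abs(x - xbar) <= k * s for x in data) / len(data)

# Hypothetical logged per capita values; NOT real data.
logged = [-1.0, -0.5, 0.0, 0.0, 0.5, 1.0]

print(seven_numbers(logged))
for k in (1, 2, 3):
    print(k, share_within(logged, k))
```

With only a handful of values the shares are coarse, so the comparison is more convincing on the full county list than on a toy example like this one.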

**5. Carrots and Sticks**

You have transformed the county data into a distribution that is closer to normal.
Now you come up with the following idea to impose **waste fines**. Based
on the transformed data, impose the highest fines on counties that lie more
than 3 standard deviations above the mean, impose moderate fines on counties
that lie between 2 and 3 standard deviations above the mean, impose small fines
on counties that lie between 1 and 2 standard deviations above the mean, and
very small fines for those counties between the mean and 1 standard deviation
above the mean. On your data sheet, under the column "st. dev. category",
indicate which counties are in the categories: ">3", "2 to 3", "1 to 2", or "0 to 1".

To reward counties that produce the least amount of RCRA waste per person,
you will give **waste credits** that can be sold in the market. On your
data sheet, for those counties whose RCRA wastes are below the mean, mark categories
"<-3", "-3 to -2", "-2 to -1", and "-1 to 0".

Now you get good results with this penalty and reward system. Overall, polluters are given monetary incentives to improve their standard deviation score. In fact, you suggest that all states take up your system. Your boss likes the idea, but she has some questions:

Is it possible that in some state most of the counties would be in the "above
3" or "below -3" categories? This could be seen as politically
"heavy handed", with lots of money flowing back and forth in fines
and credits. What is your answer?

How would this system work with a state like South Dakota, whose mean per
capita RCRA waste is very low? Won't most of the counties in South Dakota be
getting pollution credits?

You've convinced your boss that this system will work, but now she has a third
question. When two counties lie in the same standard deviation category they
are penalized or rewarded the same, even if their mean RCRA waste per capita
numbers are different. Is there some way to refine the rewards and incentives
so that there is a **continuous** scale?

A continuous scale can be based on "z-scores" for each county. A
**z-score** is a number that indicates how many standard deviations each
county lies above or below the mean. Z-scores are computed with the simple
formula:

z = (x - xbar) / s

Here x is each county's logged per capita mean RCRA waste, xbar is the mean
of logged per capita wastes, and s is the standard deviation. A z-score
is positive if the county lies above the mean and negative if it lies below.
Fill out the last column on the data sheet with the z-score for each county;
round to 2 decimal places.
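
The z-score calculation can be sketched in Python as below. The logged values are invented, and using the sample standard deviation (Sx on the TI-83+) for s is an assumption.

```python
from statistics import mean, stdev

# Hypothetical logged per capita values for three made-up counties; NOT real data.
logged = [-0.5, 0.5, 1.5]

xbar = mean(logged)
s = stdev(logged)  # sample standard deviation; assumed to be the intended s

# z = (x - xbar) / s, rounded to 2 decimal places as the worksheet asks.
z_scores = [round((x - xbar) / s, 2) for x in logged]
print(z_scores)
```

A county sitting exactly at the mean gets a z-score of 0, and the sign of every other score shows which side of the mean that county is on.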

Your boss thinks your z-score idea is great. She now gives you enough money to impose fines and give credits. She suggests a $100,000 fine or credit per z-score (fines for positive z-scores, credits for negative z-scores). Will your agency lose money, earn money, or break even? Explain in detail.