Practice: Transformations to Achieve Linearity
General Procedure for Transforming Data
You must always start by looking at a scatterplot of your original data, and examining the pattern. Are there outliers or influential points? What is the shape of the curve? The shape is a guide to choosing a likely transformation. If the points seem to lie in a straight line, you may not need to transform the data at all; you may have a linear relationship.
If the explanatory variable involves time, particularly in years, you may want to change the variable to a form that is easier to work with. For example, if you were studying the U.S. population during the 1800s, looking for the effect of the Civil War, you might pick 1800 to be "zero." Then express your explanatory variable as years elapsed since 1800: 1820 becomes 20.
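A minimal Python sketch of re-expressing years as elapsed time. The list of years is illustrative only and is not data from this worksheet; `BASE_YEAR` is simply the year chosen to be "zero."

```python
# Illustrative years only; these are not data from this worksheet.
years = [1800, 1810, 1820, 1830]
BASE_YEAR = 1800  # the year chosen to be "zero"

# Explanatory variable re-expressed as years elapsed since BASE_YEAR.
years_since = [y - BASE_YEAR for y in years]
print(years_since)  # [0, 10, 20, 30]
```

The regression is then run on `years_since` instead of the raw years, so the intercept describes the value at 1800 rather than at year 0.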
Remember to "Back-Transform" When You Predict y-Values
Remember that, whether you work with your calculator or a spreadsheet, your results will be expressed simply in terms of x and y, even though either variable may actually have been transformed (ln, exponential, square, square root, and so on). The correct variable for prediction is ŷ, so if y was transformed, undo that transformation to get a prediction in the original units.
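A small Python sketch of back-transforming a prediction. The dictionary of inverses and the fitted model ln(ŷ) = 0.5 + 0.2x are hypothetical, chosen only to show the idea: a prediction made on the transformed scale is passed through the inverse of whatever transformation was applied to y.

```python
import math

# Inverse of each transformation that might have been applied to the response y.
INVERSES = {
    "ln": math.exp,              # ln(y)   -> y = e^(ln y)
    "exp": math.log,             # e^y     -> y = ln(e^y)
    "sqrt": lambda v: v ** 2,    # sqrt(y) -> y = (sqrt y)^2
    "square": math.sqrt,         # y^2     -> y = sqrt(y^2), assuming y >= 0
}

def back_transform(value, transform):
    """Convert a prediction on the transformed scale back to the original units of y."""
    return INVERSES[transform](value)

# Hypothetical fitted model: ln(y-hat) = 0.5 + 0.2x, used at x = 10.
ln_y_hat = 0.5 + 0.2 * 10
y_hat = back_transform(ln_y_hat, "ln")
print(f"ln(y-hat) = {ln_y_hat:.2f}  ->  y-hat = {y_hat:.2f}")  # ln(y-hat) = 2.50  ->  y-hat = 12.18
```

Reporting 2.50 here instead of 12.18 would be the classic mistake of forgetting to back-transform.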
Things to Avoid
If your transformation involves taking a logarithm, remember that logarithms are undefined for zero and negative numbers.
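If you prepare data in software, it can help to check for zero or negative values before taking logs. This Python sketch uses a hypothetical helper, `safe_ln`; Python's own `math.log` already raises `ValueError` for such inputs, and the helper only makes the message name the offending values.

```python
import math

def safe_ln(values):
    """Return ln(v) for every value, stopping with a clear message on zero or negative inputs."""
    bad = [v for v in values if v <= 0]
    if bad:
        # math.log would also raise ValueError here; this message names the offending values.
        raise ValueError(f"ln is undefined for zero or negative values: {bad}")
    return [math.log(v) for v in values]

print(safe_ln([1.0, 10.0, 100.0]))
try:
    safe_ln([5.0, 0.0, -2.0])
except ValueError as err:
    print(err)  # ln is undefined for zero or negative values: [0.0, -2.0]
```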
Questions 1 through 6 work with the length of the sidereal year vs. distance from the Sun. The table of data is shown below.
Planet     Distance from Sun     Year (as a fraction   ln(Dist)   ln(Year)
           (millions of miles)   of Earth years)
Mercury          36.19               0.2410            3.5889    -1.4229
Venus            67.63               0.6156            4.2140    -0.4851
Earth            93.50               1.0007            4.5380     0.0007
Mars            142.46               1.8821            4.9591     0.6324
Jupiter         486.46              11.8704            6.1871     2.4741
Saturn          893.38              29.4580            6.7950     3.3830
Uranus        1,794.37              84.0100            7.4924     4.4309
Neptune       2,815.19             164.7800            7.9428     5.1046
Pluto         3,695.95             248.5400            8.2150     5.5156
Enter the original data (Distance from the Sun and Year) in L1 and L2. Then set L3 = ln(L1) and L4 = ln(L2), and verify that these lists match the ln columns in the table above. Don't worry about small discrepancies due to rounding and the number of decimal places your calculator shows. If your results differ noticeably from the values above, double-check your original entries!
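If you want to check the ln columns outside the calculator, here is a short Python sketch. The list names `dist` and `year` stand in for L1 and L2; they are this sketch's own names, not anything the worksheet defines.

```python
import math

# Original data from the table: distance (millions of miles) and year (Earth years).
planets = ["Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune", "Pluto"]
dist = [36.19, 67.63, 93.50, 142.46, 486.46, 893.38, 1794.37, 2815.19, 3695.95]         # L1
year = [0.2410, 0.6156, 1.0007, 1.8821, 11.8704, 29.4580, 84.0100, 164.7800, 248.5400]  # L2

ln_dist = [math.log(d) for d in dist]  # L3 = ln(L1)
ln_year = [math.log(y) for y in year]  # L4 = ln(L2)

for name, ld, ly in zip(planets, ln_dist, ln_year):
    print(f"{name:8s} {ld:8.4f} {ly:8.4f}")
```

Differences in the fourth decimal place against the printed table are just rounding, as the instructions above note.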
1. Draw a scatterplot of Distance vs. Year (using the untransformed data) with the least-squares regression line. Does the line seem to model the relationship well? (2 points)
2. On your calculator, do a linear regression (STAT CALC 8) for these different combinations:
• Distance vs. ln(Year) (L1 vs. L4, if you entered the data as directed above)
• ln(Distance) vs. Year (L3 vs. L2, if you entered the data as directed above)
• ln(Distance) vs. ln(Year) (L3 vs. L4, if you entered the data as directed above)
(Note that the explanatory variable is always some form of "Distance.") To get the most out of this assignment, look at a scatterplot of each of these combinations.
Which transformation yields the highest correlation coefficient (Pearson's r)? Sketch a scatterplot of this transformation and show the least-squares line. What are the values of r and r² for that transformation, and what regression equation does it yield? (3 points)
(Hint: Remember to include "ln" on the variables in your regression equation that have been transformed.)
3. Using the regression equation from the previous question that best fits the data, store the residuals in L5. In case you forgot how to do this:
press [STAT] and select EDIT, highlight the L5 heading in the list editor and press [ENTER], then press [2nd] [LIST], select RESID, and press [ENTER].
Create a residual plot on your calculator and interpret it; you don't need to draw the plot. (Note: You'll probably need to turn off the equation in Y1 so the regression line isn't drawn over the residual plot.) (2 points)
4. Using algebra, convert your regression equation to a power equation (show your work below). Enter this equation in Y2 (press [Y=] and enter the equation) and make a scatterplot of L1, L2 with Y2, verifying that the power equation is a good fit for this data.
As you set up your regression equation, keep in mind that the variables are ln y and ln x.
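The algebra step can be sanity-checked numerically. This Python sketch uses a hypothetical helper, `power_from_loglog`, with made-up coefficients a and b (not the answer to Question 4). It exponentiates both sides of ln ŷ = a + b ln x to get ŷ = e^a · x^b and confirms that both forms give the same value.

```python
import math

def power_from_loglog(a, b):
    """Turn ln(y-hat) = a + b*ln(x) into y-hat = C * x**b, where C = e**a."""
    # e^(a + b ln x) = e^a * e^(b ln x) = e^a * x^b
    C = math.exp(a)
    return C, b

# Hypothetical coefficients, not the answer to Question 4.
a, b = -2.0, 0.75
C, p = power_from_loglog(a, b)
x = 50.0
via_power = C * x ** p
via_logs = math.exp(a + b * math.log(x))
print(via_power, via_logs)  # the two forms agree up to floating-point rounding
```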
[Figure: scatterplot of L1, L2 with the power equation graphed in Y2; image not included.] (It's up to you to derive the power equation.)
Finally, summarize in plain English what you've done in Questions 1-4. (3 points)
5. The purpose of the transformations you're studying is to find a simple model to describe the relationship in a data set. The model can be used to predict a response value (called interpolation for values within the range of the data set and extrapolation for values outside the range of the data set). Recall that extrapolation is usually not a valid way to predict y-values.
A well-known feature of our solar system is the asteroid belt between Mars and Jupiter. One theory about the asteroid belt is that it's made of primordial material that was prevented from forming another planet by the gravitational pull of Jupiter when the solar system was formed. One of the largest asteroids is 951 Gaspra. Its distance from the Sun is 207.16 million miles. Use your linear regression equation to interpolate the length of its sidereal year. (1 point)
Remember that you need to take the natural log of Distance before you plug it in, and that your first result will be the natural log of Year. Show your work.
6. Finally, calculate the length of the year for 951 Gaspra from the power function you developed in Question 4. (Show all your work) (1 point)
Note: Theoretically, the answers from 5 and 6 should be the same, but they'll probably differ slightly due to rounding between steps. The more digits you carry throughout the calculations, the closer the two answers will be.