Question:

(Referencing problem 6.1 from 'Data Mining for Business Analytics: Concepts, Techniques, and Applications in R', Shmueli et al.) 6.1.d.iii reads as follows: "Use stepwise regression with the three options (backward, forward, both) to reduce the remaining predictors as follows: Run stepwise on the training set. Choose the top from each stepwise run. Then use each of these models separately to predict the validation set. Compare RMSE, MAPE, and mean error, as well as lift charts. Finally, describe the best model."

For the bolded parts of the book question, I'm not clear what I'm supposed to be doing. I did run stepwise regression with the backward, forward, and both options, but it's not clear to me what in the outputs I should use to 'choose the top from each stepwise run'. Any insight on how to proceed is greatly appreciated.


Answers

When stepwise regression is run (backward, forward, or both), the procedure takes the entire list of variables fed into the program and, based on the specified entry and/or stay criteria, returns a reduced final list of predictors. For example, if there are 100 predictors initially, the stepwise output may keep 40 of them: a subset that explains the response nearly as well as the full set of predictors. The objective is to reduce the number of predictors and arrive at a parsimonious model.

An explanation of "choose the top from each stepwise run and run each model separately to predict the validation set" is as follows:

Let's say there are 1000 observations (records/rows), 100 predictor variables, and one response variable. The first step is to divide the data randomly into training and test datasets, usually in the ratio 7:3. The training dataset would then contain 700 records and the test dataset 300 records. The test dataset is set aside for the time being.
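The random 7:3 partition can be sketched as follows. This is a minimal illustration in plain Python rather than the book's R code; the function name `train_validation_split`, the seed, and the stand-in `records` list are all assumptions made for the example.

```python
import random

def train_validation_split(rows, train_fraction=0.7, seed=1):
    """Shuffle a copy of rows and split it into (training, validation) lists."""
    shuffled = list(rows)
    # A fixed seed makes the partition reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

records = list(range(1000))  # hypothetical stand-in for 1000 data rows
train, valid = train_validation_split(records)
print(len(train), len(valid))  # 700 300
```

Every record lands in exactly one of the two sets, and the validation set is not touched again until the models are scored.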

Next, run stepwise regression with each of the three options (forward, backward, and both) on the 700-record training dataset.

Then choose the top m variables from each stepwise run. If m isn't prespecified, choose it subjectively, say m = 30; this is the reduced number of predictor variables that you actually want to build the model on. This gives three lists of variables, one each for the forward, backward, and both methods, each containing 30 variables. Now run three separate regressions on the 700 training records, one for each variable list. This yields three models.
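To make the idea of a stepwise run concrete, here is a rough sketch of the forward direction only, written in plain Python rather than R. It assumes AIC as the selection criterion (the default in R's `step()`), and it stops when no addition improves AIC rather than at a fixed m. The helper names (`solve`, `fit_ols`, `predict`, `aic`, `forward_stepwise`) and the simulated data are assumptions for illustration, not the book's code.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(X, y, cols):
    """Least-squares coefficients for y on an intercept plus the chosen columns."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(X, cols, beta):
    """Predictions from an intercept (beta[0]) plus the chosen columns."""
    return [beta[0] + sum(b * row[c] for b, c in zip(beta[1:], cols)) for row in X]

def aic(X, y, cols):
    """AIC up to an additive constant: n*log(RSS/n) + 2*(number of coefficients)."""
    beta = fit_ols(X, y, cols)
    rss = sum((yi - pi) ** 2 for yi, pi in zip(y, predict(X, cols, beta)))
    n = len(y)
    return n * math.log(rss / n) + 2 * (len(cols) + 1)

def forward_stepwise(X, y):
    """Start from the intercept-only model; add the best column until AIC stops improving."""
    selected, remaining = [], list(range(len(X[0])))
    best = aic(X, y, selected)
    while remaining:
        score, col = min((aic(X, y, selected + [c]), c) for c in remaining)
        if score >= best:
            break
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected

# Simulated training data (an assumption): only columns 0 and 2 drive the response.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [3 + 2 * r[0] - 1.5 * r[2] + rng.gauss(0, 0.5) for r in X]
chosen = forward_stepwise(X, y)
print(sorted(chosen))
```

The backward direction would start from all columns and drop one at a time, and "both" would consider an addition or a removal at every step; each direction can end on a different subset, which is why the exercise asks for one model per run.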

Next, score these models on the validation/test data of 300 records, that is, use each model to predict the response for the test dataset. Scoring can be done with the same package in the software.

Then metrics such as RMSE, MAPE, and mean error, along with lift charts, can be compared across the three models on the test dataset. Based on these, one can choose the best model: the lower the RMSE and MAPE, and the closer the mean error is to zero, the better the model.
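As a small sketch of this comparison, the three error measures and the data behind a cumulative lift (gains) chart can be computed as below. This is plain Python with made-up numbers; the function names `validation_metrics` and `cumulative_gains` and the example values are assumptions, and error is taken as actual minus predicted.

```python
import math

def validation_metrics(actual, predicted):
    """Mean error, RMSE, and MAPE (in percent), with error = actual - predicted."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    return {
        "mean_error": sum(errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        # MAPE divides by the actual value, so it is undefined if any actual is 0.
        "mape": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }

def cumulative_gains(actual, predicted):
    """Cumulative actual response after sorting records by predicted value, highest first."""
    order = sorted(range(len(actual)), key=lambda i: predicted[i], reverse=True)
    total, curve = 0.0, []
    for i in order:
        total += actual[i]
        curve.append(total)
    return curve

# Made-up validation values and predictions from one hypothetical model.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
metrics = validation_metrics(actual, predicted)
gains = cumulative_gains(actual, predicted)
print(metrics)
print(gains)  # [400.0, 600.0, 700.0]
```

Running the same two functions on the validation predictions of the forward, backward, and both models gives three sets of numbers and three curves to compare. A curve that rises faster than the straight baseline from 0 to the total response indicates that the model ranks records usefully.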
