Modeling the Proportion of a Driver Finishing First, Second, or Third in a Formula 1 Race versus the Starting Position of the Driver

 

 

 

A Study by Henry Marcil (2007)

Buckingham Browne & Nichols School

Cambridge, MA
May 2005

 

Table of Contents

 

 

Abstract
Description of the Study
    What is Formula 1 Racing?
    The Purpose
    Predictions
Sampling and Data Collection
    Data Observation
    Exceptional Data
    Determining Models
Data
    Presentation
    Scatterplots
    Explanations
        Drivers Finishing First
        Drivers Finishing Second
        Drivers Finishing Third
        Drivers Making Podium
Finding Appropriate Models
    Drivers Finishing First
        Exponential Regression
        Quadratic Regression
        Quartic Regression
    Drivers Finishing Second
        Quadratic Regression
        Quartic Regression
    Drivers Finishing Third
        Cubic Regression
        Quartic Regression
    Drivers Making Podium
        Exponential Regression
        Quadratic Regression
    Conclusion
Discussion
    Weaknesses
    Interpolation


Abstract

            In Formula 1 racing, it is considered an incredible feat to start a race near the back of the grid and finish in first, second, or third place. With an individual Grand Prix consisting of about 60 laps, it is difficult not only to finish the race but also to pass other cars along the way. It is therefore interesting to ask how far back is too far back for a driver to start and still have a chance of placing first, second, or third.

            The purpose of this study was to examine the relationship between a driver's starting position and the probability that the driver will make podium (1st place, 2nd place, or 3rd place) in a Formula 1 race. This study will be useful to drivers competing in the Formula 1 circuit by helping them determine their chances of placing in the race given how far back they start.

            Data for this study was taken from every Formula 1 race beginning with the Sao Paulo Grand Prix in Brazil in 1994 and ending with the Barcelona Grand Prix in Spain in 2005. Four different scatterplots were constructed using this data, and a variety of functions were examined in order to find the appropriate model for each of the four scatterplots.

            It was determined that the relationships between the starting position of a driver and the probability that the driver will finish first, second, or third can each be explained by a quartic regression function. It was also determined that the relationship between the starting position of a driver and the probability that the driver will make podium can be explained by a quadratic regression function.
Description of the Study

What is Formula 1 Racing?

Formula 1 racing, otherwise known as Grand Prix racing, is the highest class of single-seat, open-wheel automobile racing in the world. Every year, approximately 20 races (19 in 2005) take place in countries all around the world. Over the course of a Formula 1 season, each driver fights for the World Drivers' Championship while each constructor fights for the World Constructors' Championship. It is one of the most expensive sports in the world, with teams spending hundreds of millions of dollars annually to compete. A winning driver and constructor can make millions of dollars per race and many millions over the entire season. Because of the large sums of prize money at stake, drivers and constructors have a strong incentive to do well in each race.

Because the cars cannot all line up level with one another at the start of a race, the drivers compete in two qualifying rounds for starting position. In a qualifying round, each driver completes one lap with no other cars on the track, and the lap time is recorded. The driver then completes a lap in a second qualifying round, and the two lap times are added together to get an aggregate time.

The driver with the fastest aggregate time receives first position (also known as pole position), the driver with the second fastest time receives second position, and so on. Naturally, each driver wants to win pole position in order to be at the front of the pack when the race starts. A driver who starts near the back of the pack has a smaller chance of placing first, second, or third, because passing is so difficult in Formula 1 racing: the tracks are narrower and more twisting than those used in other racing series (e.g., NASCAR).

The Purpose

The purpose of this study was to determine how far back is too far back to start a race by examining the relationship between a driver's starting position and the probability that the driver will make podium (1st place, 2nd place, or 3rd place) in a Formula 1 race. This study will be useful to drivers competing in the Formula 1 circuit by helping them determine their chances of placing given how far back they are at the start of the race.

Predictions

Because making podium is so important to a Formula 1 driver, I examined the relationship between the starting position of a driver and the driver's chances of taking first, second, or third place in the race. For this study, I predicted that a driver who begins in one of the first 4 grids (starting positions) will have a much better chance of getting first place than a driver who starts further back. For second place, I predicted that drivers starting in the first 6 grids will have the best chance. For third place, I predicted that drivers starting in the first 9 grids will have the best chance. I also expect that a driver starting in pole position will have a greater chance of winning than a driver starting in second position, that a driver starting in second position will have a greater chance of winning than a driver starting in third position, and so on.

I also predicted that any driver who begins behind these positions will have a significantly smaller chance of making podium than a driver who starts within my predicted range. I believe that there is a starting grid that is too far back, and that any driver starting there will find it nearly impossible to pass all the drivers in front of him in order to make podium.


Sampling and Data Collection

Data Observation

 

Data for this study were taken from every Formula 1 race beginning with the Sao Paulo Grand Prix in Brazil in 1994 and ending with the Barcelona Grand Prix in Spain in 2005. The year 1994 was chosen because that is when Formula 1's governing body banned “driver's aids”, which include traction control, active suspension, and other automatic adjustment mechanisms. Because of this, results might differ if data from before 1994 were included. Also, because automobile technology changes so much from year to year, it would be difficult to get consistent results if we were to go much further back than 1994.

I began with 3 sheets of paper, the first titled “Starting Grid of Drivers Finishing First,” the second titled “Starting Grid of Drivers Finishing Second,” and the third titled “Starting Grid of Drivers Finishing Third.” Each sheet of paper had 25 columns representing 25 grids (the maximum number of cars in a race between the years 1994 and 2005 is 25).

I began by taking data for the “Starting Grid of Drivers Finishing First” sheet, looking at the archived results for the year 1994 and going through each race of that year. For each race I noted who won and which grid that driver started in, and I then placed a tally in the corresponding column on the sheet. I repeated this for every race of every year up to 2005. I then took data for the “Starting Grid of Drivers Finishing Second” sheet in the same way, except that I tallied the starting grid of the driver who came in second place. I repeated this process for the third sheet as well.

When I had collected all the data, I added up the tallies in each column on each of the three sheets. I then made a fourth sheet of paper titled “Starting Grid of Drivers Making Podium” with 25 columns like the other three sheets. For each column, I added the first-, second-, and third-place tallies for that starting position and recorded the sum next to the corresponding grid number. This sum shows how many drivers who started in that position have finished first, second, or third since the Sao Paulo Grand Prix in 1994.
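The tallying procedure described above can be sketched in code. This is only an illustration: the tallies in this study were made by hand, the `results` records and the `tally_starting_grids` helper are hypothetical names, and the three sample races are made up for demonstration.

```python
from collections import Counter

# Hypothetical records: for each race, the starting grid of the drivers
# who finished first, second, and third (made-up values, not study data).
results = [
    {"first": 1, "second": 3, "third": 2},
    {"first": 2, "second": 1, "third": 6},
    {"first": 1, "second": 4, "third": 3},
]

def tally_starting_grids(results, place):
    """Count how often each starting grid produced the given finishing place."""
    return Counter(race[place] for race in results)

first = tally_starting_grids(results, "first")
second = tally_starting_grids(results, "second")
third = tally_starting_grids(results, "third")

# The fourth sheet ("Making Podium") is the grid-by-grid sum of the other three.
podium = first + second + third

for grid in range(1, 7):
    print(grid, first[grid], second[grid], third[grid], podium[grid])
```

A `Counter` returns 0 for a grid that never appears, which mirrors an empty column on the tally sheet.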

Exceptional Data

         I am not aware of any exceptional data in this study. Shortened races were not removed because they result from random incidents, such as weather, track conditions, and accidents, that can happen at any time during any race. Randomness is a major part of Formula 1 racing, and therefore races affected by such events were still included. Also, no race in the sample period was cancelled because of a fatal accident, so cancelled races did not need to be considered in this study.

            Furthermore, there has never been a Formula 1 race in which fewer than 3 cars finished, so there were no races without a first-, second-, or third-place finisher. Lastly, there has never been a Formula 1 race in which one car finished in two different places (e.g., first and second place).

Determining Models

After the data were recorded, I counted the tallies in each column and the total number of tallies on each sheet. The probability of finishing in a given place from a given starting position is estimated as the number of tallies in that grid's column divided by the total number of tallies for that place, which equals the number of races in the sample.
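As a sketch of this calculation, the snippet below converts the tallies into proportions. The counts are the ones given in the data tables of this study; the variable and function names are illustrative. Note that the podium tallies add up to three times the number of races, but each grid's podium count is still divided by the number of races, as the footnote to the podium table states.

```python
# Tallies for grids 1-25, copied from the data tables in this study.
first = [79, 51, 30, 8, 6, 3, 3, 2, 0, 1, 1, 0, 0, 2, 0, 1, 0, 1] + [0] * 7
second = [33, 38, 35, 24, 11, 13, 9, 7, 2, 5, 3, 4, 2, 2] + [0] * 11
third = [12, 17, 30, 29, 24, 22, 14, 13, 7, 7, 2, 2, 4, 1, 4] + [0] * 10

races = sum(first)  # one winner per race, so this is the number of races

def proportions(counts, total):
    """Divide each grid's tally by the total and round to three decimals."""
    return [round(c / total, 3) for c in counts]

# Podium tallies are the grid-by-grid sum of the three sheets.
podium = [f + s + t for f, s, t in zip(first, second, third)]

p_first = proportions(first, races)
# Podium tallies sum to 3 * races, but each grid is still divided by races.
p_podium = proportions(podium, races)

print(races, sum(podium))
print(p_first[:3], p_podium[:3])
```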

Using the calculated probabilities, I generated four scatterplots, each with the starting grid of the driver as the independent variable. The dependent variables were the probability of winning the race (first scatterplot), the probability of placing second (second scatterplot), the probability of placing third (third scatterplot), and the probability of making podium (fourth scatterplot). I then used regression analysis to determine the appropriate model for each scatterplot.

According to my predictions, I expect each model to show a decreasing proportion of drivers placing first, second, or third as the starting grid of the driver increases. I also expect that none of the relationships will be linear, because I predicted that there is a starting grid that is too far back for a driver to pass all the other cars ahead of him and make podium in the race.


Data

Presentation

Starting Grid of Drivers Finishing First and Drivers Finishing Second

        Finishing First          Finishing Second
Grid    Counts    Probability    Counts    Probability
1       79        0.420          33        0.176
2       51        0.271          38        0.202
3       30        0.160          35        0.186
4       8         0.043          24        0.128
5       6         0.032          11        0.059
6       3         0.016          13        0.069
7       3         0.016          9         0.048
8       2         0.011          7         0.037
9       0         0              2         0.011
10      1         0.005          5         0.027
11      1         0.005          3         0.016
12      0         0              4         0.021
13      0         0              2         0.011
14      2         0.011          2         0.011
15      0         0              0         0
16      1         0.005          0         0
17      0         0              0         0
18      1         0.005          0         0
19      0         0              0         0
20      0         0              0         0
21      0         0              0         0
22      0         0              0         0
23      0         0              0         0
24      0         0              0         0
25      0         0              0         0

Total Counts = 188 (Finishing First); Total Counts = 188 (Finishing Second)

 

 

Starting Grid of Drivers Finishing Third and Drivers Making Podium

        Finishing Third          Making Podium
Grid    Counts    Proportion     Counts    Proportion
1       12        0.064          124       0.660
2       17        0.090          106       0.564
3       30        0.160          95        0.505
4       29        0.154          61        0.324
5       24        0.128          41        0.218
6       22        0.117          38        0.202
7       14        0.074          26        0.138
8       13        0.069          22        0.117
9       7         0.037          9         0.048
10      7         0.037          13        0.069
11      2         0.011          6         0.032
12      2         0.011          6         0.032
13      4         0.021          6         0.032
14      1         0.005          5         0.027
15      4         0.021          4         0.021
16      0         0              1         0.005
17      0         0              0         0
18      0         0              1         0.005
19      0         0              0         0
20      0         0              0         0
21      0         0              0         0
22      0         0              0         0
23      0         0              0         0
24      0         0              0         0
25      0         0              0         0

Total Counts = 188 (Finishing Third); Total Counts = 564[1] (Making Podium)

 

 

 

Scatterplots

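[Figures: four scatterplots, each plotting the proportion of drivers (dependent variable) against starting grid (independent variable): Starting Grid of Drivers Finishing First; Starting Grid of Drivers Finishing Second; Starting Grid of Drivers Finishing Third; Starting Grid of Drivers Making Podium.]
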

Explanations

Drivers Finishing First

The most noticeable detail about the “Drivers Finishing First” data table and scatterplot is the dramatic negative association between starting grid and the proportion of drivers placing first in the race. Drivers starting in pole position have placed first 42% of the time, while drivers starting in the fourth position, just 3 grids back, have finished first only 4.3% of the time. Drivers starting in the eighth grid have finished first only 1.1% of the time.

The relationship does not appear to be linear, which agrees with my earlier prediction. If there were a negative linear relationship, the proportion of drivers winning the race would decrease by the same amount for every one-position increase in starting grid, which is not the case. For the first three grids, there is a steady decrease in the proportion of drivers winning the race. Past these three grids, however, there is very little change in the proportion because barely anyone starting beyond them has placed first. In fact, in only 28 of the 188 archived Formula 1 races between 1994 and 2005 did a driver who started behind the third grid finish first.

There are four outliers in this set of data: two drivers who won when starting in the 14th grid, one driver who won when starting in the 16th grid, and another driver who won when starting in the 18th grid. These drivers were able to pass 13, 15, and even 17 drivers in order to finish first in their respective races, an unbelievable feat in Formula 1 racing.

Drivers Finishing Second

            For drivers placing second, there also appears to be a non-linear negative association, although it is not nearly as dramatic as the previous relationship. One apparent detail in this data table and scatterplot is that a higher proportion of drivers starting in the second and third grids come in second place than drivers starting in the first grid. This can be explained by the fact that drivers starting in pole position so often finish first that they are less often left to finish second.

            As expected, drivers starting in later positions have a better chance of coming in second than of coming in first. Drivers starting in the sixth grid have finished in second place 6.9% of the time, while they have finished first only 1.6% of the time. Second place is therefore a much more attainable goal for a driver starting toward the middle of the pack. However, no one who has started beyond the fourteenth grid has ever finished second.

Drivers Finishing Third

            Like the previous two sets of data, there appears to be a non-linear negative association between the two variables. However, there are a few very significant points that must be discussed. Drivers starting in the first and second grids are less likely to finish third than drivers starting in the third and fourth grids. This can be explained by my previous theory: drivers starting in the first and second positions so often finish first and second that they less often finish third. Because of this, the scatterplot has an almost mound-shaped curve that is skewed to the right.

            If I were to discard these two points, the scatterplot may suggest a linear relationship. However, these two points have a lot of pull and therefore a linear association will not be considered.

            These data also show that drivers starting in the middle of the pack have a greater chance of finishing third than of finishing second or first. Still, no one who has started beyond the fifteenth grid has ever finished the race in third place.

Drivers Making Podium

            This relationship is similar to that of the drivers finishing first, except that the decrease in the proportion of drivers making podium is not nearly as dramatic. Drivers starting in pole position have made podium 66% of the time, while drivers starting in second and third position have made podium 56.4% and 50.5% of the time, respectively. As in the other scatterplots, the proportion falls to 0% around the 18th grid.

            There do not appear to be any points truly out of place. The data follow a smooth curve, which suggests that either an exponential or a quadratic model may be most appropriate.


Finding Appropriate Models  

            In this section a variety of models will be studied in order to find the most appropriate model for the data of drivers either finishing first, second, or third, or making podium in a Formula 1 race.

Drivers Finishing First

         It is evident that grids 1 through 3 have probabilities that are all much higher than the probability of winning from any other grid. The curvature of the scatterplot suggests that exponential, quadratic, or quartic regression functions are possible candidates for a model.

            No driver starting beyond the 18th grid has ever finished first in a race. Because of the large amount of pull that these points would have on the model, they will be ignored for the time being. Also, because exponential regression is not possible for data points that lie on the x-axis (the usual fitting method takes the logarithm of y, which is undefined at zero), the points that lie on the x-axis must be changed to extremely small positive numbers. Therefore, points (9, 0), (12, 0), (13, 0), (15, 0) and (17, 0) will be changed to (9, .0001), (12, .0001), (13, .0001), (15, .0001) and (17, .0001). These modifications are too subtle to have a significant impact on the equation for each tested model.

Exponential Regression

Formula: $y = a \cdot b^{x}$
Fitted model: $y = .197 \cdot .681^{x}$
r-Squared: .527
Comments: This model fails to explain data for grids 1, 2, and 3, which are the most significant points.

            Although exponential regression for this set of data has a moderate r-squared value, the model fails to fit the data points where the grid number equals 1, 2, and 3. The curve is not steep enough to explain these three points, which happen to be the three most significant points on the plot.
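For readers who want to reproduce this kind of fit, the sketch below shows one common way to carry out exponential regression: fit a straight line to ln(y) and transform back. It assumes (the study does not say) that this log-linear method is what the calculator or software used, that the proportions are the tallies divided by 188, and that zeros are replaced by .0001 as described earlier. The function name `fit_exponential` is illustrative.

```python
import math

# First-place tallies for grids 1-18 (grids 19-25 are excluded from the fit).
counts = [79, 51, 30, 8, 6, 3, 3, 2, 0, 1, 1, 0, 0, 2, 0, 1, 0, 1]
grids = list(range(1, 19))
# Replace zero proportions by .0001, because ln(0) is undefined.
props = [c / 188 if c > 0 else 0.0001 for c in counts]

def fit_exponential(xs, ys):
    """Least-squares line through (x, ln y), returned as (a, b) in y = a * b**x."""
    n = len(xs)
    logs = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxy = sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_log - slope * mean_x
    return math.exp(intercept), math.exp(slope)

a, b = fit_exponential(grids, props)
print(round(a, 3), round(b, 3))
```

Under these assumptions the result should land close to the coefficients reported in the table. Because the fit is done on a logarithmic scale, the replaced points at .0001 sit far below the rest and pull the curve down, which is one reason the model undershoots grids 1 through 3.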

Quadratic Regression

Formula: $y = ax^{2} + bx + c$
Fitted model: $y = .0028x^{2} - .0669x + .368$
r-Squared: .801
Comments: This is a pretty good model. It shows the steep slope quite well; however, it goes below y = 0 at a few points.

           

            The quadratic regression model explains the slope of the data for the lower starting positions very well. However, the model seems to lose its precision by grid 9, where it goes below the x-axis and then begins to increase well above the x-axis at the 16th grid.

Quartic Regression

Formula: $y = ax^{4} + bx^{3} + cx^{2} + dx + e$
Fitted model: $y = .000044x^{4} - .0021x^{3} + .0356x^{2} - .250x + .6382$
r-Squared: .990
Comments: This model is just about flawless. It explains the slope almost perfectly and levels off near y = 0.

 

This model explains the slope almost perfectly and also explains the points just above the x-axis. It loses its precision past the 18th grid, but we have previously said we will consider every grid past the 18th grid to have a probability of approximately 0%.

Conclusion

Model         r-Squared   Sum of Squared Residuals   Residual Plot Pattern?
Exponential   .527        .124                       No
Quadratic     .801        .044                       Yes
Quartic       .990        .0023                      Maybe

 

Both the quadratic and quartic models have high r-squared values and small sums of squared residuals. However, I think that the quartic model is the best candidate. Not only is its r-squared value practically equal to 1, but its sum of squared residuals is also about 20 times smaller than that of the quadratic model. Also, the residual plot for the quadratic model has a parabolic pattern, whereas the residual plot for the quartic model may or may not have a pattern. Therefore, we can say that the relationship between a driver's starting position and the probability that the driver will place first is most appropriately defined as:

For $1 \le \text{grid} \le 18$, probability of finishing first $= .000044\,\text{grid}^{4} - .0021\,\text{grid}^{3} + .0356\,\text{grid}^{2} - .250\,\text{grid} + .6382$;

For $18 < \text{grid} \le 25$, probability of finishing first $= 0$;
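The r-squared values and sums of squared residuals compared above can be reproduced, at least approximately, with an ordinary least-squares polynomial fit. The sketch below is one way to do that in plain Python. It assumes the fits were made to the first-place proportions for grids 1 through 18 (tallies divided by 188); the helper names `solve` and `fit_stats` are illustrative rather than anything used in the study.

```python
# First-place proportions for grids 1-18 (tallies divided by 188 races).
counts = [79, 51, 30, 8, 6, 3, 3, 2, 0, 1, 1, 0, 0, 2, 0, 1, 0, 1]
xs = list(range(1, 19))
ys = [c / 188 for c in counts]

def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (aug[i][n] - tail) / aug[i][i]
    return sol

def fit_stats(xs, ys, degree):
    """Least-squares polynomial fit; returns (r_squared, sum_of_squared_residuals)."""
    # Rescale x to [-1, 1]; this keeps the normal equations well conditioned
    # and does not change the fitted curve, only how it is parameterized.
    lo, hi = min(xs), max(xs)
    ts = [(2 * x - lo - hi) / (hi - lo) for x in xs]
    powers = [[t ** k for k in range(degree + 1)] for t in ts]
    normal = [[sum(p[i] * p[j] for p in powers) for j in range(degree + 1)]
              for i in range(degree + 1)]
    rhs = [sum(p[i] * y for p, y in zip(powers, ys)) for i in range(degree + 1)]
    coef = solve(normal, rhs)
    fitted = [sum(c * pk for c, pk in zip(coef, p)) for p in powers]
    ssr = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    mean_y = sum(ys) / len(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ssr / sst, ssr

for degree in (2, 4):
    r2, ssr = fit_stats(xs, ys, degree)
    print(degree, round(r2, 3), round(ssr, 4))
```

One point worth keeping in mind when reading such a comparison: a higher-degree polynomial can never have a larger sum of squared residuals than a lower-degree one fitted to the same points, so the residual plot and the behavior of the curve between and beyond the data points matter as much as r-squared.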

Drivers Finishing Second                                        

            This set of data also shows a non-linear negative association. Because the data follow a smooth curve, I predict that quadratic and quartic regression will be most appropriate.

            Grids 15 through 25 all show a 0% probability that a driver starting there will finish second in the race. However, because a driver starting in the 18th grid has won a race, only grids 19 through 25 will be excluded for the time being, and will later be explained by the probability = 0 model.

Quadratic Regression

Formula: $y = ax^{2} + bx + c$
Fitted model: $y = .00026x^{2} - .0443x + .255$
r-Squared: .928
Comments: Very good fit. It does not show that starting in grid 1 has less probability than grid 2.

 

Although this model fails to show that starting in grid 1 and finishing second has a smaller probability than starting in grid 2 and finishing second, I think overall the model is very suitable. It has a high r-squared value and it fits the curve of the data very well.

Quartic Regression

Formula: $y = ax^{4} + bx^{3} + cx^{2} + dx + e$
Fitted model: $y = -.0000144x^{4} + .000496x^{3} - .0042x^{2} - .01370x + .219$
r-Squared: .939
Comments: Very similar to the quadratic model with a slightly higher r-squared value.

 

            This model is very similar to the quadratic regression model, with only a slightly higher r-squared value.

Conclusion

Model       r-Squared   Sum of Squared Residuals   Residual Plot Pattern?
Quadratic   .928        .0064                      No
Quartic     .939        .0050                      No

 

            Both of these models fit this set of data well. However, because the quartic regression model has a slightly higher r-squared value and a slightly smaller sum of squared residuals, I will say that the appropriate model is the quartic regression model. Therefore, we can say the relationship between a driver's starting position and the probability that the driver will finish in second place can be defined as:

For $1 \le \text{grid} \le 18$, probability of finishing second $= -.0000144\,\text{grid}^{4} + .000496\,\text{grid}^{3} - .0042\,\text{grid}^{2} - .01370\,\text{grid} + .219$;

For $18 < \text{grid} \le 25$, probability of finishing second $= 0$;

Drivers Finishing Third

            This relationship projects a curve that first increases to grid 3, then begins to decrease to grid 12, then stays relatively constant at probability = 0 until grid 18. This suggests either a cubic or quartic model would be the most appropriate.

            Grids 16 through 25 all show a 0% probability that a driver starting there will finish third in the race. However, because a driver starting in the 18th grid has won a race, only grids 19 through 25 will be explained by the probability = 0 model.

Cubic Regression

Formula: $y = ax^{3} + bx^{2} + cx + d$
Fitted model: $y = .000187x^{3} - .005x^{2} + .0332x - .0614$
r-Squared: .845
Comments: Decent fit, but it does not quite explain points for grid 2 and 3. Model goes below the x-axis as well.

 

            This model shows a decent fit for this set of data. It increases from grid 1 to grid 4, then decreases to grid 15, and then increases to grid 18. However, it does not go high enough to explain grids 3 and 4, and it goes below the x-axis around the 13th grid.

Quartic Regression

Formula: $y = ax^{4} + bx^{3} + cx^{2} + dx + e$
Fitted model: $y = .000037x^{4} + .0016x^{3} - .023x^{2} + .112x - .034$
r-Squared: .956
Comments: Better fit that does explain points for grid 2 and 3. Does not go below the x-axis until past grid 18.

 

This model goes high enough to explain grids 3 and 4, and stays constant at 0% probability for grids 12 through 18. It is almost perfect for the given set of data.

Conclusion

Model     r-Squared   Sum of Squared Residuals   Residual Plot Pattern?
Cubic     .845        .0078                      No
Quartic   .956        .0022                      No

 

            The best model for this set of data is the quartic regression model. It has an r-squared value of almost 1, and its sum of squared residuals is almost 0. It also fits the curve of the plot closely. Therefore, we can say the relationship between a driver's starting position and the probability that the driver will finish in third place can be defined as:

For $1 \le \text{grid} \le 18$, probability of finishing third $= .000037\,\text{grid}^{4} + .0016\,\text{grid}^{3} - .023\,\text{grid}^{2} + .112\,\text{grid} - .034$;

For $18 < \text{grid} \le 25$, probability of finishing third $= 0$;

Drivers Making Podium

            This set of data seems to follow a smooth curve, starting with the highest probability at grid 1 and leveling off just above the x-axis around grid 11. I predict that either an exponential or a quadratic model will be the best candidate.

            Once again, data points from grids 19 to 25 have been excluded and will be defined as probability = 0 later on.

Exponential Regression

Formula: $y = a \cdot b^{x}$
Fitted model: $y = 2.056 \cdot .6694^{x}$
r-Squared: .797
Comments: This model does not explain grids 1 and 2 well.

 

            The exponential model fits well for the later grids; however it misrepresents the first and second grids by a large margin. For example, the predicted value for grid 1 according to this model would be 1.38, equal to a 138% chance that a driver starting in pole position will make podium in the race. This does not make exponential regression a practical model.

Quadratic Regression

Formula: $y = ax^{2} + bx + c$
Fitted model: $y = .0036x^{2} - .103x + .724$
r-Squared: .963
Comments: Very high r-squared value and fits the curve of the data very well.

 

            Unlike the exponential model, the quadratic model explains the first two grids very accurately. The curve of this model fits with the curve of the data almost perfectly.

Conclusion

Model         r-Squared   Sum of Squared Residuals   Residual Plot Pattern?
Exponential   .797        .668                       No
Quadratic     .963        .0289                      Maybe

 

            Because the exponential model misrepresents the probabilities for the first 2 grids, I feel it is not a good fit for the data. In contrast, the quadratic regression model fits the curve of the graph very well, and is very appropriate for the given set of data. Therefore, we can say the relationship between a driver's starting position and the probability that the driver will make podium can be defined as:

For $1 \le \text{grid} \le 18$, probability of making podium $= .0036\,\text{grid}^{2} - .103\,\text{grid} + .724$;

For $18 < \text{grid} \le 25$, probability of making podium $= 0$;

 

Conclusion

            The four following models are the most appropriate for the four sets of data:

For Drivers Finishing First

For $1 \le \text{grid} \le 18$, probability of finishing first $= .000044\,\text{grid}^{4} - .0021\,\text{grid}^{3} + .0356\,\text{grid}^{2} - .250\,\text{grid} + .6382$;

For $18 < \text{grid} \le 25$, probability of finishing first $= 0$;

For Drivers Finishing Second

For $1 \le \text{grid} \le 18$, probability of finishing second $= -.0000144\,\text{grid}^{4} + .000496\,\text{grid}^{3} - .0042\,\text{grid}^{2} - .01370\,\text{grid} + .219$;

For $18 < \text{grid} \le 25$, probability of finishing second $= 0$;

For Drivers Finishing Third

For $1 \le \text{grid} \le 18$, probability of finishing third $= .000037\,\text{grid}^{4} + .0016\,\text{grid}^{3} - .023\,\text{grid}^{2} + .112\,\text{grid} - .034$;

For $18 < \text{grid} \le 25$, probability of finishing third $= 0$;

For Drivers Making Podium

For $1 \le \text{grid} \le 18$, probability of making podium $= .0036\,\text{grid}^{2} - .103\,\text{grid} + .724$;

For $18 < \text{grid} \le 25$, probability of making podium $= 0$;
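To show how a piecewise model of this kind can be used in practice, here is a minimal sketch of the podium model as a function. The function name `podium_probability` is illustrative, and the clamping of results to the range 0 to 1 is an added assumption that is not part of the study's model; it is included because the quadratic, evaluated with the rounded coefficients above, dips slightly below zero for grids 13 through 16.

```python
def podium_probability(grid):
    """Piecewise podium model from this study.

    Clamping the result to [0, 1] is an added assumption, not part of the study.
    """
    if not 1 <= grid <= 25:
        raise ValueError("grid must be between 1 and 25")
    if grid > 18:
        # Grids 19-25 are defined as probability = 0.
        return 0.0
    value = 0.0036 * grid ** 2 - 0.103 * grid + 0.724
    # The raw quadratic is slightly negative for grids 13 through 16,
    # so clamp the result to a valid probability.
    return min(1.0, max(0.0, value))

for grid in (1, 5, 14, 18, 19):
    print(grid, round(podium_probability(grid), 4))
```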

 

           
Discussion

Weaknesses

            There are a couple of weaknesses in this study that must be taken into consideration. To begin with, there were some unavoidable problems during sampling. The year 1994 was chosen as a starting point for the project because it was said that if we were to go further back then there might be significant changes in the data due to older technology, etc. However, there is no point in time that is precisely too far back according to changes in technology. Formula 1 racing technology improves considerably every year, with new developments in engine design, traction, and aerodynamics. Yet there still needed to be a point in this study where I had to begin taking data, and 1994 seemed like the most logical year because of the revisions done to Formula 1 regulations at this time.

            In addition, random racing conditions can also affect the precision of the four determined models. Uncontrollable incidents such as the weather, track conditions, or accidents can have an effect on a driver's ability to pass other cars. This was not taken into account when I took my sample, because randomness is a major aspect of Formula 1 racing and these events must be included with the rest of the data.

            A more subtle weakness in this study is the alteration of data in order to find appropriate models. No driver who has started in grids 19 through 25 has ever finished in one of the top three positions. Therefore, these grids were excluded from every regression so that they would not have any influence, and were then defined with the model probability = 0. Although no one has placed in the top three after starting in those positions, there is still a minute chance that a driver could do so.

            Lastly, the driver's skill level and automobile must be taken into consideration. If a driver is able to finish the qualifying rounds with a top qualifying time, he is most likely one of the top drivers, has one of the best automobiles, or both. Because driver skill and automobile construction have so much to do with Formula 1 racing, a driver in one of the top three grids is probably there for a reason. This is a likely explanation of why so many first, second, and third place finishes go to drivers who start in the top three grids. It is very difficult to pass, especially when the driver you are trying to pass is one of the best in the circuit. There are some obvious exceptions to this rule, such as when a great driver has a bad qualifying time and starts in one of the later grids, only to have enough skill to finish in the top three in the race. For the most part, however, the starting position of a driver reflects the skill of the driver and the construction of his automobile.

Interpolation

            For this section, the four models determined will be used to predict the probability that a driver will either finish first, second, third, or make podium given that the driver starts in the 5th grid. This starting position was chosen because it is not too far back, and it is possible that a driver starting in this position will finish in the top three in the race.

probability of finishing first $= .000044\,\text{grid}^{4} - .0021\,\text{grid}^{3} + .0356\,\text{grid}^{2} - .250\,\text{grid} + .6382 = .000044(5)^{4} - .0021(5)^{3} + .0356(5)^{2} - .250(5) + .6382 = .0432$

probability of finishing second $= -.0000144\,\text{grid}^{4} + .000496\,\text{grid}^{3} - .0042\,\text{grid}^{2} - .01370\,\text{grid} + .219 = -.0000144(5)^{4} + .000496(5)^{3} - .0042(5)^{2} - .01370(5) + .219 = .0975$

probability of finishing third $= .000037\,\text{grid}^{4} + .0016\,\text{grid}^{3} - .023\,\text{grid}^{2} + .112\,\text{grid} - .034 = .000037(5)^{4} + .0016(5)^{3} - .023(5)^{2} + .112(5) - .034 = .1741$

probability of making podium $= .0036\,\text{grid}^{2} - .103\,\text{grid} + .724 = .0036(5)^{2} - .103(5) + .724 = .2990$
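The substitutions above can be checked with a few lines of code. The sketch below evaluates each model at grid 5 using the rounded coefficients exactly as printed in this study; the `models` dictionary and the `evaluate` helper are illustrative names. With these rounded coefficients the second-place value comes out at about .0985 rather than .0975; the small gap may reflect unrounded coefficients in the original calculation, which is an assumption on my part.

```python
# Coefficients as printed in this study, highest power first.
models = {
    "first": [0.000044, -0.0021, 0.0356, -0.250, 0.6382],
    "second": [-0.0000144, 0.000496, -0.0042, -0.01370, 0.219],
    "third": [0.000037, 0.0016, -0.023, 0.112, -0.034],
    "podium": [0.0036, -0.103, 0.724],
}

def evaluate(coefficients, grid):
    """Evaluate a polynomial at the given grid using Horner's rule."""
    result = 0.0
    for c in coefficients:
        result = result * grid + c
    return result

for name, coefficients in models.items():
    print(name, round(evaluate(coefficients, 5), 4))
```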

            Here is how these predicted values compare to the true probabilities:

Outcome            True Probabilities   Predicted Probabilities   Percent Difference
Finishing first    .032                 .0432                     25.9%
Finishing second   .059                 .0975                     39.5%
Finishing third    .128                 .1741                     26.5%
Making podium      .218                 .2990                     27.1%
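The percentages in this comparison are consistent with dividing the gap between predicted and true values by the predicted value, rather than by the true value. That reading is an inference from the numbers, not something stated in the study. The sketch below reproduces the calculation under that assumption; the function name `percent_difference` is illustrative.

```python
# True and predicted probabilities at grid 5, as given in this study.
true_values = {"first": 0.032, "second": 0.059, "third": 0.128, "podium": 0.218}
predicted = {"first": 0.0432, "second": 0.0975, "third": 0.1741, "podium": 0.2990}

def percent_difference(true_value, predicted_value):
    """Gap between prediction and observation, as a percentage of the prediction."""
    return 100 * (predicted_value - true_value) / predicted_value

for outcome in true_values:
    diff = percent_difference(true_values[outcome], predicted[outcome])
    print(outcome, round(diff, 1))
```

Dividing by the true value instead would give larger percentages, since every prediction here overestimates the observed proportion.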

 



[1] Note – grid tallies are divided by 188 in order to get proportion, not 564