# Thread: Need Help figuring out a statistics problem

1. ## Need Help figuring out a statistics problem

I'm not sure that I am doing this right. I am trying to figure out the odds of being hit X times in a row during a fight where a boss swings Y times.

Figuring out the probability of X times in a row is easy. If you want to calculate the chance of getting hit 5 times in a row and your total avoidance is 38.32%, then the chance of getting hit 5 times in a row when swung at 5 times is 8.93%. But what are the odds of getting hit 5 times IN A ROW when swung at 100 times? Could someone please also provide the formula they used to figure it out? Right now I'm getting an answer of about a 99.9874% chance that you will be hit 5 times in a row in a fight where a boss swings at you 100 times.

Can this be right?

My current formula for the probability that you won't get hit x times in a row in a fight y swings long is (1-(1-%avoid)^x)^(y-(x-1)). So 1 minus all of that is the probability that you will get hit x times in a row in a fight y swings long.

It feels like I'm doing something wrong.
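For reference, here is the formula from the post above transcribed into a short Python sketch (the function name `p_no_streak_windows` is invented for illustration). With 38.32% avoidance it should reproduce the 8.93% single-window chance and the 99.9874% figure, so it is a quick way to check that a spreadsheet implements the formula as intended. It does not settle whether the formula is right: it treats the y - (x - 1) overlapping windows as independent trials.

```python
def p_no_streak_windows(avoid: float, x: int, y: int) -> float:
    """Formula from the opening post: (1 - (1 - avoid)**x) ** (y - (x - 1)).

    Each of the y - (x - 1) overlapping windows of x swings is treated as an
    independent trial that 'fails' when all x swings in it land."""
    p_window = (1.0 - avoid) ** x      # chance that all x swings in one window hit
    windows = y - (x - 1)              # number of overlapping windows in y swings
    return (1.0 - p_window) ** windows


avoid = 0.3832                         # 38.32% total avoidance, from the post
print(f"5 hits in 5 swings:                {(1 - avoid) ** 5:.2%}")
print(f"at least one 5-streak, 100 swings: {1 - p_no_streak_windows(avoid, 5, 100):.4%}")
```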

2. Established Registrant
Join Date
Sep 2009
Posts
721
No, that number isn't right. I think the second exponent in your equation needs to be a multiplier. But it's not an easy problem. I would phrase it this way: the chance of not being hit 5 or more times in a row (because it's not just 5 times) is equivalent to 1 - (all chances of being hit four times) - (all chances of being hit 3 times) - (all chances of being hit 2 times) - (all chances of being hit one time) - (all chances of being hit no times). And then figure out the individual chances for each of those.

Here's one explanation that makes some sense to me:
The probability of "n" consecutive heads is .5 ^ n. So for 10 consecutive flips, the probability is .5 ^ 10 or 0.0009765625. The probability of this not happening is 1 - (.5 ^ 10) or 0.999023438.

The probability of not seeing 10 heads in a row can be expressed as (0.999023438) ^ #attempts. Thus at toss 710, it becomes more likely than not that you will have seen at least one run of 10 heads -- (0.999023438 ^ 710 = 0.499724591). If you toss the coin 5,000 times you will see at least one run of ten heads 99.3% of the time. If you toss the coin 10,000 times you will see at least one run of ten heads 99.9942882% of the time.
So for your case, let's assume 50% avoidance. The probability of (5) consecutive hits is .5^5, or .03125 - meaning that the chance of this not happening is .96875. The chance that you will get a run of 5 hits in 100 swings is .0417, or 4%. But I think that's not quite right.
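The quoted numbers all come from the same approximation, (1 - p^n)^attempts. A minimal Python sketch (the helper name `p_no_run_approx` is hypothetical) that recomputes them; note that the function returns the chance of NOT seeing a run, so for the 50% avoidance case the roughly 4% value is the chance of avoiding a 5-hit run, not of getting one.

```python
def p_no_run_approx(p_event: float, run: int, attempts: int) -> float:
    """(1 - p_event**run) ** attempts: the approximation in the quoted
    explanation, which treats every attempt as independent."""
    return (1.0 - p_event ** run) ** attempts


# Fair coin, runs of 10 heads
for tosses in (710, 5_000, 10_000):
    p_none = p_no_run_approx(0.5, 10, tosses)
    print(f"{tosses:>6} tosses: no run {p_none:.6f}, at least one run {1 - p_none:.6f}")

# 50% avoidance, runs of 5 hits, 100 swings: chance of NO such run
print(f"50% avoidance, 100 swings, no 5-hit run: {p_no_run_approx(0.5, 5, 100):.4f}")
```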

There's more details and a spreadsheet here:

That doesn't make sense, because as you increase the last exponent the result goes down, not up, so wouldn't that be the probability that you won't get hit 5 times in a row? ALL THESE NEGATIVE STATEMENTS ARE CONFUSING ME.

Seriously though, once I get this problem figured out and finish up a bunch of writing I'll be done with a massive effort that wasted a perfectly good Friday and most of Tuesday and Wednesday.

So ya... I leave this last bit of math up to the tankspot community!

Where's Kojiyama? He usually loves this sort of thing.

4. I would tackle the problem differently.

I'd consider every blow struck to have a chance to start a 5-string of hits, and work from there. (Since we are working with 100 blows here, you can disregard blows 97-100, of course.) Unless I'm mistaken, that chance is the same as the chance you calculated for having five hits in a row when only taking five hits.
So every blow gets an 8.93% chance of proccing that 5-swing string, and there are 96 blows to consider.

While there's a formula to calculate the success chance out there somewhere, calculating the failure chance is easier, and gives you the same results. If memory serves, the formula here is (chance of failure)^(number of tries), or (0.917)^96. That gives us a grand total of 0.00024, or 0.024%.
So, the chance that there will be a five-string of hits in 100 blows is 99.976%.

Note: I'm tired and it's been a few years since I've done these things. If someone can check if I'm right I'd appreciate it. I'm fairly sure, and cannot find a flaw in my math, but still.
Last edited by Martie; 04-30-2010 at 11:51 PM.

The spreadsheet I have set up actually does calculate all of that, and if you notice the Y-(X-1) in the original equation I posted, that means I'm considering (for the case of 100 hits and 5 hits in a row) 96 hit strings.

I also do calculate it both ways and the numbers add up.

It just seems like a REALLY high probability. If I am correct (which, based on these two posts, it seems like I am), then it confirms all of my hypotheses even better than I thought it would.

6. Why does it seem like such a high probability? It's exactly what I expected after reading your initial numbers.

I mean, there's a relatively large chance it'll happen, and a reasonably large amount of tries.
(It's kinda like rolling a d12 a hundred times and asking yourself what the chance of getting at least one 12 is.)
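Unlike the overlapping 5-swing windows, separate d12 rolls really are independent, so the complement rule is exact here. A two-line Python check:

```python
# Each d12 roll is independent, so "no 12 in 100 rolls" is exactly (11/12)**100.
p_no_twelve = (11 / 12) ** 100
p_at_least_one = 1 - p_no_twelve
print(f"no 12 in 100 rolls: {p_no_twelve:.6f}")
print(f"at least one 12:    {p_at_least_one:.4%}")
```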

7. It shouldn't be that high. Think about all the ways that you can choose rolls such that you don't get 5 hits in a row out of 100; it's quite a large number.

I mean, there's a relatively large chance it'll happen, and a reasonably large amount of tries.
(It's kinda like rolling a d12 a hundred times and asking yourself what the chance of getting at least one 12 is.)
It's similar to that, but it's harder to calculate. The chance of not rolling one 12 is (11/12)^100, which is pretty small (about .00017), so the chance of rolling at least one 12 is about .9998. But that's still larger than the above number, and that doesn't make sense at all.

8. Originally Posted by felhoof
It's similar to that, but it's harder to calculate. The chance of not rolling one 12 is (11/12)^100, which is pretty small (about .00017), so the chance of rolling at least one 12 is about .9998. But that's still larger than the above number, and that doesn't make sense at all.
Actually, it's not harder to calculate.
There are 96 possible strings of five consecutive hits in 100 blows. (So you change the number above from 100 to 96.) The chance of all five of those hitting is 8.93% (so you change the 1-1/12 to 1-0.0893). The formula stays the same.
Note that we are calculating the chance of it happening at least once, not exactly once. The math for calculating the chance of it happening exactly once is decidedly more complex.
(Note: There are only 31 possible outcomes of five consecutive swings that are not all hits.)
Last edited by Martie; 05-01-2010 at 12:23 PM.
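The 31 figure can be checked by brute force: five swings have 2^5 = 32 hit/avoid patterns, and every one except five straight hits breaks the streak. A small Python sketch (helper names are illustrative; 'H' is a hit, 'A' an avoided swing) that counts them and confirms their probabilities add up to 1 - 0.0893:

```python
from itertools import product

P_HIT = 1 - 0.3832   # assumed: 38.32% avoidance from the opening post


def pattern_prob(pattern: tuple[str, ...]) -> float:
    """Probability of one hit ('H') / avoid ('A') pattern, assuming independent swings."""
    result = 1.0
    for outcome in pattern:
        result *= P_HIT if outcome == "H" else 1 - P_HIT
    return result


patterns = list(product("HA", repeat=5))                   # all 2**5 outcomes of five swings
not_all_hits = [s for s in patterns if s != ("H",) * 5]   # everything except HHHHH

print(len(patterns), len(not_all_hits))                    # 32 31
print(sum(pattern_prob(s) for s in not_all_hits))          # equals 1 - P_HIT**5
```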

9. Hey Martie, want to PM me your e-mail and I'll just send you this spreadsheet? From what we're talking about it looks like I'm calculating it correctly, I just want to be sure.

10. has a life
Join Date
Apr 2010
Posts
41
I thought about it for a while and I doubt there's an easy way to come up with a probability formula for that one. All the approaches described above have serious caveats. I guess you should be looking at the Monte Carlo method to get reliable numbers instead. In short, the algorithm for the problem should be something like this:

1) Write 50000 uniformly distributed random numbers in the range 0.01-1 into an array.

2) Parse through the array, comparing the numbers with your avoidance threshold and incrementing a counter that tracks the number of successive hits. This counter increments on every successful hit and resets to 0 on a miss. Let's call the counter n for simplicity.

3) This part depends on what you're trying to do. As far as I understand, you want to find out how many times you'll get hit X or more times in a row in Y total attacks. Then whenever n reaches X, you increment the actual occurrence counter m, skip all successive hits after that (if there are any), and reset n to 0. Then go on parsing.

4) Now, to find the average number of occurrences per fight, you just have to divide m by (50000/Y).
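A minimal Python sketch of the steps above, under a few assumptions not in the post: it draws from `random.random()` (uniform on [0, 1)) instead of pre-filling an array, counts a swing as a hit when the draw is at or above the avoidance fraction, and uses a fixed seed for repeatability. The second function is an added variant, not part of the post, that answers the "at least once per fight" question directly by simulating separate fights.

```python
import random


def average_streaks_per_fight(avoid, x, y, total_swings=50_000, seed=1):
    """Steps 1-4 above: simulate one long run of swings and return the
    average number of streaks of x-or-more consecutive hits per y swings."""
    rng = random.Random(seed)
    n = 0  # current number of consecutive hits
    m = 0  # streaks that reached length x (each streak counted once)
    for _ in range(total_swings):
        if rng.random() >= avoid:   # swing lands with probability 1 - avoid
            n += 1
            if n == x:              # count the streak once; a longer streak keeps
                m += 1              # growing n but is not counted again
        else:
            n = 0                   # an avoided swing breaks the streak
    return m / (total_swings / y)


def fraction_of_fights_with_streak(avoid, x, y, fights=10_000, seed=1):
    """Variant for the 'at least once' question: simulate separate fights
    of y swings and return the share that contain a streak of x hits."""
    rng = random.Random(seed)
    fights_with_streak = 0
    for _ in range(fights):
        n = 0
        for _ in range(y):
            if rng.random() >= avoid:
                n += 1
                if n >= x:
                    fights_with_streak += 1
                    break           # the first streak settles this fight
            else:
                n = 0
    return fights_with_streak / fights


print(average_streaks_per_fight(0.3832, 5, 100))
print(fraction_of_fights_with_streak(0.3832, 5, 100))
```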

11. I just want to find out if you take X or more hits in a row at LEAST once, not how many times, so I'm pretty sure I'm doing it right.

12. Originally Posted by Aggathon
I just want to find out if you take X or more hits in a row at LEAST once, not how many times, so I'm pretty sure I'm doing it right.
Then you can go one more step and calculate this probability using the Poisson distribution. It would be: (1 - (m^0) * (e^(-m)) / 0!) = (1 - e^(-m))

This is the chance that you will get X or more hits in a row 1 or more times over Y total hits. X and Y do not appear in the probability formula because they are part of m (which is how many times you are going to be hit X or more times in a row, on average, in Y total hits). The zero in the formula stands for never getting hit X times in a row, so 1 minus that probability gives you the chance of getting X or more hits in a row at least once.
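Once m is known, the Poisson step is a single line. A Python sketch, assuming m is the average number of X-or-more streaks per fight (for example from a simulation like the one in the earlier post); the 3.4 below is only a placeholder input, not a measured value. Like any Poisson model, it assumes the streaks occur independently of each other.

```python
import math


def p_at_least_once_poisson(m: float) -> float:
    """1 - P(0; m) for a Poisson distribution with mean m, i.e. 1 - e**(-m)."""
    return 1.0 - math.exp(-m)


print(p_at_least_once_poisson(3.4))   # 3.4 is a placeholder average, not data
```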

I'm not sure how you are calculating the probability now, but the suggestions above weren't entirely accurate. felhoof's idea didn't take into account that these X-hit sequences can start at times other than every X hits, since you can get, say, three misses, X hits and a miss again, which it doesn't account for. Martie's, meanwhile, ignored the fact that those 96 sequences (or, in general, the Y-X+1 sequences of X consecutive swings in Y) aren't independent. Their probabilities overlap, since they consist of the same events, i.e. the nth sequence shares X-1 hits with the (n+1)th sequence, so they are dependent.

I'm not really saying that my idea is the only way to go, but I'm pretty sure that it is statistically accurate, so at least you might want to check a couple of your calculations using this approach... otherwise, I won't really believe in those.

13. No, I'm pretty sure martie is right, because using 96 tries out of 100 does mean that you are calculating the overlap properly. If you were to say 100/5 = 20 possibilities, then you wouldn't be calculating the probability accurately.

I'll look into the Poisson distribution but it is more often used for time to failure things where you ARE looking for a specific number of times it happens (i.e. only once, or only twice or only 3 times). I don't really care if it happens only once or only twice, just that it happens at least once, and with that you typically use a binomial distribution.

14. Originally Posted by Aggathon
No, I'm pretty sure martie is right, because using 96 tries out of 100 does mean that you are calculating the overlap properly. If you were to say 100/5 = 20 possibilities, then you wouldn't be calculating the probability accurately.
Martie's model does not correspond to what you're trying to find here. I think it will be easier to give a simple counterexample to show that the model isn't accurate. Suppose you get X+2 hits in a row. In reality you get one streak of X or more hits. In Martie's model it will count as 3 streaks of X or more hits, since it treats all three sequences of length X as separate, independent events. That is not correct.
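The X+2 counterexample is easy to check mechanically. In this Python sketch (function names are illustrative; 'H' is a hit and 'M' an avoided swing), one streak of seven hits with X = 5 contains three overlapping 5-swing windows but is only one streak:

```python
def count_windows(swings: str, x: int) -> int:
    """Windows of x consecutive swings that are all hits ('H'); windows may overlap."""
    return sum(1 for i in range(len(swings) - x + 1) if swings[i:i + x] == "H" * x)


def count_runs(swings: str, x: int) -> int:
    """Maximal runs of x-or-more consecutive hits, each counted once."""
    return sum(1 for run in swings.split("M") if len(run) >= x)


example = "MMM" + "H" * 7 + "M"   # three misses, X + 2 = 7 hits, a miss (X = 5)
print(count_windows(example, 5), count_runs(example, 5))   # 3 1
```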
The model you're looking for is too complex to be represented by a simple equation, really. It's more like spinning a roulette wheel and looking for the probability of red coming up a certain number of times, and that's a task for the Monte Carlo method. Actually, it's how the method got its name. You can trust me on that; though I'm no math geek, I took a stats course back then. Getting you to mistake dependent events for independent ones is the most popular way to screw you up on the exam. And my prof was particularly nasty on this one.

Originally Posted by Aggathon
I'll look into the Poisson distribution but it is more often used for time to failure things where you ARE looking for a specific number of times it happens (i.e. only once, or only twice or only 3 times). I don't really care if it happens only once or only twice, just that it happens at least once, and with that you typically use a binomial distribution.
Poisson in a nutshell is about calculating the probability that you get X occurrences of independent events over some interval Y, when you know how many events occur over Y on average. Which is exactly what the Monte Carlo algorithm I've sketched above gives you. You are interested in the probability that the event (getting hit X or more times) will occur one or more times, which is the opposite of the event never occurring, or P(0,m). So it's pretty easy to calculate with Poisson, subtracting P(0,m) from 1.

15. I don't think you're understanding what I'm looking for.

Also, I am currently taking two stats courses, an upper-level engineering stats course and a low-level business stats course, so I do have a grasp on what is going on here, and I'm pretty sure what I am wanting to find is plain and simple a binomial distribution. I do not care about hit sequences more than X+1 because if you get hit X times, you die, fight over.

16. Well, it certainly isn't a simple binomial distribution. It's like saying there's a 50/50 chance to see a live dinosaur in a zoo, because you either see it or you don't.

Mind you, you aren't just looking at hit vs. miss chance, you're looking at the chance to be hit X times in a row, which is much more complex. I think you're confusing it with the actual hit vs. miss mechanic, which indeed obeys a binomial distribution. But the event of getting hit X times in a row out of Y total swings is not by any account binary, since you can get hit X times in a row zero times or once or twice or three times, etc. If you stop counting after the first time, your chance calculation is just plain wrong, since you miss out on a huge chunk of the actual probability.

Say your X is 5 and Y is 200. If the probability of getting hit 5 times in a row in just 20 swings is P(20), you can't just say that you'll get hit 5 times in a row in 200 swings with probability P(20), since P(200) will be much bigger. Nor will multiplying P(5) by 10 help you, since the chances of getting hit 5 times in a row in the different sets of 5 swings are not independent; thus, simple multiplication won't work.

Since you're taking stats courses, I think you should ask your prof whether your model is accurate or not. I guess you would trust them more than me.

17. Originally Posted by Winterburn
It's more like spinning a roulette wheel and looking for the probability of red coming up a certain number of times, and that's a task for the Monte Carlo method.
You are right - checking the chance of red coming up a certain number of times is something for a monte carlo system. It's also not what we are doing.
Using the roulette example, what we are doing is calculating the chance of red coming up twice in a row once or more in a fixed number of tries. This is rather easy to do: since we are interested in two reds in a row, we calculate the number of possible sets of two spins that are next to each other. Then we calculate the chance of having two reds in a row in two spins. Now the system is rather simple: you have your chance per attempt, and your number of attempts.

18. Originally Posted by Winterburn
Well, it certainly isn't a simple binomial distribution. It's like saying there's a 50/50 chance to see a live dinosaur in a zoo, because you either see it or you don't.
I'm sorry, but that shows a gross misunderstanding of what a binomial distribution is. Just because the odds aren't 50/50 doesn't mean it isn't binomial, binomial means either it happens or it doesn't. The simplest example was already given, that of a coin being flipped. You can have a biased coin that isn't 50/50, it could be weighted and therefore give the probability of 75% heads 25% tails, but it is still a binomial distribution.

And actually I did just e-mail my professor.

19. The problem with Martie's approach is that there is a set of outcomes containing more than 5 hits in a row (or X hits in a row) that would be counted more than once by the way he does things. The way to approach the problem in his case is via some kind of simulation, since you're effectively trying to say:
if (1 head) and 2 heads and 3 heads and 4 heads and 5 heads then fail
else if (2 heads) and 3 heads and 4 heads and 5 heads and 6 heads then fail...

etc. Since this is dependent (the "if 1 head" part), it's not a simple binomial distribution. Binomial works for independent events where you only care about the count; dependent events don't fit an easy binomial count.

You might try this as well, since it's basically the same problem:
http://mathforum.org/library/drmath/view/56637.html
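The dependence between overlapping windows can also be handled exactly, without simulation, by tracking the length of the current hit streak from swing to swing. A Python sketch, assuming independent swings that each land with the same probability; function names are illustrative, and the brute-force enumerator is there only to cross-check the recurrence on a short fight.

```python
from itertools import product


def p_streak_exact(p_hit: float, x: int, y: int) -> float:
    """Chance of at least one streak of x or more consecutive hits in y swings,
    assuming independent swings that each hit with probability p_hit."""
    # state[k]: chance that no x-streak has happened yet and the current
    # streak of consecutive hits has length k (0 <= k < x)
    state = [1.0] + [0.0] * (x - 1)
    for _ in range(y):
        new = [0.0] * x
        new[0] = (1.0 - p_hit) * sum(state)   # an avoided swing resets the streak
        for k in range(x - 1):
            new[k + 1] = p_hit * state[k]     # a hit extends the streak
        # a hit from state[x - 1] completes an x-streak, so that mass is dropped
        state = new
    return 1.0 - sum(state)


def p_streak_bruteforce(p_hit: float, x: int, y: int) -> float:
    """Enumerate all 2**y hit/avoid sequences; only practical for small y."""
    total = 0.0
    for seq in product((True, False), repeat=y):
        run = longest = 0
        for hit in seq:
            run = run + 1 if hit else 0
            longest = max(longest, run)
        if longest >= x:
            hits = sum(seq)
            total += p_hit ** hits * (1 - p_hit) ** (y - hits)
    return total


p = 1 - 0.3832
print("exact:", p_streak_exact(p, 5, 100))
print("96 windows treated as independent:", 1 - (1 - p ** 5) ** 96)
print("cross-check, 12 swings:", p_streak_exact(p, 5, 12), p_streak_bruteforce(p, 5, 12))
```

Because windows that share swings tend to succeed or fail together, one would expect the exact value to come out somewhat below the 96-window product; printing both makes the disagreement in this thread directly testable.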

20. Originally Posted by Aggathon
Originally Posted by Winterburn
Well, it certainly isn't a simple binomial distribution. It's like saying there's a 50/50 chance to see a live dinosaur in a zoo, because you either see it or you don't.
I'm sorry, but that shows a gross misunderstanding of what a binomial distribution is. Just because the odds aren't 50/50 doesn't mean it isn't binomial, binomial means either it happens or it doesn't. The simplest example was already given, that of a coin being flipped. You can have a biased coin that isn't 50/50, it could be weighted and therefore give the probability of 75% heads 25% tails, but it is still a binomial distribution.

And actually I did just e-mail my professor.
Yeah, I really see no point in arguing if you only read the first line of my post. I explained why what you're looking for isn't binomial distribution in full detail just below the quoted passage (I made the key phrase in bold just in case):
Originally Posted by Winterburn
Mind you, you aren't just looking at hit vs. miss chance, you're looking at the chance to be hit X times in a row, which is much more complex. I think you're confusing it with the actual hit vs. miss mechanic, which indeed obeys a binomial distribution. But the event of getting hit X times in a row out of Y total swings is not by any account binary, since you can get hit X times in a row either zero times or once or twice or thrice, etc. If you stop counting after the first time, your chance calculation is just plain wrong, since you miss out on a huge chunk of the actual probability.

Say your X is 5 and Y is 200. If the probability of getting hit 5 times in a row in just 20 swings is P(20), you can't just say that you'll get hit 5 times in a row in 200 swings with probability P(20), since P(200) will be much bigger. Nor will multiplying P(5) by 10 help you, since the chances of getting hit 5 times in a row in the different sets of 5 swings are not independent; thus, simple multiplication won't work.

Originally Posted by Martie
Using the roulette example, what we are doing is calculating the chance of red coming up twice in a row once or more in a fixed number of tries. This is rather easy to do: since we are interested in two reds in a row, we calculate the number of possible sets of two spins that are next to each other. Then we calculate the chance of having two reds in a row in two spins. Now the system is rather simple: you have your chance per attempt, and your number of attempts.
Ok, here's a set of equal miss probabilities; let's call them P1, P2, P3, P4, and P5. What you propose is taking all pairs (or sets of X), calculating the probability within each set and multiplying the results for all the sets together, i.e.: (P1*P2)*(P2*P3)*(P3*P4)*(P4*P5).

That is basically your (chance of failure)^(number of tries) or (chance of miss squared)^(4), since each pair represents a try and each formula in brackets is the chance of two consecutive misses (they are all equal, obviously). This is not correct, as I've stated numerous times before, because you multiply dependent events. Two pairs will have the same event as part of their probability. For instance, event 2 is in the probability of two pairs, (P1*P2) and (P2*P3). And multiplication of probabilities is not a valid way to find the cumulative probability of dependent events.
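The five-spin case is small enough to settle by enumerating every outcome. This Python sketch assumes each spin is independently red with probability p (0.5 here, a simplification that ignores the zero pocket) and compares the exact chance of never seeing two reds in a row with Martie's product over the four overlapping pairs:

```python
from itertools import product


def p_no_pair_exact(p: float, spins: int) -> float:
    """Exact chance that no two consecutive spins are both red, where each
    spin is independently red with probability p (brute-force enumeration)."""
    total = 0.0
    for seq in product((True, False), repeat=spins):
        if any(a and b for a, b in zip(seq, seq[1:])):
            continue                          # this sequence has a red-red pair
        reds = sum(seq)
        total += p ** reds * (1 - p) ** (spins - reds)
    return total


def p_no_pair_product(p: float, spins: int) -> float:
    """Multiply the (spins - 1) overlapping pairs as if they were independent."""
    return (1 - p * p) ** (spins - 1)


print(p_no_pair_exact(0.5, 5), p_no_pair_product(0.5, 5))   # 0.40625 0.31640625
```

For p = 0.5 the exact answer is 13 of the 32 equally likely sequences (13/32 = 0.40625), while the product gives 0.75^4 ≈ 0.316, so treating the overlapping pairs as independent does change the result.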