The Weekly Marmot - The Usefulness of Sims & Parses
I think what you are fighting is the idea that someone could misuse this information and turn it into a truism of sorts, by abusing whatever numbers a machine crunches out (or even Recount, for that matter).
This happened during Cata when Survival became the "top" DPS spec, provided you were using the four-piece set with a certain bonus. People would look at a hunter and tell them to go Survival to increase their DPS, even when the player had never played the spec. The way the stats worked, everything would have to be reforged and an ungodly amount of haste reached to match the DPS the four-piece bonus gave (the haste plateau allowed a certain number of Arcane Shots to be slipstreamed into the rotation).
(People would ask how to reforge, and, well, there are only so many stats on the gear in the game: you can stack this main stat or another one, and adding anything else results in a really small change in the percentage of that stat...)
I've talked to people who are still partial to Survival, even though Beast Mastery has the higher projected DPS (and not by much), because they like that spec and play it comfortably.
I've seen people pull down "really good numbers" when you compare them to people pulling down "really bad numbers".
I know I've seen horrible players pull down good numbers using one-button rotation macros. When I looked at their Recount, they hadn't used Aimed Shot at all, which is unheard of for a Marksman hunter. I'd seen the macro: you had to hold down Alt to fire off Aimed Shot, and they never did, but they still pulled a significant amount of DPS for the raid.
When I think about how much better a player they could be compared to what they actually do out of laziness, it's just beyond me.
Like it or not, and in spite of things like these, there are just some specs that will outperform other specs with very little effort. This could just be how the stats are laid out on the gear or how people are able to stack them. We really have not gotten away from some sort of cookie-cutter spec where someone can get away with just choosing that class, be accepted into raids, and get enough gear to be somewhat good enough, and yet still not understand their class.
So people like this, because of the way the game is, can pull huge DPS and never understand how to use any of their class's utility, not even enough to interrupt a spell cast from a level 5 murloc.
I really don't like it when someone tells me these things or tries to lay it out in front of me in some strange scientific model, because I will more than likely reject it and say: well, they just don't know how to play the class, or the game for that matter (if they can't pull down good numbers in another spec). In other words, I don't believe that the damage output of any class can make up for stupidity. You could ask a top DPS in the raid how they hit their numbers, or to explain the mechanics of any given raid fight, and they just wouldn't be able to do it. I can simply look at the website and gear based on what the people at the top end are doing, which is really funny when you look at what happened in arena, where people geared according to teams that were just stealing wins for rating...
There will just be these kinds of trends in the game, and I try as best I can to ignore them for as long as I can, or make my own decisions based on encounters, my own math in my head, and what I can pull down in my own gear. Mainly I keep in mind that these things will change when 5.1 is patched in, as they did last expansion...
I really do hate when someone thinks they have hatched a golden egg based on glancing at a few numbers and not putting much thought into things outside of that.
Last edited by hailmary; 11-22-2012 at 02:37 AM.
Just a small bit of input from a raiding mage PoV..
First of all, Fire has no AoE, nothing good at least. We have a cleave that spreads our DoTs to 2 additional targets, and as our DoTs are a major component of our DPS, it ends up increasing Fire mage DPS a lot.
And the difference in parses is partly because every top-geared mage is Fire now, but that doesn't remove the fact that Frost is far behind Fire even in theoretical single-target DPS atm, that is, if the Fire mage gets his DoTs rolling. And, well, Arcane is just a mess atm, because the tier 6 talents have zero synergy with the Arcane spec; if there is any movement on a fight as Arcane, you can just wave goodbye to your DPS.
Anyway, on the actual Weekly Marmot: yes, you are spot on with the problems parses face atm, but you can and will still rank without cheesing if you can execute your rotation perfectly, make no mistakes, and have the gear to pull it off.
Simulationcraft can actually model a lot more than just Patchwerk fights performed by perfectly behaving robots; it can model periods of movement, stuns, boss vulnerability/invulnerability, and even periods of "distraction" where players' "skill" is artificially lowered. (Player "skill" is modeled as a chance to simply skip a line of the priority list when evaluating it.) Of course, this is all mostly irrelevant once you get out of the Simulationcraft community and into the forums, since everyone just points at the T14H Patchwerk sims and complains from there.
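The "skill" idea described above (a chance to simply skip a line of the priority list) can be sketched in a few lines. This is only an illustration of that description, not SimulationCraft's actual code: `choose_action`, `count_choices`, and the three-line priority list are all made-up names, and every action is assumed to be ready at all times.

```python
import random


def choose_action(priority_list, skill, rng):
    """Walk the priority list top to bottom; each line is skipped with
    probability (1 - skill), otherwise it is used if its action is ready."""
    for name, is_ready in priority_list:
        if rng.random() > skill:
            continue  # simulated mistake: this line is never evaluated
        if is_ready():
            return name
    return None  # every line was skipped or not ready


def count_choices(priority_list, skill, trials=10_000, seed=1):
    """Tally which action gets picked over many independent decisions."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(trials):
        action = choose_action(priority_list, skill, rng)
        counts[action] = counts.get(action, 0) + 1
    return counts


# Hypothetical priority list; every action is always ready (an assumption).
PRIORITY = [
    ("big_cooldown", lambda: True),
    ("dot_refresh", lambda: True),
    ("filler", lambda: True),
]

perfect = count_choices(PRIORITY, skill=1.0)
sloppy = count_choices(PRIORITY, skill=0.8)
print(perfect)
print(sloppy)
```

At `skill=1.0` the top line is always chosen. At `skill=0.8` roughly a fifth of decisions fall through to lower-priority lines (or to nothing at all), which is the "bad-robot play" effect in miniature.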
Another big pitfall with Simulationcraft: not every class uses it to the same extent. Warlocks, for instance, have a pretty active community of people playing with the profiles and finding the best gear sets and priority lists they can; other classes' communities may be more partial to a spreadsheet or other simulation tool, and may not spend as much time on making optimal Simulationcraft profiles. They may even have mechanics which aren't fully modeled in the engine.
There seem to be a few off remarks, most of which Marco covered. SimCraft can simulate more than Patchwerk fights, including heavy-movement fights. It obviously isn't perfect and won't simulate the exact amount of time spent running around on encounter X, but it can give a decent impression, to be taken with a grain of salt.
Likewise, I believe SimCraft currently has 4 difficulty levels: one "perfect" level, 2 mid levels, and a "what is fire" level. Again, it won't perfectly simulate what you mess up on, but it simulates bad-robot play as opposed to perfect-robot play.
Last, I'm not sure what you use that presumes you're in BiS gear. SimCraft at least allows you to import your gear, and it is borderline useless to you unless you do that.
That aside, I completely agree with the sentiment that you really need to dig into parses and sims to get anything out of them - you need to know how the player you looked at was geared, what they were dpsing, how their raid composition was made up, the duration of the fight etc.
Say you look at parses and find a warlock who did ridiculous damage on Shannox, then discover that he was multidotting the dog adds while they weren't being killed: parses for that boss become useless. Likewise, I remember ranking in the top 20 (IIRC) on Majordomo back in Firelands; a huge factor in that was that I lucked out and got zero leaps in the cat phase.
It really takes some digging to get good information out of them.
I was referring to the parses posted on simulationcraft.org, not the tool itself.
That said, I'd still debate the usefulness of simulations beyond optimizing rotations & gearing.
Long time viewer and first time poster here. Just wanted to touch on the Raidbots portion of your...marmot? That sounds wrong. Moving on...
When you talked about viewing all the overall parses, I noticed you didn't mention percentiles (probably due to show length). That is, you go there, set the Data Set to All Parses, select a specific Boss on the bottom-right, and then change the Measure to a specific percentile.
Statistically, that allows the user to remove extraneous "I got lucky" and "I'm a baddie" parses. Lucky, super-optimal parses end up at the top and Baddie McBaddersons stay at the bottom. So when looking for numbers that are solid goals, the 70th percentile, for example, would be a fair way to gauge where you should be hitting, no?
My logic for that would be:
- 70th percentile is a low enough percentile to account for gear variance. Some people get super lucky on drops/rolls, others don't. At this percentile it is fair to say that you have *about* an average amount of gear.
- 70th percentile is also high enough to say that you are well above average in terms of play. This means you know and play your class well and generally understand how to maximize your damage on said fight.
- 70th percentile is "in the middle" enough to account for strategy variance as well. Much like your example with Wind Lord, there are strategies out there that allow for overall better DPS. However, I'd venture that widely accepted strategies are used far more often than not. This is obviously unverifiable, but I'd think it's a fair assumption to make when trying to establish a DPS goal.
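For anyone who wants to see what "the 70th percentile" means mechanically, here is a minimal sketch using the nearest-rank definition. The DPS numbers are made up for illustration, `percentile` and `percentile_rank` are hypothetical helper names, and Raidbots may well compute its percentiles differently.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value such that at least
    pct percent of the samples are at or below it."""
    if not values:
        raise ValueError("need at least one parse")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def percentile_rank(values, my_dps):
    """Percentage of parses that are at or below my_dps."""
    at_or_below = sum(1 for v in values if v <= my_dps)
    return 100 * at_or_below / len(values)


# Made-up DPS parses (in thousands) for one spec on one boss.
parses = [62, 70, 74, 78, 80, 83, 86, 88, 91, 104]

goal = percentile(parses, 70)
print("70th percentile goal:", goal)
print("rank of an 80k parse:", percentile_rank(parses, 80))
```

Note that the single 104k outlier does not move the 70th percentile goal at all, but it would drag a plain average upward. With only ten samples the result is still very coarse, which is why sample size matters.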
Anywho, I just wanted to get your (and others, of course) thoughts on the matter. Thanks for doing what you do, it really keeps me interested in the game knowing that intelligent discussion (mostly) makes for excellent web entertainment.
It removes the "I'm a baddie", but it doesn't remove the "I got lucky". Let's say:
Originally Posted by ANTDrakko
70th percentile removes 8-10. What it does (and man, it is sad that I don't have a whiteboard to graph this) is include so many "low but normal" parses that they reduce the effect of the "I got lucky" ones, like:
But what are you trying to achieve with that? Which is the better spec? Per fight? Per format? Because what you get is what people did with each spec during a certain period of time, which is something Lore explained. A more general answer, I'm afraid, is beyond Raidbots right now, but I think that with work it could generate those general answers.
Your examples are unrealistic and do not reflect what is actually happening. Raidbots provides a lot of data, so let's get some real examples:
Heroic Feng the Accursed. Let's pick Elemental Shaman for shits and giggles:
Standard deviation of around 13.6k - http://raidbots.com/dpsbot/Feng_the_.../14/60/stddev/ - (All Parses over 2 months)
929 samples (scroll down and you'll see the sample numbers)
99th - 105k
95th - 98k
90th - 95k
80th - 91.5k
70th - 88.5k (notice the gap getting much smaller)
60th - 85.9k
40th - 80.1k
30th - 78.2k
20th - 74.6k (notice the gap starting to get large again)
10th - 69.5k
The perception that the people up at the top are doing WILDLY more DPS or the people at the bottom are doing WILDLY less DPS just doesn't actually play out.
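To make the gaps between those percentiles easier to compare, here is a small sketch that normalises each step to "DPS per 10 percentile points". The table is copied from the figures listed above (no 50th percentile was given, so the 60th to 40th step spans 20 points); the function name is made up for this example.

```python
# Percentile -> DPS in thousands, copied from the list above.
FENG_ELE = {
    99: 105.0, 95: 98.0, 90: 95.0, 80: 91.5, 70: 88.5,
    60: 85.9, 40: 80.1, 30: 78.2, 20: 74.6, 10: 69.5,
}


def gaps_per_ten_points(table):
    """For each pair of adjacent listed percentiles, return
    (upper, lower, DPS gap scaled to a 10-point step)."""
    points = sorted(table, reverse=True)
    rows = []
    for hi, lo in zip(points, points[1:]):
        dps_gap = table[hi] - table[lo]
        rows.append((hi, lo, round(dps_gap * 10 / (hi - lo), 2)))
    return rows


rows = gaps_per_ten_points(FENG_ELE)
for hi, lo, gap in rows:
    print(f"{hi:>2}th -> {lo:>2}th: {gap:.2f}k per 10 percentile points")

top_to_bottom = round(FENG_ELE[99] / FENG_ELE[10], 2)
print("99th / 10th ratio:", top_to_bottom)
```

On these numbers the steepest step is 99th to 95th (7k over only four points) and the flattest is 40th to 30th (1.9k), while the 99th percentile is only about 1.5 times the 10th.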
As to the question of "What am I using this for?", I would say I use the 70th percentile as a goal to work toward when setting a standard for your DPS. For example, in my guild, we promote initiates to raiders if they show that A) they are above the 50th percentile, or "average", and B) they are progressing toward the 70th percentile and not stagnating at the 50th. Generally we are OK with people stagnating at around the 70th percentile, because that is the level we believe is crucial to reaching our overall goal (in this case, seeing all heroic content on 25-man while it is relevant, in 6 hours a week).
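The promotion rule above could be written down roughly as follows. This is a sketch under assumptions: the post does not say how "progressing" is measured, so here it is taken to mean the latest percentile is higher than the first one recorded, and the function name and return labels are invented for the example.

```python
def review_initiate(percentile_history, floor=50, target=70):
    """percentile_history: one raider's percentile on the same boss,
    oldest first. Returns 'promote', 'stagnating', or 'not yet'."""
    if not percentile_history:
        raise ValueError("need at least one logged kill")
    latest = percentile_history[-1]
    if latest >= target:
        return "promote"  # at the target; stagnating here is acceptable
    if latest <= floor:
        return "not yet"  # condition A fails: not above the floor
    # Condition B (assumed definition): improved since the first log.
    improving = len(percentile_history) >= 2 and latest > percentile_history[0]
    return "promote" if improving else "stagnating"


print(review_initiate([48, 55, 61]))  # above 50th and climbing
print(review_initiate([55, 55, 55]))  # above 50th but flat
print(review_initiate([45]))          # below the floor
print(review_initiate([72, 71]))      # holding around the target
```

Strategy and assignment differences are not modelled here at all; those would still need a human looking at the logs.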
If I'm completely off-base please let me know with some in-line examples because, from what I'm seeing, using percentile-based general standards (i.e. start @ X percentile and OK to stagnate @ Y percentile) seems pretty fair so long as there are no absurd oddities like only 20 samples (lol @ 5 samples of Marksman Hunters on Heroic Feng).
The simple fact is that no one set of data can ever be used as a definitive point of balance. You have to look at everything, you have to understand what it is you're looking at, and you have to make logical conclusions that aren't "this number is higher than that number so BALANCE."
Players who expect everything in an MMORPG to line up perfectly numerically are going to spend a lot of time in meaningless forum debates that could instead have been spent enjoying the game.
And even if everything did balance, it wouldn't be any fun; imagine poker where everyone has the exact same hand.
The guy with the best bluff would win (i.e. in that case, the more skilled player)
Originally Posted by Tengenstein
First, my example was wrong because I did 30th percentile.
Originally Posted by ANTDrakko
And yes, they were unrealistic; that was the idea. My point is only against the *general* idea that using percentiles will remove the "I got lucky" parses. It helps, and with more samples it helps more, but it can't remove them. How much of a problem that is depends on the actual data. Here, for example, you need to look at each of the specs you are going to evaluate; people who are talking about measuring spec balance need to evaluate these top results for all the specs.
Well, that depends on what you think is too much: how big or small the standard deviation needs to be for you to feel uncomfortable with your data.
Originally Posted by ANTDrakko
I'm skeptical, but I don't think it is worth it for me to find an example. Maybe there isn't one good enough this week to convince you; maybe there will be next month, or there was last week. It is not that simple. I can only say that this data is dynamic, and you can't handle dynamic data with a static analysis.
Originally Posted by ANTDrakko
I think he meant where everyone has the exact same hand every time, which would mean bluffing is impossible
Originally Posted by Fetzie
Why would I (and everybody else) not "all-in" in every hand? That makes poker a perfect information game.
Originally Posted by Fetzie
And in WoW the winner will be decided by a combination of luck and skill unless you remove the RNG from the damage in the game.
But in the end, I think what Blizzard is trying to do with their balance is have players look at the raid tier and say: we need some multidotting for this fight, some cleave for that one, some strong execute for another, sustained damage here, burst every X seconds there. And that should be interesting (if not fun) not only for the player, but for the team or the leader trying to figure out what is best to handle the whole tier or one encounter. This view of balance isn't shared by most of the players who use the SimCraft page or Raidbots to claim that X class is underperforming or overpowered.
I'm going to pitch in here as we recently started logging our fights and I definitely see a discrepancy between simmed rankings and actual fight logs, both different from my personal view on what the players in question should be doing (as if that's objective, hah!).
First of all, SimCraft, as the most used example, isn't that accurate, because any movement it sims is the same for all classes, while in practice it's not. If I am running back from dropping a Wildfire on Feng and Lava Burst or Elemental Blast comes off cooldown, I'll stop. The Mages in my guild will keep running, because they don't have enough movement casts to 'eventually' get there in a reasonable amount of time.
Secondly, those same Mages (or one of them and the Affliction Warlock) will be keeping DPS on Elegon and only dot up the adds when they spawn, because a) we do enough DPS that extra people swapping might actually kill it too fast and b) they don't have enough burst DPS, so it's simply easier to let them DPS the boss. If this is a tactic used by all guilds, then obviously those classes are doing better on parses. But like I said, we have 2 Mages and a Warlock; what if they all show up? Then there is one Mage who does have to swap for the add, which means his DPS (compared to ALL logs) is going to suck. And we're talking at least 10% less overall damage done. That would drop him from your example's 70th percentile to under the 40th, while he's just as skilled as before and even better geared, but those percentiles don't distinguish that. This becomes an even bigger issue when the simmed top DPS never need to do tactics because you 'need the biggest DPS on the boss' or something like that, skewing your results even more.
And finally: we run most fights with a 2-healer setup, but depending on who's there and what fight it is, they occasionally need some help. That's usually the Elemental Shaman (me) throwing out a Healing Rain or a couple of Healing Surges when needed. This is again something not shown in statistical data of average DPS, but if I didn't do it, my DPS wouldn't matter, because people would've died and we'd wipe.
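To put a rough number on that "10% less damage" scenario, here is a minimal sketch that reuses the Heroic Feng Elemental Shaman percentiles quoted earlier in the thread. That is a different boss and spec from the Elegon mage example, so treat it purely as an illustration of scale; `bracket` is a made-up helper name.

```python
# Percentile -> DPS in thousands, from the Heroic Feng figures quoted earlier.
FENG_ELE = {
    99: 105.0, 95: 98.0, 90: 95.0, 80: 91.5, 70: 88.5,
    60: 85.9, 40: 80.1, 30: 78.2, 20: 74.6, 10: 69.5,
}


def bracket(table, dps):
    """Return (lower, upper): the highest listed percentile whose DPS is at
    or below dps, and the lowest listed percentile whose DPS is above it."""
    below = [p for p, v in table.items() if v <= dps]
    above = [p for p, v in table.items() if v > dps]
    return (max(below) if below else None, min(above) if above else None)


# A player parsing exactly at the 70th percentile who loses 10% of their
# damage to an add-swap assignment.
reduced = FENG_ELE[70] * 0.9
print(f"{reduced:.2f}k lands between percentiles {bracket(FENG_ELE, reduced)}")
```

On that table, 88.5k minus 10% is about 79.65k, which falls between the 30th and 40th percentile figures, so a drop from the 70th to under the 40th is about what a 10% loss would cost.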
I use logs to review my performance per boss, not to compare epeen against how other people play my class or even worse, to start a #@$%ing contest between classes/specs. Especially when there is no data presented regarding context of those top or W percentile parses vs the few parses in a specific context/tactic you are comparing it with.
I believe I should make it clear that I am in no way trying to discuss balance between classes. I generally don't care how much better a mage is than a fury warrior or [insert examples here]. My concern lies solely with using quantitative data as a helpful tool in determining whether your specific raiders are performing at the level your guild needs to kill enough bosses to be happy. For us, with a goal of all heroic content while relevant in 6 hours a week, you can imagine that the standard gets higher than for those who have, say, 12 or 16 hours a week.
Also, I think maybe my point of "Use percentiles as standards for your DPS" was a bit misunderstood, or rather, poorly explained. Here is the post that we have for our Raiders on our forums to help explain how to become and maintain Raider status (after detailing that we'll be using percentiles in Raidbots):
This is important under the correct context. Strategy variance is taken into account. Specific assignments are taken into account. In all other situations, though, you should be pretty much on par with, or steadily progressing toward, that 70th percentile.
An example of assignment or strategy variance would be Zon'ozz. For the heroic part of the encounter, for example, there are 2 prevalent strategies: one with Mages taking a specific DPS hit in order to easily control the boss (the one we used), or one with 2 groups managing the ball together. These strategies are both widely used across all public parses (DPS Bot's sample size = all public parses) and therefore skew the numbers a bit for all raiders in some way. At this point we have to dig much deeper into the logs and make some direct comparisons with other players and guilds in order to build a comparison that is as close to apples-to-apples as possible. This will be dealt with on an individual basis.
So, in response to Airowird's example of the mage who would be put into a lower percentile due to his assignment, those things are taken into consideration. We aren't so blind as to assume that a hard number applies to any and all raiders. However, we DO believe that:
A) A standard of SOME sort is needed if you plan to be fair to all raiders from a leadership standpoint.
B) Said standard needs to come from actual data and not anecdotal "well it looks like" evidence.
C) Data needs to be looked at in the correct context of the fight.
There are a lot of examples that you could give (much like Airowird's) that could cause you to adjust your view on what numbers are fair. By how much is a really touch-and-go type thing that generally we tend to avoid by just dropping the assessment altogether for that raider unless it's wildly off base. It's OK that the Mage in Airowird's example (who typically is performing at around, let's say 60ish percentile) drops to the 40th percentile. But what if he drops to the 20th? Or 10th? Are those acceptable? That's where it gets tricky.
All in all, numbers aren't perfect. I wasn't suggesting that they are. However, because of this, a lot of people assume that Not Perfect = Useless. I find that as long as you take strategy and assignment variance into account like an intelligent person, using acceptable starting and stagnating percentiles is a fair way to hold your DPS to a performance standard.
I can't imagine using parses as anything other than a personal way for you to improve your own performance OR to determine if you are doing something as an entire group improperly.
I use log parses to determine how close to my ideal rotation I am. I'm usually pretty close...our logs are not public in an attempt to cut down on people whining about getting "top parses" and instead focusing on doing whatever is necessary to kill the boss. Public logs are just epeen measuring unless you're asking for someone to help you analyze them.
I also use World of Logs' expression editor to see how we managed the mechanics of the fight...how many stacks of whatever did people get, did we stand in too much shit, etc. These are the things that logs are useful for. They are NOT useful for saying "well jimmy you weren't in the Nth percentile...we're going to have to bench your ass". If the boss is dead and you did it in a reasonable timeframe...I don't see the issue. If you're not hitting enrage timers (hard or soft) then having jimmy raider doing 3k more DPS ain't your problem...
Simulations are even worse. They are dependent on the skill and accuracy of not only the theorycrafting, but also the code involved...they CAN be an adequate rough estimate of what stat might be better or which rotation might work best, but you're still better off trying things out for yourself.
Originally Posted by Gravy