Affliction: Too Strong? How Can We Even Tell?

Is Affliction too strong?

My answer: Maybe.

If you ask our Mage friends, or our Shadow Priest friends, the answer is a resounding “Yes!” Even from unbiased sources though, Affliction damage might be overall too high:

Pretty happy with PvE overall. Arcane will be fine without Scorch spam. Aff may be too high. UH, Sub, Ele, Arms may be low.

– @GhostCrawler on Twitter

(However, it might very well be that the running joke about Ghostcrawler playing a Mage is true.)

I want to note, though, that there’s a little more information to be gleaned from that tweet than appears at first glance. A closer look at which specs Ghostcrawler flags as having an issue reveals a lot about what Blizzard’s design philosophy might be.

How Can We Tell? Or Raidbots – Understanding Real World Results

Whenever discussions like these come up, there are usually 2 metrics brought under scrutiny: Simulation results from SimulationCraft, the latest of which looks like this:

SimCraft T14H Raid Results from 12/30/2012, generated with 510-8.

and RaidBots’ DPSBot, which is generated by running common statistical measures over all the public parses submitted to World of Logs. The default metric used on RaidBots is ‘Spec Score’ which performs a little black magic to reduce the effects of so-called ‘gimmick’ fights in order to correctly assess real world strength:

RaidBots DPSBot 25H Spec Score – 12/30/2012

Both these measures have their strengths and weaknesses. A simulation is just a simulation; it doesn’t hold a candle to how each spec performs in the real world. DPSBot itself measures the real world, but we don’t know what goes into the tweaking for Spec Score. As we’ll see though, while we might be inclined to check out every individual fight on DPSBot and consider that gospel for spec strength, there are a myriad of other issues. Instead of walking through RaidBots numbers that anyone can look at just by going to the site, I’ll walk through some of the issues and caveats you should keep in mind while using RaidBots.

The Issues with RaidBots

While Spec Score might seem like a black box, Seriallos, the creator, actually has a breakdown of how it works here. Even so, there are obvious problems in using it to determine spec strength. The first such issue is that of role: On Will of the Emperor, melee have very little uptime on adds (other than Courages), so the fight for them is a single target fight. Ranged, however, get the benefit of multi-dotting and AoE-ing Rages. On the other hand, melee get to do the Opportunistic Strike dance, which gives them a huge DPS boost. But there is more than a melee/ranged disparity on the fight: one of the common tactics for most guilds in 25H is to have Mages use Ring of Frost to CC the Rages into groups, Hunters soak Titan Sparks, or Death Knights tank Strengths. So compared to a Rogue or an Elemental Shaman, these classes have completely different roles, yet all these numbers are rolled into Spec Score.

The second issue is that of weighting. Here’s a quick example: Fire is Spec Score 100 on Amber-Shaper Un’sok, while Shadow is Spec Score 60. Even if Shadow is 5% better on 8 other fights, the specs will seem to be equal – but would we actually consider them equal in that case?
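To make the arithmetic concrete, here is a minimal Python sketch. It assumes the aggregate is a plain unweighted mean of per-fight Spec Scores, that the leading spec on a fight scores 100, and that "5% better" means a 5-point lead. The real Spec Score formula may differ, and the score lists are hypothetical values built only from the example above.

```python
from statistics import mean

# Hypothetical per-fight Spec Scores from the example: one lopsided fight
# (Amber-Shaper Un'sok) plus 8 fights where Shadow leads by 5 points.
fire_scores = [100] + [95] * 8
shadow_scores = [60] + [100] * 8

# Assumption: the aggregate is an unweighted mean across all 9 fights.
fire_overall = mean(fire_scores)
shadow_overall = mean(shadow_scores)

print(f"Fire:   {fire_overall:.1f}")
print(f"Shadow: {shadow_overall:.1f}")
```

Under these assumptions a single 40-point swing on one fight exactly cancels a 5-point lead on eight others, so both specs end up with the same aggregate.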

On the side, we can also select ‘Overall DPS’, which seems to be an average over all the fights without any sort of weighting. With this metric, Affliction is only 10% ahead of Beast Mastery. While using this seems to be more within the realm of credibility, the fact is:

We should use neither.

The issues with Overall DPS are even more clear: because it does not normalize on a per fight basis, fights like Wind Lord Mel’jarak 25H:

Sticking 489k DPS into an average…might skew it a bit.

completely skew the results.
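A rough Python sketch of why the lack of per-fight normalization matters. The two spec names and every DPS figure are invented for illustration (only the 489k value echoes the Wind Lord caption), and `normalized_score` is a hypothetical stand-in for per-fight normalization, not RaidBots’ actual method.

```python
from statistics import mean

# Hypothetical DPS in thousands for two made-up specs over four fights.
# The first fight is a cleave-heavy encounter with cartoonish numbers.
dps = {
    "cleave_spec": [489, 100, 100, 100],
    "single_target_spec": [300, 120, 120, 120],
}

def overall_dps(spec):
    """Unnormalized average across fights, in the spirit of 'Overall DPS'."""
    return mean(dps[spec])

def normalized_score(spec):
    """Average of each fight's DPS as a fraction of that fight's best spec."""
    fractions = []
    for fight, value in enumerate(dps[spec]):
        best = max(values[fight] for values in dps.values())
        fractions.append(value / best)
    return mean(fractions)

for spec in dps:
    print(f"{spec}: raw {overall_dps(spec):.1f}k, "
          f"normalized {normalized_score(spec):.3f}")
```

With these made-up numbers the raw average ranks the cleave spec first, even though it trails on three of the four fights. Scaling each fight to its own best spec reverses the ranking.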

Finally, because of an issue with World of Logs, Heroic parses for Terrace of the Endless Spring do not show up.

Fight-by-Fight – What do we have to look out for?

So if we don’t look at the fights in aggregate, then the alternative is to look at each fight on its own, and try to draw our conclusions from that. Maybe Affliction is top for every fight, or near the top. In that case, we might argue that it is overpowered. Even in looking at each individual fight though, we have to keep certain things in mind.

Sample Size

The first is that of sample size. This is an argument for comparing only the dominant spec for each fight. There has been a persistent complaint ever since Patch 5.1, when a change to Kil’jaeden’s Cunning made it so that it was no longer a DPS loss to move, but simply a survivability loss (from the snare). Affliction’s attackers would say ‘look, Demonology is supposed to be the movement spec, yet it still lags behind Affliction on movement fights like Vizier and Blade Lord!’



Here’s the factor that everyone else is ignoring:

There’s a significant player skill difference there.

There’s a player skill difference represented there: almost all Warlocks are playing Affliction and, min-maxers that raiders tend to be, a disproportionate number of the most skilled raiders are playing Affliction as well. In addition, using DPSBot without looking at the sample size leads to ridiculous comparisons. When I look at the 100th ranked Demo lock, he is actually around 55th percentile for that spec. When I look at the 100th ranked Affliction lock, he is 95th percentile.

Any comparison with such disparate sample sizes should be thrown out. You might look at the top ranked parse from each spec, but even then it’s impossible to say whether the player behind the #1 Destruction parse would be the #1 Affliction parse or the #100 Affliction parse.
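As a back-of-the-envelope check, the quoted percentiles can be turned into implied sample sizes. This sketch assumes a parse’s percentile is simply the share of parses ranked below it (percentile = 1 - rank / N). The function name is hypothetical, and the results are estimates implied by the quoted figures, not counts taken from RaidBots.

```python
def implied_sample_size(rank, percentile):
    """Estimate how many parses exist if `rank` sits at `percentile`.

    Assumption: percentile = 100 * (1 - rank / N), solved here for N.
    """
    return rank / (1 - percentile / 100)

demo_n = implied_sample_size(100, 55)  # 100th Demonology parse, 55th percentile
aff_n = implied_sample_size(100, 95)   # 100th Affliction parse, 95th percentile

print(f"Implied Demonology parses: {demo_n:.0f}")
print(f"Implied Affliction parses: {aff_n:.0f}")
print(f"Ratio: {aff_n / demo_n:.1f}x")
```

Under that assumption the Affliction pool is roughly nine times the size of the Demonology pool, so comparing the two at the same rank compares very different slices of each population.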


Role

The second is that of role. The Will of the Emperor example covered a bit of it in detail, but there are a good number of fights where roles are simply so different that comparing Melee to Ranged or DKs to Rogues is almost meaningless. Because of a lack of in-fight utility, Warlocks in general have the least challenging responsibilities. Warlocks might be asked to CC using a spear on Wind Lord Mel’jarak, but not before a Mage is already on Polymorph duty. Likewise, Dark Bargain is an excellent soaking tool on Heroic Elegon, but it’s not as strong as a Mage’s Greater Invisibility. Our hard CCs are mostly Fear-based, so not ideal for keeping targets immobile. Because of this lack of active utility, we might expect Warlocks to outperform slightly in real world situations.

How do we account for this? There are fights where utility doesn’t factor in. Feng the Accursed is a fairly straightforward fight unless you’re called upon to soak Lightning Fists. On Garalon Heroic, pretty much every ranged DPS is needed for Pheromones. Both Vizier and Blade Lord are gimmick-less fights.

Mostly though, except for the most egregious cases, we don’t account for this. We can make the assumption that the best parsing members of each spec are those that were allowed to scumbag DPS their way to the top of the meters, no responsibility holding them back. It is something to keep in mind though, if someone ever says ‘look how bad Frost DKs are on Will of the Emperor Heroic!’

Hybrid Tax

The third thing to look out for is the hybrid tax – the idea that hybrids sacrifice throughput for the ability to offer up utility in a variety of aspects. Officially, it may or may not be retired. In reality, it lives on – although it might be exacerbated by the fact that PvP balance for hybrids is in shambles at the moment.

It looks something like this:

Generated using ‘All Parses’, filtering out all non-dominant specs.

At the bottom, you see all the caster/heal off specs (Balance, Elemental, Shadow) languishing. The worst ‘dominant’ pure spec is Assassination, at 127.8k, while Affliction and Arcane sit a cut above. There are a few ways to interpret this data:

  1. Arcane and Affliction are too strong! 14% stronger than Shadow and 19% stronger than Elemental!
  2. Shadow, Balance and Elemental are too weak. There’s no reason to bring these specs!
  3. Arcane is less than 8% stronger than any other dominant pure DPS spec – that’s pretty good balance!
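The percentages in these readings are easy to misquote, because "A is X% stronger than B" is not the same as "B is X% weaker than A". A small sketch using only the figures quoted above; the function name is hypothetical, and the Arcane number is merely an upper bound implied by reading 3, not a value read off the chart.

```python
def pct_weaker_from_stronger(pct_stronger):
    """If A is pct_stronger % above B, return how far B is below A, in %."""
    return (1 - 1 / (1 + pct_stronger / 100)) * 100

# "19% stronger than Elemental" and "14% stronger than Shadow", flipped around.
elemental_gap = pct_weaker_from_stronger(19)
shadow_gap = pct_weaker_from_stronger(14)

# Reading 3: Arcane is less than 8% ahead of the worst dominant pure spec,
# Assassination at 127.8k, which caps Arcane at the value below.
assassination_dps = 127.8
arcane_upper_bound = assassination_dps * 1.08

print(f"Elemental is about {elemental_gap:.1f}% behind")
print(f"Shadow is about {shadow_gap:.1f}% behind")
print(f"Arcane is below {arcane_upper_bound:.1f}k")
```

So a 19% lead corresponds to the trailing spec being about 16% behind, and a 14% lead to about 12% behind, which is worth keeping in mind when forum posts quote gaps in either direction.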

I’ve read comments arguing both 1 and 2 on Warlock forums, often both at the same time. “Nerf Affliction, at 20% worse than Affliction, there’s no reason to bring Shadow!”

Is that so?

Dat Class Stacking

The fact of the matter is, there are plenty of reasons to bring a Shadow Priest despite the gap in throughput. Every class brings utility, but hybrid utility stacks (the more Hymns, the merrier) while pure utility does not (we don’t have 6 Healthstone charges). Many, many guilds this tier have found reason to sacrifice throughput for this utility. Perhaps the most prominent examples can be found in both our World First guilds: Paragon brought 2 Balance Druids and 1 Warlock, while Method brought 3 Balance Druids and 2 Warlocks to their Heroic Sha of Fear kills.

Is the current state of hybrid vs. pure DPS ideal? There are arguments to be made for both sides. But whether or not the intent behind the hybrid tax exists, the fact is that it appears to, and as such, you cannot compare hybrid heal/caster specs to pure DPS specs.


Gimmick Fights

Finally, there are fights that are classified as gimmicks. We can just run down the list here: Stone Guard is a gimmick fight, Gara’jal is a gimmick fight, as is Elegon. Will of the Emperor is a fight with widely disparate roles. Garalon is a fight with melee gimmicks, Wind Lord is a fight that puts up cartoonish numbers, Amber-Shaper is a gimmick fight. Protectors of the Endless has a lot of potential for padding. Sha of Fear is all over the place, depending on strat.

This isn’t to say that we should ignore all these fights. Affliction certainly is the best Elegon spec. It’s still important to apply critical thinking to see if these inflated gimmick numbers would translate to an actual problem: class stacking.

With all these issues in mind, anyone can look through and see how the evidence stacks up for or against Affliction being too strong. Keep in mind that being ‘first’ on meters doesn’t mean anything by itself. It’s being first by a lot or, like I mentioned before, being so far ahead that it leads to class stacking that means we’re in trouble.

So is there a class-stacking issue?

A little bit. The results from Feng are for a beginning level Heroic. The two most difficult fights that we have data from RaidBots for are Vizier 25H and Grand Empress 25H. Both these fights show a bit of spec stacking, where the top two specs get more representation than in other fights:

The best specs with a bit more in numbers – Imperial Vizier 25H

The best specs with a bit more in numbers – Grand Empress 25H

For the sake of balance, Affliction probably deserves a tiny nerf (beyond the 5.2 Glyph of Sacrifice nerf). But Kil’jaeden’s Cunning shouldn’t be where it comes from, because while I left it off my first chart, look at where Demonology is:

When you account for samples, Demonology is likely as strong as Affliction is now, if not stronger.

That’s it for this year! I saw the complaint that there’s not enough theorycrafting for Warlocks today, so going forward in the New Year I’m going to start posting on some really mathy topics in order to offer the community something different. Happy New Year, everyone.


One thought on “Affliction: Too Strong? How Can We Even Tell?”

  1. Two interesting things in the Feng chart.

    1) I wonder if part of the issue with affliction is that it scales too well. If you look at the numbers from two months ago, affliction is solidly middle of the pack. Slap two months of gear onto everyone and affliction shoots to the top. Would be interesting to see if this trend holds across many encounters.

    Though anecdotal, I read concerns from people who took advantage of the hunter pet bug providing an extra 10% raid haste (including reforging around it) that scaling is going to be a huge issue. As haste scaled up, haunt uptime scaled up and dps really took off.

    2) This is the most clear-cut example of the sample size/player skill issue. How do you buff arcane? You nerf fire. All the skilled players flocked from fire to arcane and arcane shot to the top of the chart.
