Improving Story Point Estimation

I'm keen to use JIRA to provide data and insight to plan and measure improvement activity. My background is development: for the last 5 years I've been leading integration development, which has used the Atlassian tool-set for the last 3 or so years. Now I've taken a Quality Assurance post to see how lessons learnt can be shared and to ask better questions sooner.

My current focus area is estimating and Story Points. I know it's not healthy to estimate tickets in time, but I feel we're missing something if there isn't a correlation between Story Points and elapsed time (at the moment 5 point and 2 point tickets look no different on time-in-status charts). So I wonder how others plan Story Points - do they reflect value to the customer, complexity, effort or something else?

Have you been able to demonstrate improvement in the predictability of software delivery? 

5 comments

It's difficult to get teams to agree on an estimating method, as they feel that they'll be held to strict deadlines and face harsh repercussions if deadlines are missed. Assure all teams that standardizing will assist with estimations and reporting (i.e., better burndowns/burnups, velocity reports etc.). You'll find teams are more likely to agree if you measure complexity using story points rather than time-based estimates.

Try using the Fibonacci scale for estimating (e.g., 0, 1, 2, 3, 5, 8, 13, 21, 34). For consistency, get managers and leads to agree on a ballpark figure for each level of complexity.

Example:

  • 1 is less than a week
  • 2 is two weeks
  • 3 is one month

Some will complain that this is still a sort of time estimate - however, explain that agreement is needed for transparency and collaboration across the organization.

Thanks Lekisha. I've used story points with a Fibonacci scale and have tried 'poker cards' to get consensus over complexity. I think you have something when you suggest getting something written down and shared about what a 1, 2 or 5 point ticket might look like.

At the moment ~90% of tickets are categorised as having 2 story points. When I compare time in different statuses, segmenting by Story Points, I see little difference. I wondered whether a good experiment would be to revise estimates at the end of each Sprint (and ask why a 2 was actually an 8). The purpose would be to adjust/improve future estimates (this wouldn't mean going faster but delivering more predictably). Has anybody else tried this?
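For anyone wanting to reproduce the comparison described above outside of Jira's charts, here is a minimal sketch. It assumes you have exported completed tickets as (story points, days in progress) pairs; the sample figures and the function name `summarize_by_points` are made up for illustration and are not taken from any real board.

```python
from collections import defaultdict
from statistics import median

# Hypothetical export: (story_points, days_in_progress) per completed ticket.
tickets = [(2, 1.5), (2, 9.0), (2, 3.0), (5, 4.0), (5, 3.5), (2, 12.0)]

def summarize_by_points(rows):
    """Group elapsed days by story point value and summarise each group."""
    groups = defaultdict(list)
    for points, days in rows:
        groups[points].append(days)
    return {
        points: {
            "count": len(days),
            "median_days": median(days),
            "max_days": max(days),
        }
        for points, days in sorted(groups.items())
    }

summary = summarize_by_points(tickets)
for points, stats in summary.items():
    print(points, stats)
```

If the medians for different point values come out close together, or the smaller estimate has the larger spread as in this invented sample, that is the "2 and 5 point tickets look no different" pattern in numbers.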

Hi Ben,

Yes, this is the way we use points vs. time, i.e. points are for predictive-planning estimates, and time is tracked ONLY for learning purposes. At the sprint retrospective, we quickly discuss some (if you don't have time for all) of the tickets that had similar points but a major difference in execution time. It is also a good learning exercise for the team to re-establish what 1, 2, 3, 5... points mean.
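One possible way to shortlist tickets for that retro discussion is sketched below. The rule used here (flag anything that took more than twice the median time of tickets with the same estimate) is an assumption chosen for illustration, not something prescribed in this thread; the ticket keys and numbers are invented.

```python
from collections import defaultdict
from statistics import median

def retro_candidates(tickets, factor=2.0):
    """Return keys of tickets whose elapsed days exceed
    factor x the median elapsed days of tickets with the same estimate."""
    by_points = defaultdict(list)
    for _key, points, days in tickets:
        by_points[points].append(days)
    medians = {points: median(days) for points, days in by_points.items()}
    return [key for key, points, days in tickets if days > factor * medians[points]]

# Hypothetical sprint data: (ticket key, story points, elapsed days).
sample = [
    ("T-1", 2, 2.0),
    ("T-2", 2, 3.0),
    ("T-3", 2, 10.0),
    ("T-4", 5, 4.0),
    ("T-5", 5, 5.0),
]
candidates = retro_candidates(sample)
print(candidates)
```

The `factor` threshold is a knob to tune so that the list stays short enough to discuss in the time available.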

Thanks Amir - I will check with one of the teams I support if we can try this experiment in a retro.

Hi Ben,

I think this is a question a load of people struggle with and there is no 'correct answer' for every project. You will need to adapt the process/methodology/framework to meet your project goals, e.g. Scrum works really well for new product development, Kanban works really well for maintenance projects, or you could use a combination of both to suit your needs (Scrumban).

Whichever option you decide to go with, the topic of estimation is always a tricky one (especially in new teams).

I work for an Agile-minded software development company and we use the Scrum framework for all new product development. Like LeKisha, we use story point estimation (based on the Fibonacci sequence) for our projects as we find that it works well for us. However, you could use other estimation types, e.g. business value or time-based estimates - the important thing is that you have a standardized process around estimation, i.e. you estimate the same way with the same people each time.

The Fibonacci sequence (0, 1, 2, 3, 5, 8, 13, 21 etc.) is used by our Scrum teams as it forces them to provide a relative estimate, i.e. 1 is slightly easier than 2, 2 is slightly easier than 3 etc. It does not mean that a story with 3 points is 3 times harder than a 1 point story; rather, it means that the 3 point story is relatively harder.

We do not want the team estimating to the 'minute' level and it is important to note that any estimate should meet your Definition of Done i.e. everything required to move the story from 'To Do' to 'Closed'. An important point to note here is that an estimate should be agreed by the team i.e. a developer says 2 story points and a tester says 3 story points - this does not mean that the story will take 5 points. Rather it means that the story will either be 2 or 3 points (depending on what the team agree to).

Furthermore, before any estimation session can occur, you need to ensure that all items (to be estimated) have met the Definition of Ready i.e. ensure the issue has all the information required to estimate and to actually start development work (acceptance criteria, wireframes etc.). You will also want to make sure that the team have read through the issues to be estimated (as they may have questions that the BA needs to research before the estimation session).

From personal experience, it also stops pesky management from asking if something has been done 5 minutes after the initial estimation e.g. Task A is estimated at 1 hour - management comes and taps the developer on the shoulder after 1 hour and asks if it has been done yet. 

To keep things simple, we have defined our point categories as:

[Image: Estimating.png]

Anything bigger than an 8 means we do not have enough information or the feature needs to be broken down into smaller stories. In fact, as a rule, we try not to estimate above a 5 (instead we try to break down/slice further).

When estimating, we take into account:

  • Risk e.g. if we update this field in the database, what could it break?
  • Complexity e.g. we need to connect to 12 3rd party APIs
  • Effort e.g. how long will it take to update 100 text fields?

Planning Poker:

We use Planning Poker to help get team consensus on a story point estimate and to ensure that there is clarity on the acceptance criteria of the feature/story. With Planning Poker, once the acceptance criteria are understood by the team, each team member (Developers & QC) uses their fingers (once prompted) to indicate how many story points to apply (using the Fibonacci sequence). The team will then compare the estimates and discuss (until consensus is reached).

This has two benefits - it is a quick way to estimate and the team has a fruitful discussion about the story and acceptance criteria.
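The compare-and-discuss loop can be sketched in a few lines. This is only an illustration under assumptions: the scale is the one listed earlier in this post, and the function name `poker_round` and the voter names are hypothetical. The point it makes is that differing votes are never added together; they either match or trigger more discussion.

```python
FIBONACCI = (0, 1, 2, 3, 5, 8, 13, 21)

def poker_round(votes):
    """Return the agreed estimate, or None when the team still needs to discuss.

    votes maps a team member's name to the story points they showed.
    """
    for name, value in votes.items():
        if value not in FIBONACCI:
            raise ValueError(f"{name} voted {value}, which is not on the scale")
    distinct = set(votes.values())
    # Consensus means everyone showed the same number; votes are never summed.
    return distinct.pop() if len(distinct) == 1 else None

first = poker_round({"developer": 2, "tester": 3})   # no consensus yet
second = poker_round({"developer": 3, "tester": 3})  # agreed after discussion
print(first, second)
```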

Obviously, you will be tracking your team's velocity (you need at least 3 sprints with the same team members to provide a reasonable picture). Velocity is how many points you can do in a given time (usually 2 weeks for Scrum projects).

Let's say your velocity is 30 - this means that you can do 30 points in 2 weeks, or 3 points per working day (10 working days). Rather than the focus being on a single developer providing the estimate, this is what the team could achieve every day of the sprint.

Velocity and capacity go hand-in-hand for sprint planning e.g. if half the team are off on holiday for the next sprint, you would plan 15 points for that sprint. 
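The arithmetic above, written out as a small sketch. The three sprint totals are invented so that the average comes to 30, and scaling velocity linearly by the share of the team available is the simplification used in the holiday example, not a rule.

```python
def average_velocity(completed_points):
    """Mean points completed per sprint over the sprints supplied."""
    return sum(completed_points) / len(completed_points)

def planned_points(velocity, available_fraction):
    """Scale velocity by the share of the team's usual capacity that is available."""
    return velocity * available_fraction

WORKING_DAYS_PER_SPRINT = 10  # a 2-week sprint

velocity = average_velocity([29, 30, 31])    # hypothetical last three sprints
per_day = velocity / WORKING_DAYS_PER_SPRINT
next_sprint = planned_points(velocity, 0.5)  # half the team on holiday

print(velocity, per_day, next_sprint)
```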

Regarding reporting, must-have tools would be a burndown chart and velocity chart.

I highly recommend checking out:

Mike Cohn's website - https://www.mountaingoatsoftware.com/agile 
Atlassian's Agile page - https://www.atlassian.com/agile
What are story points: https://www.mountaingoatsoftware.com/blog/what-are-story-points?utm_source=Iterable&utm_campaign=mgsblog2017Jan17&SNSubscribed=true&utm_medium=email

Best of luck!

Thanks Brett - plenty for me to chew on here! My goal is to find a way to demonstrate improvement in the predictability (as seen on the Control Chart) of delivery. If there's a large variation in time for tickets with the same story point estimate it would seem to undermine the value of Sprint Velocity - it's hard to know how many story points will be secured in a Sprint. Getting consistency in estimating seems key. I imagine another consequence is not knowing how much unplanned work will roll into future Sprints making it increasingly difficult to plan. Am I off track with my thinking here?

Hi Ben - you are most certainly on track. A few comments...

Any estimation (regardless of type) is not an exact science - if things were 100% predictable we would not be estimating! The best way to get an accurate estimate is to actually start doing the work. The reason I like the Agile way of doing things (Scrum/XP etc.) is that you break the project down into small chunks of work (sprints, epics, stories/tasks), you do some estimation upfront, do some work, review what was done and then adjust.

If you spend too much time on trying to improve the accuracy of your estimation, you will not get anything done (or at the very least you will have a very poor velocity)! The graph below highlights this point - remember, estimates cost time/money too.

[Image: Cost of Estimation JPG.jpg]

As mentioned in my previous post, one way of improving estimation (and thus predictability) is to focus less on the individual and more on the team, i.e. how many points the team can do in a given sprint versus how many points each developer can do. The best way of seeing progress is to show stakeholders what the team did in a Sprint (Sprint Review) - you can gauge how much progress is being made by the reactions of the stakeholders! Also, it is important to note that the accuracy of team velocity improves over time, e.g. you could expect to see something like S1 = 10 points, S2 = 50 points, S3 = 30 points, S4 = 29 points, S5 = 31 points (if the team is the same, it estimates the same way and all other things are equal).
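To put a number on "velocity settles down over time", one option is to compare the relative spread of early sprints with later ones. The sketch below uses the illustrative S1 to S5 figures from this post; choosing the coefficient of variation as the measure is my assumption, and other measures (such as a simple range) would work too.

```python
from statistics import mean, pstdev

sprints = [10, 50, 30, 29, 31]  # S1..S5, the illustrative figures above

def variation(window):
    """Coefficient of variation: standard deviation relative to the mean."""
    return pstdev(window) / mean(window)

early = variation(sprints[:3])  # S1-S3
late = variation(sprints[2:])   # S3-S5

print(round(early, 2), round(late, 3))
```

A falling value from one window to the next suggests the team's velocity is becoming a more dependable planning input.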

Regarding metrics, the first step when measuring is to make sure that you have an agreed definition of what it is you are trying to track (and why you are doing so). I like the SMART method:

https://en.wikipedia.org/wiki/SMART_criteria

You want to ensure that the tracked metric/KPI (Key Performance Indicator) provides some benefit before putting too much time into it - I have worked in companies where management want loads of KPIs just for the sake of having loads of KPIs (the usefulness of which is debatable). Remember, like estimation, there is a cost associated with producing KPIs (it often outweighs the benefit!).

Once you have the exact metric you are looking to track, you then need to ensure that the tracking conditions do not change (as much as possible).

An example is changing team members frequently. If the members in Team A do not change and the members in Team B do, the variance of estimation accuracy of Team B will be higher than that of Team A (and planning predictability will thus be lower). This means that when Sprint planning, Team A will generally be more accurate about the amount of work they can complete in a given period of time than Team B. It does not mean that Team A will do more work than Team B, however.

Regarding managing unplanned work, the first step is ensuring that ALL known work is in Jira. This means ensuring all technical tasks (infrastructure, code refactoring etc.) and business requirements (stories or change requests with clear DoR) are in Jira. If anyone in any of my teams works on something that is not in Jira, I get pissed off very quickly (as it means that we cannot plan around it). 

Furthermore, your velocity will include all meetings, admin work and other types of overhead (tasks that do not need to go in Jira). If you find that your team has a Live issue (a must-do, end-of-the-world issue) every sprint, it will again be shown as part of your velocity. If these issues happen from time to time, you should not worry too much about it: treat it like any other task, i.e. estimate the issue, move it into the current sprint and then take existing stories of the same value out of the Sprint. You would also highlight any unplanned work in the retrospective - usually the work could actually have been planned in (if we had followed the project process).
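The swap described above can be sketched as follows. This assumes the sprint's stories are held in priority order and that the lowest-priority ones are removed first; that ordering rule, the function name `make_room` and the story keys are illustrative assumptions, and in practice the team and Product Owner would decide what comes out.

```python
def make_room(sprint_stories, urgent_points):
    """Pick lowest-priority stories to remove until at least urgent_points are freed.

    sprint_stories: list of (key, points), ordered from highest to lowest priority.
    Returns the keys removed and the total points freed.
    """
    removed, freed = [], 0
    for key, points in reversed(sprint_stories):
        if freed >= urgent_points:
            break
        removed.append(key)
        freed += points
    return removed, freed

# Hypothetical sprint backlog and a Live issue estimated at 3 points.
backlog = [("S-1", 5), ("S-2", 3), ("S-3", 2), ("S-4", 1)]
removed, freed = make_room(backlog, urgent_points=3)
print(removed, freed)
```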

Capacity, Velocity and estimation all go hand-in-hand with planning. If you do not have these, you will not be able to provide stakeholders with how long something will take.

I would suggest agreeing on a project management methodology/framework as your very first step, e.g. Scrum, Kanban, XP. Use the 'out of the box' processes, tools & reporting and only when you feel comfortable (and are getting good feedback from the team), adjust the processes to meet your exact needs.

I do everything I can to not relate time to story points. It's often the hardest part and often has to be revisited.

One of the methods I use:  

Imagine your task was manual data entry. It takes a long time to sit and key in data over and over all day if that was your only method to complete that task. It's not hard, it's just time consuming and boring as hell. For me this is a 2 or 3 because, again, it's easy work; it just takes time. Now suppose we create a story to automate the data entry process. That story could possibly be an 8 or 13 even though the tasks associated with it may take less time to accomplish than a round of data entry in a given sprint.

With developers, I equate story points similarly to the chart Brett shared but expand it to the 20 in the sequence, i.e. 0, 1, 2 are small, something like a table change. You probably have all the info you need and can move on quickly. Zero being you know exactly where it is, what to change and how to test. 2 being you think you know what table it's in but may need to explore the code to be sure. 3, 5, 8 are your everyday middle-of-the-road problems. You think you've got most of the information to move forward (closer to a 3) but may have a few questions as you get into it further (closer to an 8). 13s are stories with major questions, potentially needing exploration/discovery and then a reassessment of the attempt to complete the ask. 20s (or 21s if you must) are almost small epics for me, as they are approaching too much work for one sprint or more detail must be obtained to accurately break the work up. A 20 is complex, with a need to explore/discover.

Thanks,

Thanks Michael, this will help me qualify what 1, 2, 3, 5... points mean. I envision having a number of example stories that specify a Story Point value with justification. I think we can also collect watch-outs from past underestimated tickets - is there a trend in a particular aspect/feature not being considered or being undersized?

We cover the "work was harder than anticipated" aspect in our retrospective. It's usually around the time I bring up why we didn't complete all the stories we committed to.  ;-)

We adapted the Fibonacci scale in our agile teams to just 0 - 0.5 - 1 - 1.5 - 2.5 - 4. Like @Brett Willson stated, we assume that anything bigger than the Fibonacci 8 point mark is something that needs further elaboration and needs to be broken down into smaller stories.

Reporting shows that the bigger stories are (estimated), the fuzzier the relation between complexity and time to completion becomes. One might at first get the impression that a 4 point story is 4 times more complex than a 1 point story. In practice, development times for those larger stories vary on average between 3 and sometimes even close to 10 times as much. We see that as an indicator that stories of too much complexity become more and more unpredictable.
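A rough way to check this kind of relationship on your own data is to compare the ratio of median development times with the ratio of the estimates. The figures below are invented for illustration, and using the 1 point bucket as the baseline is an assumption.

```python
from statistics import median

# Hypothetical development days per story, keyed by estimate on the 0-4 scale.
days_by_points = {1: [1.0, 1.5, 2.0], 4: [4.0, 9.0, 15.0]}

def time_ratios(days_by_points, baseline=1):
    """Median days per bucket divided by the median days of the baseline bucket."""
    base = median(days_by_points[baseline])
    return {
        points: median(days) / base
        for points, days in sorted(days_by_points.items())
    }

ratios = time_ratios(days_by_points)
print(ratios)
```

Here the 4 point stories take about 6 times as long as the 1 point stories at the median, with individual stories ranging well either side of that, which is the kind of fuzziness described above.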

Thanks Walter. I like the idea of capping how complex a ticket can be before being accepted into a Sprint: defer and elaborate. I'm aware of one Scrum team in my organisation that has stopped story point estimating altogether - if there's a very fuzzy relationship between complexity & time, why estimate (as estimating has a cost too)? I'm sure I've only heard half the story and I think there needs to be something in place instead.

I think we still have work to do to be able to break larger tasks into a set of smaller more predictable ones - I'd be intrigued to know if anybody's found a good way to do this? I can only discover the right elaborating questions by looking backwards to understand why some tickets were bigger than expected.

Nice article, Brett. Had experience with the "Spike" on other projects.

Spikes are very useful if used correctly - one thing to watch out for is spending too much time 'investigating' and not enough time doing... This could mean that the requirements are unclear (or do not provide enough initial information)...

SAFe has some good views on this: https://www.scaledagileframework.com/spikes/
