Monday, September 23, 2013

Well-written BEHAVIOR Driven Development Scenarios

In my work coaching teams in Behavior Driven Development (BDD), it's been non-trivial (doable, but it takes time) to teach what a GOOD BDD example looks like.  I'm finding that teams that "self start" in BDD create UI-driven scenarios rather than behavior-driven ones.  They end up with scenarios such as the following.

(for an iTunes plugin that culls out invalid song entries in a playlist)
(THESE ARE EXAMPLES OF BAD BDD SO DON'T DO THIS!)
Given iTunes is launched and there exist bad entries in playlist Never Played
When user selects Music and playlist Never Played
Then show dialog listing invalid songs

Given showing user dialog listing invalid songs
When user clicks OK
Then remove invalid playlist entries


There are bad smells in these scenarios:
  1. the end user is mentioned
  2. contains User Interface (UI) language (dialogs, selecting, clicking)
  3. the scenario language contains details that only an experienced iTunes user can follow
The nature of smells isn't that these things are forbidden, but that it's best to marshal your energy toward expunging them as much as possible, especially if you're new to BDD.
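
The first two smells can be spotted mechanically. Here is a rough Python sketch of such a check; the word lists and the find_smells name are my own assumptions for illustration, not part of any BDD tool. The third smell (insider knowledge) still needs a human reader.

```python
import re

# Hypothetical word lists; tune them to your own product's vocabulary.
SMELL_PATTERNS = {
    "mentions the end user": r"\buser\b",
    "UI language": r"\b(dialog|click\w*|select\w*|button|menu)\b",
}


def find_smells(scenario: str) -> list[str]:
    """Return the name of every smell whose pattern appears in the scenario."""
    return [
        name
        for name, pattern in SMELL_PATTERNS.items()
        if re.search(pattern, scenario, flags=re.IGNORECASE)
    ]


ui_scenario = """\
Given iTunes is launched and there exist bad entries in playlist Never Played
When user selects Music and playlist Never Played
Then show dialog listing invalid songs"""

behavior_scenario = """\
Given there exist bad entries in playlist Never Played
When trying to play this playlist
Then remove invalid playlist entries"""

print(find_smells(ui_scenario))        # both smells are reported
print(find_smells(behavior_scenario))  # nothing is reported
```

A check like this is only a prompt for conversation. A scenario that passes it can still be a poor description of behavior.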

Although UI scenarios will make most engineers happy because it's clear what the UI is supposed to do and they provide usable system tests, they create the following problems:
  1. High maintenance--Too much presentation language means the tests need to be updated when the presentation changes even though the behavior hasn't changed. 
  2. Implies the UI design is finished--By virtue of writing the scenario in presentation language (rather than behavior), the presentation design is now fixed in the reader's mind.  The engineer may not wish to challenge this even if they see a better way.
  3. The BDD test will be a UI test--If all the BDD scenarios are written in UI language, then all the testing will be through the UI.  Remember the Test pyramid!  You want as few UI tests as possible because UI tests are inflexible, slow, the most expensive to maintain, and prone to false positives.  In this case, the team is building the plugin, not iTunes, so why should they have a Given/When/Then that also describes code delivered by the iTunes team?
  4. The BDD tests will always be slow, and slow is less valuable--If all the tests are UI tests, they will run slower than tests designed to exercise only behavior.  A test failure that tells the team a regression was introduced in the last fifteen minutes will be trivial to correct (say, less than 30 minutes).  A test failure that tells the team something broke in the last 24 hours will require using a debugger to find the bug and then figuring out which change caused it (likely an hour or several hours).
  5. Discussions are anchored to UI--Teams that understand their work at the behavior level will discover problems more easily, without what's happening in the UI domain distracting their thinking and cluttering their communication with others.
The above scenarios should be rewritten by focusing on the behavior:
(for an iTunes plugin that culls out invalid song entries in a playlist)

Given there exist bad entries in playlist Never Played
When trying to play this playlist
Then remove invalid playlist entries


This is something the user will pay for!  They don't care about the clicks or if the pointer is used at all!  They want the bad entries in their playlist removed!  If you have POs create such scenarios, they can easily communicate and debate them with product marketing in terms of what behaviors the user wants, without needing someone to translate the results of UI manipulations.

Give a Scrum team this scenario during sprint planning and they're going to need to discuss possible UIs that allow this behavior to happen.  But they don't need to choose right now, any more than they need to decide how many classes to design or what the public methods/attributes are.  They just need to know whether they can do any of the options during the Sprint.  When sprinting, they *will* need to decide on a UI implementation, and they will need to implement the test driver, which *may* need to drive the UI or, even better, can trust that iTunes does its job of handing off to their plugin so that the team only tests that their plugin causes the behavior to happen.  (They will likely need at least *one* end-to-end UI test that ensures iTunes does hand off to their plugin.)
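
To make the "test the plugin, not iTunes" idea concrete, here is a minimal Python sketch of a behavior-level test driver. Everything in it is hypothetical: the Song and Playlist classes, the remove_invalid_entries entry point, and the assumption that "invalid" means the song's file is missing. A real plugin would have its own API.

```python
from dataclasses import dataclass, field


@dataclass
class Song:
    title: str
    file_exists: bool  # assumption: an entry is "invalid" when its file is missing


@dataclass
class Playlist:
    name: str
    songs: list = field(default_factory=list)


def remove_invalid_entries(playlist: Playlist) -> list:
    """Hypothetical plugin entry point: drop invalid songs, return what was removed."""
    removed = [song for song in playlist.songs if not song.file_exists]
    playlist.songs = [song for song in playlist.songs if song.file_exists]
    return removed


def test_playing_a_playlist_removes_invalid_entries():
    # Given there exist bad entries in playlist Never Played
    playlist = Playlist(
        "Never Played",
        [Song("Good Song", True), Song("Missing Song", False)],
    )
    # When trying to play this playlist (the plugin is called directly, no UI)
    removed = remove_invalid_entries(playlist)
    # Then remove invalid playlist entries
    assert [song.title for song in removed] == ["Missing Song"]
    assert [song.title for song in playlist.songs] == ["Good Song"]


test_playing_a_playlist_removes_invalid_entries()
```

Nothing here launches iTunes or clicks anything, so the test stays fast and survives a UI redesign. The single end-to-end hand-off test would live elsewhere.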

By the way, for the BDD-experienced in the crowd, it would be a great idea to parameterize the scenario to handle many/all playlists once the team needs to add features for many playlists:

Given there exist bad entries in [playlist]
When trying to play this playlist
Then remove invalid playlist entries

examples:
|playlist|
|Never Played|
|My Top Rated|
|Whole Library|
...
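
If you want to see the mechanics of the examples table, here is a small Python sketch that parses a table like the one above and runs the same scenario once per row. The parse_examples and run_scenario helpers, the stand-in remove_invalid_entries, and the sample songs are all assumptions made up for illustration. A real BDD tool does this expansion for you.

```python
EXAMPLES = """\
|playlist|
|Never Played|
|My Top Rated|
|Whole Library|"""


def parse_examples(table: str) -> list[dict]:
    """Turn a pipe-delimited examples table into one dict per data row."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in table.splitlines()
        if line.strip()
    ]
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def remove_invalid_entries(entries):
    """Hypothetical stand-in for the plugin: keep only the valid entries."""
    return [title for title, is_valid in entries if is_valid]


def run_scenario(playlist_name: str) -> list[str]:
    # Given there exist bad entries in [playlist]
    library = {playlist_name: [("Good Song", True), ("Broken Song", False)]}
    # When trying to play this playlist
    remaining = remove_invalid_entries(library[playlist_name])
    # Then remove invalid playlist entries
    return remaining


# One run of the scenario per row of the examples table.
results = {
    row["playlist"]: run_scenario(row["playlist"])
    for row in parse_examples(EXAMPLES)
}
for name, remaining in results.items():
    assert remaining == ["Good Song"], name
```

Adding a playlist then means adding a row to the table, not writing another scenario.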

Check out the other BDD resources at this blog such as:

Other Resources:

I highly recommend reading the following which will NOT teach you how to write the code but teaches you how to think behavior and why it's important to do so (available in paper and Kindle):



Here is a great BDD tutorial if you are into music (guitar in particular): http://www.ryangreenhall.com/articles/bdd-by-example.html