Monday, September 23, 2013

Well written BEHAVIOR Driven Development Scenarios

In my work coaching teams doing Behavior Driven Development (BDD), it's been non-trivial (doable, but it takes time) to teach what makes a GOOD BDD example.  I'm finding that teams that "self start" in BDD create UI-driven scenarios rather than behavior-driven ones.  They end up with scenarios such as the following.

(for an iTunes plugin that culls out invalid song entries in a playlist)
Given iTunes is launched and there exists bad entries in playlist Never Played
When user selects Music and playlist Never Played
Then show dialog listing invalid songs

Given showing user listing invalid songs
When user clicks OK
Then remove invalid playlist entries

There are bad smells in these scenarios:
  1. the end user is mentioned
  2. contains User Interface (UI) language (dialogs, selecting, clicking)
  3. the scenario language contains details that only an experienced user of iTunes can follow
The nature of smells isn't that these things are forbidden; rather, it's best to marshal your energy toward expunging them as much as possible, especially if you're new to BDD.

Although UI scenarios will make most engineers happy, because it's clear what the UI is supposed to do and they provide usable system tests, you will create the following problems:
  1. High maintenance--Too much presentation language means the tests need to be updated when the presentation changes even though the behavior hasn't changed. 
  2. Implies the UI design is finished--By virtue of writing the scenario in presentation language (rather than behavior), the presentation design is now fixed in the reader's mind.  The engineer may not wish to challenge this even if they see a better way.
  3. The BDD test will be a UI test--If all the BDD scenarios are written in UI language, then all the testing will be through the UI.  Remember the Test pyramid!  You want as few UI tests as possible because UI tests are inflexible, slow, the most expensive to maintain, and prone to false positives.  In this case, the team is building the plugin, not iTunes so why should they have a Given/When/Then that also describes code delivered by the iTunes team?
  4. The BDD tests will always be slow and slow is less valuable--If all the tests are UI tests, they will run slower than tests designed to exercise only behavior.  A test failure that tells the team that something in the last fifteen minutes caused a regression will be trivial to correct (say, less than 30 minutes).  A test failure that tells the team that something in the last 24 hours caused a regression is going to require using a debugger to figure out where the bug is, and then figuring out which change caused it (likely an hour or several hours).
  5. Discussions are anchored to the UI--Teams that understand their work at the behavior level will discover problems more easily, without having their thinking distracted by what's happening in the UI domain or cluttering communication with others.
The above scenarios should be rewritten by focusing on the behavior:
(for an iTunes plugin that culls out invalid song entries in a playlist)

Given there exist bad entries in playlist Never Played
When trying to play this playlist
Then remove invalid playlist entries

This is something the user will pay for!  They don't care about the clicks, or whether the pointer is used at all!  They want the bad entries in their playlist removed!  If you have POs create such scenarios, they can easily communicate and debate them with product marketing about what behaviors the user wants, without needing someone to translate the results of UI manipulations.  Give a Scrum team this scenario during sprint planning and they're going to need to discuss possible UIs to allow this behavior to happen.  But they don't need to choose right now, any more than they need to decide how many classes to design and what the public methods/attributes are.  They just need to know whether they can do any of the options during the Sprint.  When sprinting, they *will* need to decide on a UI implementation, and they will need to implement the test driver, which *may* need to drive the UI.  Or, even better, they can trust that iTunes does its job of handing off to their plugin, and the team only tests that their plugin causes the behavior to happen.  (They likely will need at least *one* end-to-end UI test that ensures iTunes does hand off to their plugin.)
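To make the point concrete, here is a minimal sketch of testing the plugin's behavior directly, with no iTunes and no UI in the loop.  Everything here is hypothetical: Playlist, is_invalid, and cull_invalid_entries are made-up stand-ins for the team's plugin code, and "missing file path" is an invented rule for what makes an entry bad.

```python
class Playlist:
    def __init__(self, name, entries):
        self.name = name
        self.entries = list(entries)

def is_invalid(entry):
    # Hypothetical rule: an entry is invalid if its file path is missing.
    return entry.get("path") is None

def cull_invalid_entries(playlist):
    """The behavior under test: remove invalid entries from the playlist."""
    playlist.entries = [e for e in playlist.entries if not is_invalid(e)]
    return playlist

# Given there exist bad entries in playlist Never Played
never_played = Playlist("Never Played", [
    {"title": "Song A", "path": "/music/a.mp3"},
    {"title": "Song B", "path": None},          # bad entry
])

# When trying to play this playlist
cull_invalid_entries(never_played)

# Then remove invalid playlist entries
assert [e["title"] for e in never_played.entries] == ["Song A"]
```

Notice the test never mentions dialogs, clicks, or selections; it only asserts the behavior the user pays for.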

By the way, for the BDD-experienced in the crowd: it's a great idea to parametrize the scenario to handle many/all playlists when the team needs to add features for multiple playlists:

Given there exist bad entries in <playlist>
When trying to play this playlist
Then remove invalid playlist entries

Examples:
| playlist      |
| Never Played  |
| My Top Rated  |
| Whole Library |
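The scenario outline is just the same Given/When/Then run once per row.  Hand-rolled (no BDD framework), and with made-up playlist data and the same hypothetical "missing path" rule as before, that might look like:

```python
def cull_invalid_entries(entries):
    """Remove entries whose file path is missing (hypothetical 'bad' rule)."""
    return [e for e in entries if e.get("path") is not None]

# One entry per Examples row; the song data is invented for illustration.
playlists = {
    "Never Played": [{"title": "A", "path": "/a.mp3"}, {"title": "B", "path": None}],
    "My Top Rated": [{"title": "C", "path": None}],
    "Whole Library": [{"title": "D", "path": "/d.mp3"}],
}

results = {}
for name, entries in playlists.items():   # one run per <playlist> row
    results[name] = cull_invalid_entries(entries)

# Then: no playlist retains an invalid entry.
assert all(e["path"] is not None
           for entries in results.values() for e in entries)
```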

Check out the other BDD resources at this blog such as:

Other Resources:

I highly recommend reading the following, which will NOT teach you how to write the code, but teaches you how to think in behavior and why it's important to do so (available in paper and Kindle):

Here is a great BDD tutorial if you are into music (guitar in particular)

Saturday, August 17, 2013

How NOT to Motivate your Organization to Perform

The TED Talk video by Dan Pink (at the bottom of this post) reinforces a few things that I've witnessed in doing Agile consulting and it validates many of the Agile values around roles and responsibilities.

I did a technical training in Test Driven Development (TDD) and Refactoring for a group of highly skilled software engineers at a client in Shanghai. The engineers were grouped into pairs, and each pair was implementing a set of features using TDD. Things were moving at a steady pace. With 2 hours left in the day, the attendees' energy was dropping, so I offered a 200RMB (abt. $30, but really $200 in spending power) reward to the pair that could get the functionality finished AND could pass my 3 bug injection tests (to see if they had good unit test coverage--did good TDD). The room was electrified at the announcement, and the pairs became very focused. At the end, I didn't have to pay out, although one pair was pretty damn close.

What was reported at the end was interesting: the carefully refactored and well-designed code had become hacked and sloppy in the effort to quickly finish the job. After watching the Dan Pink talk about the candle problem, I wonder if some pairs would have finished if I hadn't offered the incentive.

Dan's talk hits themes of how traditional management of carefully tracking workers and stepping in isn't as productive as slackening the reins and creating an environment of self direction. After all, software engineers are highly educated people. It's sad to see how management sometimes steps in their way and makes technical decisions for them.

A colleague and I argue about whether Scrum's Chickens and Pigs roles do more damage than help. I say it depends. If management has a habit of micromanaging how code is developed, then waste will be made through bad decisions, or time will be lost in Scrum meetings taken over by well-meaning but over-reaching management. It's like the Feds overstepping their boundaries into State and county matters. The Chickens and Pigs model stops this, sometimes creating a different kind of damage, but I feel it's necessary to swing the pendulum the other way for a year or so until management and development (in the Agile world, this includes test) better understand the boundaries between technical decisions and business decisions. Mixing these together creates as much waste as a baseball game where the pitcher is also playing third base and giving orders (as opposed to advice) to the outfield.

Monday, July 1, 2013

BDD Practices that Maximize Team Collaboration and Reduce Risk


Guiding Principles: KISS – Keep it simple silly & Incremental delivery

I’ll think about the future but only implement what I need this sprint, this day, this hour. I do that so I can start my implementation at the simplest level (but not too simple) and grow the feature minute by minute, hour by hour.
Because I work this way, I can check in often. This gives other team members visibility into what I'm doing. It allows for cheaper integration (merges) for the team and myself. It allows me to get the latest changes from the source tree so I can receive new work from the team (and get visibility into what they're doing). Checking in often reduces risk and increases code reuse. I can't reuse my team members' latest shared libraries if they aren't checked in. If I get pulled away from the work (personal or fire drill), another team member (who has some insight into my work because they've been seeing my many changes coming into their development environment, who has heard about what I'm doing at a high level during standup, who is collocated and has heard me working with my pair, who maybe has paired with me) can finish the task or story so the team is successful.
Because the team is checking in often, we are releasing our best design efforts and reducing latency for others to review and use our work.

Test Strategies

Guiding Principles: Tests run independent of each other, give timely feedback, are maintainable, deterministic, and test a feature once.

It’s best that EACH test run in a clean environment: clean data, clean server state, clean client state. This may not be practical if the tests take too long to give timely feedback.
Common strategies are any combination of:
  • Test login only once, the other tests skip login: share selenium driver across tests, or have a way to generate a session token so your tests don’t need to login.
  • Don’t commit data-destructive tests to the DB, or find a way to get the data "reset."
  • If you don’t mind your tests running slowly in different contexts, make the above "workarounds" configurable so you get fast feedback in the dev branch, but run the "slow tests" that log in and clean all the state/data in the int branch.
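A sketch of the "log in only once" strategy, with the clean-everything behavior made configurable so the same suite runs fast in dev and thoroughly in int.  slow_login and the CLEAN_EACH_TEST variable are hypothetical stand-ins for your real login call and branch configuration.

```python
import os

_session_token = None
login_count = 0

def slow_login():
    """Stand-in for the expensive real login (UI drive or auth API call)."""
    global login_count
    login_count += 1
    return "token-%d" % login_count

def get_session_token():
    global _session_token
    if os.environ.get("CLEAN_EACH_TEST") == "1":  # int branch: full isolation
        return slow_login()
    if _session_token is None:                    # dev branch: log in once
        _session_token = slow_login()
    return _session_token

# Three "tests" each ask for a session; only the first triggers a login.
tokens = [get_session_token() for _ in range(3)]
assert login_count == 1
```

Flipping CLEAN_EACH_TEST in the int branch buys back full isolation at the cost of speed, which is exactly the trade-off described above.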
Managing Data:
  • Maintain a data dictionary--This is a single point where we declare what data in the DB we depend on. Three columns: a shorthand description used in the code, the database id, and notes. The data dictionary reserves data and signals to the viewer that they shouldn’t write to or change it, as there are tests relying on the data.
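The data dictionary can be as simple as a table kept in code.  All of the shorthands, ids, and notes below are invented for illustration:

```python
# Single point declaring which DB rows the test suite depends on.
# Columns: shorthand used in code, database id, notes.
DATA_DICTIONARY = {
    "gold_user":      {"db_id": 1001, "notes": "premium account, never modify"},
    "empty_playlist": {"db_id": 2002, "notes": "playlist with zero entries"},
    "bad_playlist":   {"db_id": 2003, "notes": "contains invalid song entries"},
}

def reserved_id(shorthand):
    """Tests look up reserved data by shorthand instead of hard-coding ids."""
    return DATA_DICTIONARY[shorthand]["db_id"]

assert reserved_id("bad_playlist") == 2003
```

Because tests go through reserved_id, the dictionary is both documentation and the one place to update when reserved data moves.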

BDD in your Sprint

Behavior Driven Development implies that something is driving development. This something is failing or pending BDD tests. Immediately after Sprint planning, get those failing tests checked in and executing so that failures/pending are visible on your continuous build monitor. Why? So everyone can casually see the status of your project, just as you do when your commute takes you by a construction project.

Three levels of being driven:

1. Immediately get the entire sprint backlog features checked in as pending tests. Then before each feature is finished, implement the Steps to prove the feature works.
Risk: Too much automation work is left for the end of the sprint and we deliver features without automation. We don't discover requirements gaps early in the sprint because automation is postponed.
Value: Make visible what work is accomplished versus pending. Doing automation before implementation tests your understanding before you do it all wrong. :-)

2. Immediately get the entire sprint backlog features checked in as pending tests. Next, implement the Steps to prove each isn’t working (imagining the UI elements are built, coding to IDs or button names that you make up in your head). As each feature is implemented, tests should pass.
Value: if you can automate the tests, you've proven a deep understanding of the requirements or have revealed problems early in the sprint when there is time to fix that problem through collaboration.
Also, you've made visible on your build monitor a lot of status: Pending automation versus automation completed versus features completed.

3. At project start, enter all the pending features into the product backlog; then, sprint by sprint, stakeholders see progress toward features as the teams use level 2 above, and as time passes those features are split or removed, or new ones are added.
Value: Makes visible how the project is doing at the release level.
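The pending-versus-passing visibility from levels 1 and 2 can be simulated with a tiny hand-rolled runner; your BDD tool provides the real equivalent, and the test names and statuses here are hypothetical.

```python
def pending(test):
    """Mark a backlog feature as checked in but not yet automated."""
    test.pending = True
    return test

@pending
def test_cull_invalid_entries():
    pass  # Steps not implemented yet

def test_playlist_loads():
    assert True  # feature already implemented and automated

def run(tests):
    """Report each test as PENDING, PASS, or FAIL for the build monitor."""
    report = {}
    for t in tests:
        if getattr(t, "pending", False):
            report[t.__name__] = "PENDING"
        else:
            try:
                t()
                report[t.__name__] = "PASS"
            except AssertionError:
                report[t.__name__] = "FAIL"
    return report

report = run([test_cull_invalid_entries, test_playlist_loads])
assert report == {"test_cull_invalid_entries": "PENDING",
                  "test_playlist_loads": "PASS"}
```

The point is the report: anyone walking past the build monitor sees what is pending automation versus automated versus done.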

What happens if I get stuck automating?

  1. Get help from a team member, because pair working keeps roughly 80% of the effort focused on correctness and generates more ideas to try to solve the problem.
  2. Talk to the whole team about the issue, as no single individual owns quality.
  3. Escalate the issue to the organization.
  4. By Sprint end, maybe the team decides it’s not automatable and that they’ll test it manually.

Given When Then Design

These are BAD smells:
  • Needing an engineer or power user to understand the GWT. (An end user, or an end user's boss who's never used the product, should be able to understand the GWT.)
  • UI language mentioned in the GWT
  • feature file is many pages long
  • GWT language is not reusable; lack of consistency of language.
What if my "feature" has no behavior?
  • Go find it!!!
    • Talk to your PO and find out "why" the customer wants this.
  • If you're delivering a component that is *part* of a feature, then
    • Go look at the feature's GWT and understand how your component supports that feature.
    • Implement Steps for that feature to test that your component delivers its part of the feature.

Test Steps

These are NICE smells:
  • Step methods are about three lines long.
  • No @Alias(es).
  • Steps call into a shared library (rather than calling interfaces on other Step classes).
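A sketch of these smells in practice: thin step methods that delegate everything to a shared library.  PlaylistApi and the step names are made up for illustration; your BDD framework would bind the steps to GWT lines.

```python
class PlaylistApi:
    """Shared library used by every step class; owns all the real logic."""
    def __init__(self):
        self.entries = []
    def add_bad_entry(self):
        self.entries.append({"path": None})
    def play(self):
        self.entries = [e for e in self.entries if e["path"] is not None]

api = PlaylistApi()

# Step methods: about three lines each, no logic of their own, and they
# call the shared library rather than each other.
def given_bad_entries_exist():
    api.add_bad_entry()

def when_playing_the_playlist():
    api.play()

def then_invalid_entries_are_removed():
    assert all(e["path"] is not None for e in api.entries)

given_bad_entries_exist()
when_playing_the_playlist()
then_invalid_entries_are_removed()
```

Keeping steps thin means a UI change touches the shared library once instead of every step that mentions it.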

Use Continuous Integration

It's hard to get started using CI if you aren't doing it already. Here's how to get started.

Guiding Principles: you can’t automate what you can’t do manually, and you’ll always find a better way to do it next week (but don’t wait).

If you tackle all your technical difficulties head on, it's hard to make progress. So deliver your way to CI through evolutionary coding, learning, and incremental delivery. Because any CI server can work with any script, write your scripts using whatever methodologies get the job done for you. Start getting value from your integration efforts as soon as possible.
1. Manual integration: integrate on a separate machine (using scripts that will later be executed continuously). Scripts should be able to find new tests at run time rather than rely on static lists. A script should be able to build and install/deploy. If you can't yet create scripts, do the steps manually until you learn how.
2. Continuous Integration: select CI software, install it on a server, and then automate the execution of the scripts in CI.
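Step 1's integration script might start out like the sketch below.  The build and deploy commands are placeholders, and the directory layout is assumed; the point is discovering tests at run time instead of maintaining a static list.

```python
import fnmatch
import os
import tempfile

def discover_tests(root):
    """Find test files at run time so new tests are picked up automatically."""
    found = []
    for dirpath, _dirs, files in os.walk(root):
        for f in fnmatch.filter(files, "test_*.py"):
            found.append(os.path.join(dirpath, f))
    return sorted(found)

def integrate(root):
    """Produce the integration plan: build, deploy, then run every test found."""
    steps = ["build", "deploy"]           # placeholders for real commands
    steps += ["run " + t for t in discover_tests(root)]
    return steps

# Demo against a throwaway directory tree standing in for the source tree.
tmp = tempfile.mkdtemp()
open(os.path.join(tmp, "test_playlist.py"), "w").close()
plan = integrate(tmp)
assert plan[0] == "build" and plan[1] == "deploy"
assert any("test_playlist.py" in s for s in plan)
```

Run it by hand first; once it's reliable, any CI server can execute the same script on every check-in.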

If you have feedback, whether you agree or disagree, feel free to comment, as you'll be adding value!