The Iowa Caucus Application Malfunction
Bad Things Happen to Good People and Companies
If you have participated in a software deployment, then you
know how nail-biting the production rollout can be. You have tested for
everything you could think of. You have people on standby to do a final review
of the rollout and confirm that the application works as expected. You
think you have done a great job, but then the calls and emails from users start
flowing in, and you panic. Bad things happen to great people and companies,
sometimes even when we think we have done our best work.
Your first line of defense against a nail-biting rollout
is to follow Agile and other project rollout standards. Then run a pre-launch
in a test market that best represents your real-world conditions. The problem
is that in this age of IoT, Azure development, and the promises of hardware and
software vendors, you are expected to work miracles in less time, with smaller
budgets and less testing.
I hold old-world beliefs about this strategy. I like testing,
and the more the better: testing that is controlled and well defined, then a
rollout to production, then testing again before your high-transaction day
arrives. Then make sure your team has created what-if scenarios and backup
strategies for every process down the chain.
Do we have enough hardware capacity, CPUs, and transaction pipelines,
and can we add more within moments if needed? Can we add more phone lines (not
only because of extra call volume, but also in case people find your numbers
and want to spoof you), add more customer service and help desk staff, and do
we have technicians for each link in the chain to fix anything that cannot fail
over? Do we have proper failover procedures? Do we have backups for everything,
from hardware to code, data, databases, servers, network support, and
application team support, in case something happens? Do you really want to
roll out a patch right before a large event? If you have a very public launch,
every patch should be tested completely before it is rolled out. Lately, the
security patches themselves have been causing havoc. Unfortunately, the climate
in the world is aggressive, and some people would rather help defeat you than
cheer your success.
From what I have read about this situation, it seems there may
not have been enough phone lines and no way to add more, that transactions
were blocked, that there were not enough staff members or auditing tools to
track the blocking, and that no one had planned for the possibility of
failure.
In the end, this happens to everyone in the IT world, and it
usually happens because we left out a step or didn’t have realistic timelines
to test real-life production scenarios before the BIG DAY/EVENT.
I hope they will be able to resolve the problems and help
create a voting process that the public can trust. I wish them good luck and
all the best in getting this done for their next launch. But don’t point
fingers; we all live in glass houses. Use this experience to enhance your
own development strategies and as a reminder of how important it is to follow
each step.