Andy Hedges' Blog

4 Symptoms of Dysfunction in Software Teams

Once you’ve been in the industry for a decade or so, you start to get a sixth sense for when things aren’t right. However, even when your sixth sense isn’t working, here are some signals that should set alarm bells ringing.

1. It Dependencies

Every time you ask someone how something works or where some data is the explanation starts with ‘it depends’.

For example, if I enquire how a user’s password is verified, I’d get an answer something like this:

“It depends, if the user registered before 2001 then you need to call the genesis logon system, after 2001 however we switched to active directory, unless the user was a customer of that company we bought in 2005, in which case you’ll need to call the ‘migration’ LDAP they put in place at the time which uses Netscape LDAP so you’ll need the JNDI library we patched to work around some bugs. Also if the user is an administrator we keep the credentials in the staff database so you’ll need to do an ODBC look up for that.”

You get the idea: technical debt has built up, poor decisions haven’t been rectified, and now there is a laundry list of ‘it depends’ for every question.
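To make the cost concrete, the answer above might end up as code shaped something like this sketch. Every name in it (the `User` fields, the backend labels, the function itself) is hypothetical and invented for illustration, and the order in which the rules are checked is an assumption, since the answer doesn't say which rule wins. The point is only that each 'it depends' becomes another branch that every caller has to know about.

```python
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical fields, invented to mirror the "it depends" answer.
    registered_year: int
    is_admin: bool = False
    from_acquired_company: bool = False


def credential_store_for(user: User) -> str:
    """Return a label for the backend that holds this user's credentials.

    Each branch corresponds to one clause of the answer. The precedence
    (admin first, then acquired company, then registration year) is an
    assumption; so is treating the year 2001 itself as "after".
    """
    if user.is_admin:
        return "staff database (ODBC)"
    if user.from_acquired_company:
        return "migration LDAP (patched JNDI)"
    if user.registered_year < 2001:
        return "genesis logon system"
    return "active directory"
```

Four backends and three flags already need a docstring to explain precedence, and the next acquisition adds another branch.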

2. Jane Doe

Here’s where we have a system or component that only one person understands. The cynical might refer to this as mortgage-driven development, but it often happens because of apathy too. Unfortunately, the system might not be considered sexy to work on and therefore hasn’t been touched other than to keep it ticking over. The trouble is that every time there is a problem only Jane can fix it, and she’s not much into doing that work either. She’ll do the minimum to keep it ticking over.

The trouble with this is that the system is providing value to the organisation; otherwise it could be turned off. If it’s providing value, it’s very likely that at some point in the future it will need enhancing and nobody is going to be able to do that. And then Jane quits.

3. Town Hero

This is the support equivalent of ‘Jane Doe’. It isn’t a case of if this system will fail but when and how dramatically… and it’s always dramatic. When it does go wrong there is only one plucky guy in support who can fix it: our local hero. Undoubtedly he’s not around when things go wrong, but everyone knows he’s the only guy that can fix it.

When the inevitable catastrophic failure occurs, calls are made to his landline and his mobile, and DMs are sent to his Twitter account. He’s unreachable. What should we do? We’re doomed. When all seems lost a call is received. He’s been located, but he’s on a beach in Rio. There’s no way for him to connect to take a look. However, he’s had an idea: he’s VPNing into a server in Hamburg to create an SSH tunnel through the San Francisco DC and from there to the corporate firewall… he’s in! A few minutes later he’s diagnosed the potential problem. People are called to mission control, orders are given, procedures are ignored, changes are made. The hero calls into the bridge and explains the problem: it’s going to be tough and take some time, but he can do it. Nobody else has a clue what he’s talking about. In the small hours the system is restored, and everyone pats each other on the back for pulling together and working as a team. Also, more importantly, thank goodness for the hero. What would we do without him?

All is well in the world thanks to that guy, our hero — until next month, when the exact same thing happens again.

The trouble is these systems never get fixed, either because the team that supports them isn’t qualified to do so, isn’t allowed to do so, or is too busy, maybe too busy being heroes.

4. Carcinogenic Prototype

This system starts with a great idea, an idea that has a lot of potential. In order to qualify that potential a demo/pilot/prototype is created and as it turns out the potential is realised. The business become very keen to get this prototype into full production — they might just make those targets they thought they were going to miss if this works out. The engineers who built the prototype are pleased because their work is demonstrating value but reticent because it was just a prototype.

The engineers mention this to the group. The system wasn’t designed to go live: it doesn’t scale easily, it doesn’t have good exception handling, it doesn’t do any logging, it can’t be monitored; the list goes on. The engineers are overruled but are promised time to fix it up later.

Later never comes because once the system is live feature enhancements come pouring in, the system grows rapidly and sometimes starts to affect other systems too. Unfortunately due to the quality of the system it’s very hard to add features in a clean way and so the technical debt grows, it compounds.

Solution

As with most things in software engineering, the technical problems are symptoms of organisational causes. For each of the symptoms listed above there are numerous organisational reasons for them occurring; I’m going to blog about each of these organisational issues over time. However, at the heart of the problem is a lack of desire to change the status quo, to enforce a level of quality in the engineering and to take ownership. Take ownership of your minimum viable quality and stick to it.

Andy Hedges

SOA vs Microservices

The term microservice has been around for a while, but for as long as it has been around there has been much arm waving about what it means. It continued as one of those nebulous concepts that meant anything and everything to everyone and no one. In this way it caused a little confusion but served as a token to reference something, but what? The closest you could get was that the services were smaller than monolithic applications, by some vague measure. Of course, as is often the case, some people were tacking fairly arbitrary stuff onto the term too. That’s where we were: microservices were smaller than just about the biggest thing anyone can ever create in software, monolithic applications. Monolithic applications are like the Graham’s number of software architecture.

“Graham has recently established … a bound so vast that it holds the record for the largest number ever used in a serious mathematical proof.” — Martin Gardner (c.1977)

This wasn’t particularly useful but pretty much par for the course in the IT industry: buzzwords come and go and often don’t have concrete meaning attached to them, or have multiple conflicting meanings attached to them. However, this is where things took an interesting turn: Martin Fowler, a well-known name in the industry, made an attempt to document what the term could mean. Fowler takes a very thorough approach to documenting things and has a rather large following of people, so when he does document something the term:

  • tends to stick
  • has concrete meaning

The trouble was that a number of people, myself included, believed to a greater or lesser extent that the definition of microservices Fowler came up with is remarkably similar to SOA as known and documented, with some other practices thrown in. Those other practices were arguably a little orthogonal to the topic at hand; that is, they were good practice but beside the point.

It’s well documented what SOA is and now it is well documented what microservices are. Compare them side by side and most reasonable people, I think, will come to roughly the same conclusion. So how did this happen?

Marketing and communication

SOA is unfortunately a hard concept to grasp initially. It takes a little time; then most people, if they persevere, have an epiphany and wonder what it was that they didn’t understand in the first place.

The trouble is, I think, that only those who really went to town on SOA the first time around had that epiphany. But why did some go to town on it and others not?

What happened with SOA is that the waters were muddied with a large amount of esoteric and/or complex language. Take a look at the SOA Reference Model, for example: it has some very good information in there, but it is an intimidating read and I can imagine that not many have read it. I would absolutely forgive anyone for not reading it; it took me about 5 quite determined attempts…

If you contrast this document with Fowler’s blog post then you can see which is more easily digestible. Fowler is (literally) an expert in the written communication of technical concepts, perhaps the best.

Vendor spin

As anyone who’s been in the IT industry for any amount of time will know, for any given buzzword there are roughly a billion vendors trying to capitalise on it. This may be as simple as saying a product is SOA enabled or microservice compatible, but it can take more subtle forms too.

One of these approaches is to create a related standard or pattern. It wasn’t long after SOA started getting discussed as a concept that WS-* appeared on the scene with a seemingly endless parade of complex specifications and reference implementations of said specifications. Each specification was dutifully implemented by the big vendors so that they could become ‘service-enabled’.

Unfortunately these specifications were awful, some absurd; indeed, some of them needed standards to define the standards. Moreover, the vendor implementations didn’t work very well with each other. That is, a web service exposed by one vendor couldn’t be reliably called by a client from another vendor. This left a bad taste for the right-thinking engineer, and there was a violent push to the opposite end of the interface spectrum: ReST, JSON and so forth, but that’s another story and one we all learned from.

As I alluded to earlier, it wasn’t just specifications that clouded the SOA picture. In order to sell software licenses it was key to have something big, expensive and, most importantly, critical to the client once implemented. There is no greater example of this than the ESB. I’ve written before about why you don’t need an ESB; suffice it to say this also left a bad taste.

For the record, to my mind, neither ESB nor WS-* is desirable in a modern SOA. Unfortunately, sometimes WS-* is the only way to communicate with legacy applications. The industry did learn from all this, I hope.

Time fades

Finally, time fades. Engineers like new things; I know I do. I like to play with new technology, and to think about and discuss new ideas. I’m also forgetful: I forget some of the cool things I once knew. As the expression goes, everything old is new again.

Summary

For the most part there isn’t really anything new in microservices over SOA. SOA carries the baggage of ESB and WS-*; that is, some people confuse those with SOA, and unfortunately vendors encouraged that confusion. For the record, some of us gave them a hard time for that at the time.

Do we need a new term? I don’t think so, I think it would be more useful to tie down what we mean by SOA, clearly remove the bad things and highlight new thinking. However by renaming I think we revise history, we lose the discussion, the ‘changelog’.

My humble suggestion would be that it is more useful to highlight the differences, what’s new, and give those things names. Think of these changes as microbuzzwords.

Andy Hedges

Service Decomposition, Cohesion & Coupling

Service Oriented Architecture is about making IT look like your organisation, your business. In many companies IT systems are broken down into lumps that aren’t the same lumps that the business understands. You may have systems with mysterious names like Pluto or Genie or, worse still, impenetrable 3-letter acronyms (Steve Jones speaks to this in chapter 12 of his book Enterprise SOA Adoption Strategies). How do the business and 3rd parties make sense of these meaningless and cryptic monikers? Usually they don’t. The names only serve to isolate IT.

With this in mind, IT and the rest of the business have the daunting task of breaking down the organisation, its data and functions into services. One approach to this is to look for things that are highly cohesive, that is, things that naturally belong together because of their very nature.

However, deciding what belongs together can be like the proverbial pulling of a thread from a sweater: you pick one thread to pull out and the rest comes along with it. You end up with long chains of dependencies; everything directly or transitively refers to everything else. It’s similar to a problem that ORM toolkits have, but this isn’t just about data.

Every data entity, every small bit of function or process in your organisation is related to another somehow, either directly or transitively. It’s 6 degrees of separation applied to your software estate and there’s no getting away from that fact. The extremely hard question is where one service stops and another begins: where do I draw the lines between services?

I used the term “cohesion” earlier; its counterpart, its nemesis, is coupling. Coupling is where something has been put together or joined with something it doesn’t strongly belong with. To use a banking example, you don’t expect your staff payroll system to need modification when you change the way deposits to customer accounts work.

In short, good dependencies represent high cohesion and bad dependencies represent tight coupling. The opposites of these are low cohesion (which is bad, boo) and loose coupling (which is good, yay).

The question remains, though: why is cohesion good and coupling bad? The advantages of cohesion are as follows:

  • Your brain groups naturally cohesive concepts, things that are like each other follow naturally. Cognitively working on related concepts at the same time makes sense.
  • Changes to one service are less likely to require modifications or have side effects on other services.
  • Services need to interact with each other much less, because in the majority of cases the functionality or behaviour a service needs belongs in that service. This means that invocations and data access can occur within the process space of the service, with no need for network calls and data marshalling.
  • It makes it easier to reason about where functionality or data might exist in your services. For example if I need to change how salaries are calculated, that’ll be in the payroll service, it becomes obvious.

The disadvantages of coupling, somewhat the corollary of the above, are:

  • The service is harder to understand because you have to hold more concepts than necessary in your head when reasoning about the service.
  • Unexpected consequences, you change one piece of functionality and an unrelated one breaks.
  • Can lead to fragmentation of cohesive functions and therefore higher communication overhead.
  • You have to make changes to more code than necessary when adding functionality.

Types of Cohesion and Coupling

There is much written on this and so I’ll try not to rehash it too much; unfortunately I haven’t found anything that unifies cohesion and coupling well. What I’m going to do is refer to good types as cohesion and bad types as coupling.

Data cohesion (good)

Where data is often used together.

Example: a dating website would put a customer’s name and email address together because they are often used together. However, it would not put a customer’s suitable partners together with their credit card details.

Functional cohesion (good)

Where functions are related and act upon the same data.

Example: registering for a website and modifying your username might be two functions on the same service, whereas paying an employee’s salary wouldn’t belong there because functionally it makes no sense.
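As a minimal sketch of the example above, assuming an in-memory store and a hypothetical `AccountService` class (neither comes from a real system), registering and renaming sit together because both act on the same account data:

```python
class AccountService:
    """Hypothetical service: every method acts on the same account data."""

    def __init__(self):
        self._usernames = {}  # account id -> username
        self._next_id = 1

    def register(self, username):
        """Create an account and return its id."""
        account_id = self._next_id
        self._usernames[account_id] = username
        self._next_id += 1
        return account_id

    def change_username(self, account_id, new_username):
        """Rename an existing account; unknown ids are an error."""
        if account_id not in self._usernames:
            raise KeyError(f"unknown account {account_id}")
        self._usernames[account_id] = new_username

    def username_of(self, account_id):
        return self._usernames[account_id]


service = AccountService()
account = service.register("ada")
service.change_username(account, "ada_l")
```

A `pay_salary` method would have nothing in this class to work with, which is a fair hint that it belongs in a different service.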

Categorisation Coupling (bad)

Things are put in the same service because they belong to a certain category of data or function.

Example: Used cars have a location and so do used car salesmen, so I’ll create a location service for them both.

Process Coupling (bad)

This is where services are created around long chain business processes. The problem with this is that business processes tend to go across many concepts within an organisation and so pull a lot of stuff with them — forcing you toward a monolith.

Example: A company has a process for the selection, purchase and installation of equipment. This process includes requirements for the equipment, knowing how to contact suppliers, how to receive invoices, how to make payments, how to engage with legal, the specification of their property portfolio and so forth. Before you know it, you’ve got a monolith.

Arbitrary Coupling (bad)

This is where unrelated concepts exist in the same service. Who knows why, perhaps the systems designers had two projects and couldn’t be bothered to have two separate modules in their IDE.

Example: most enterprise off-the-shelf business software, although things are slowly getting better.

Data Type Coupling (bad)

This is where services use the same definition of a type and when one needs to modify that definition it breaks the other.

Example: A company has a single definition of its customer type, and the properties of that type are fixed (think global XSD for customer data). Each service that deals with customers has to be capable of understanding this customer type. One day marketing decide they want to add Twitter username as a new field. This means that all services now need to be updated to include this field when talking to the marketing service, as it’s now fully expecting it.
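One common way to loosen this kind of coupling, offered here as a sketch rather than as anything prescribed above, is for each service to keep its own narrow view of a customer and ignore fields it doesn't use (often called a tolerant reader). The `BillingCustomer` type, the field names and the payloads below are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class BillingCustomer:
    """The billing service's own view of a customer (hypothetical)."""
    customer_id: str
    name: str


def parse_for_billing(payload: Mapping[str, Any]) -> BillingCustomer:
    # Pick out only the fields billing cares about and ignore the rest,
    # so fields added by other teams do not force a change here.
    return BillingCustomer(
        customer_id=payload["customer_id"],
        name=payload["name"],
    )


# The same customer before and after marketing adds their new field.
before = {"customer_id": "c-42", "name": "Ada"}
after = {"customer_id": "c-42", "name": "Ada", "twitter_username": "@ada"}
```

Billing parses both payloads to the same value, so marketing's new field costs it nothing. The trade-off is that each service now maintains its own small definition instead of sharing one.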

Temporal Coupling (bad)

Where two concepts happen at the same time but are otherwise unrelated. This is similar to process coupling and is often a symptom of it.

Example: Every month accounting completes its books and makes sure they balance; it also runs payroll. The account service is created to do both of these things.
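A sketch of the alternative, with hypothetical class and function names: the two concerns stay in separate services, and the only thing they share is the month-end trigger, which lives in a scheduler that knows nothing about either.

```python
from typing import Callable, List


class LedgerService:
    """Hypothetical service that owns closing the books."""

    def close_books(self, month: str) -> str:
        return f"books closed for {month}"


class PayrollService:
    """Hypothetical service that owns paying staff."""

    def run_payroll(self, month: str) -> str:
        return f"payroll run for {month}"


def month_end(month: str, jobs: List[Callable[[str], str]]) -> List[str]:
    # The scheduler only knows that these jobs share a trigger;
    # it holds no accounting or payroll logic itself.
    return [job(month) for job in jobs]


results = month_end(
    "2024-01",
    [LedgerService().close_books, PayrollService().run_payroll],
)
```

If payroll later moves to a fortnightly schedule, only the scheduling changes; neither service has to be pulled apart.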

Andy Hedges