Andy Hedges' Blog

4 Symptoms of Dysfunction in Software Teams

Once you’ve been in the industry for a decade or so you start to develop a sixth sense for when things aren’t right. However, even when your sixth sense isn’t firing, here are some signals that should raise alarm bells.

1. It Dependencies

Every time you ask someone how something works or where some data lives, the explanation starts with ‘it depends’.

For example, say I enquire how a user’s password is verified. I’d get an answer something like this:

“It depends, if the user registered before 2001 then you need to call the genesis logon system, after 2001 however we switched to active directory, unless the user was a customer of that company we bought in 2005, in which case you’ll need to call the ‘migration’ LDAP they put in place at the time, which uses Netscape LDAP, so you’ll need the JNDI library we patched to work around some bugs. Also if the user is an administrator we keep the credentials in the staff database so you’ll need to do an ODBC look-up for that.”

You get the idea: technical debt has built up, poor decisions haven’t been rectified and now there is a laundry list of ‘it depends’ for every question.

2. Jane Doe

Here’s where we have a system or component that only one person understands. The cynical might refer to this as mortgage-driven development, but it often happens because of apathy too. The system might not be considered sexy to work on and therefore hasn’t been touched other than to keep it ticking over. The trouble is that every time there is a problem only Jane can fix it, and she’s not much into doing that work either. She’ll do the minimum to keep it ticking over.

The trouble with this is that the system is providing value to the organisation, otherwise it would have been turned off. If it’s providing value then it’s very likely that at some point in the future it will need enhancing, and nobody is going to be able to do that — and then Jane quits.

3. Town Hero

This is the support equivalent of ‘Jane Doe’: it isn’t a case of if this system will fail but when and how dramatically… and it’s always dramatic. When it does go wrong there is only one plucky guy in support who can fix it: our local hero. Inevitably he’s not around when things go wrong, but everyone knows he’s the only guy that can fix it.

When the inevitable catastrophic failure occurs calls are made to his landline, his mobile and DMs to his Twitter account. He’s unreachable, what should we do? We’re doomed. When all seems lost a call is received. He’s been located, but he’s on a beach in Rio. There’s no way for him to connect to take a look. However he’s had an idea: he’s VPNing into a server in Hamburg to create an SSH tunnel through the San Francisco DC and from there to the corporate firewall… he’s in! A few minutes later he’s diagnosed the potential problem. People are called to mission control, orders are given, procedures are ignored, changes are made. The hero calls into the bridge and explains the problem; it’s going to be tough and take some time but he can do it — nobody else has a clue what he’s talking about. In the small hours the system is restored and everyone pats each other on the back for pulling together and working as a team — also, more importantly, thank goodness for the hero, what would we do without him?

All is well in the world thanks to that guy, our hero — until next month, when the exact same thing happens again.

The trouble is these systems never get fixed, either because the team that supports them aren’t qualified to do so, aren’t allowed to do so, or because they are too busy, maybe too busy being heroes.

4. Carcinogenic Prototype

This system starts with a great idea, an idea that has a lot of potential. In order to qualify that potential a demo/pilot/prototype is created and, as it turns out, the potential is realised. The business becomes very keen to get this prototype into full production — they might just make those targets they thought they were going to miss if this works out. The engineers who built the prototype are pleased because their work is demonstrating value, but wary because it was just a prototype.

The engineers mention this to the group. The system wasn’t designed to go live: it doesn’t scale easily, it doesn’t have good exception handling, it doesn’t do any logging, it can’t be monitored, the list goes on. The engineers are overruled but promised time to fix it up later.

Later never comes because once the system is live feature enhancements come pouring in; the system grows rapidly and sometimes starts to affect other systems too. Unfortunately, due to the quality of the system it’s very hard to add features in a clean way, and so the technical debt grows; it compounds.

Solution

As with most things in software engineering, the technical problems are symptoms of organisational causes. For each of the symptoms listed above there are numerous organisational reasons for them occurring; I’m going to blog about each of these organisational issues over time. However, at the heart of the problem is a lack of desire to change the status quo, to enforce a level of quality in the engineering and to take ownership. Take ownership of your minimum viable quality and stick to it.

Andy Hedges

SOA vs Microservices

The microservice term has been around for a while, but for as long as it has been around there has been much arm-waving about what it means. It continued as one of those nebulous concepts that meant anything and everything to everyone and no one. In this way it caused a little confusion but served as a token to reference something — but what? The closest you could get was that the services were smaller than monolithic applications, by some measure that was vague. Of course, as is often the case, some people were tacking fairly arbitrary stuff onto the term too. That’s where we were: microservices were smaller than just about the biggest thing anyone can ever create in software, the monolithic application. Monolithic applications are the Graham’s number of software architecture.

“Graham has recently established … a bound so vast that it holds the record for the largest number ever used in a serious mathematical proof.” — Martin Gardner (c.1977)

This wasn’t particularly useful, but it was pretty much par for the course in the IT industry; buzzwords come and go and often don’t have concrete meaning attached to them, or have multiple conflicting meanings attached to them. However, this is where things took an interesting turn: Martin Fowler, a well-known name in the industry, made an attempt to document what the term could mean. Fowler takes a very thorough approach to documenting things and has a rather large following, so when he does document something the term:

  • tends to stick
  • takes on concrete meaning

The trouble was that a number of people, including myself, believed to a greater or lesser extent that the definition of microservices Fowler came up with is remarkably similar to SOA as known and documented, with some other practices thrown in. Those other practices were debatably a little orthogonal to the topic at hand; that is, they were good practice but beside the point.

It’s well documented what SOA is and now it is well documented what microservices are. Compare them side by side and most reasonable people, I think, will come to roughly the same conclusion. So how did this happen?

Marketing and communication

SOA is unfortunately a hard concept to grasp initially. It takes a little time, then most people, if they persevere, have an epiphany and wonder what it was that they didn’t understand in the first place.

Trouble is, I think only those who really went to town on SOA the first time around had that epiphany. But why did some go to town on it and others not?

What happened with SOA is that the waters were muddied with a large amount of esoteric and/or complex language. Take a look at the SOA Reference Model, for example: it has some very good information in there, but it is an intimidating read and I can imagine that not many have read it. I would absolutely forgive anyone for not reading it; it took me about 5 quite determined attempts…

If you contrast that document with Fowler’s blog post then you can see which is more easily digestible. Fowler is (literally) an expert in the written communication of technical concepts, perhaps the best.

Vendor spin

As anyone who’s been in the IT industry for any amount of time will know, for any given buzzword there are roughly a billion vendors trying to capitalise on it. This may be as simple as saying a product is SOA enabled or microservice compatible, but it can take more subtle forms too.

One of these approaches is to create a related standard or pattern. It wasn’t long after SOA started getting discussed as a concept that WS-* appeared on the scene, with a seemingly endless parade of complex specifications and reference implementations of said specifications. Each specification was dutifully implemented by the big vendors so that they could become ‘service-enabled’.

Unfortunately these specifications were awful, some absurd; moreover the vendor implementations didn’t work very well with each other, indeed some of them needed standards to define the standards. That is, a web service exposed by one vendor couldn’t be reliably called by a client from another vendor. This left a bad taste for the right-thinking engineer and there was a violent push to the opposite end of the interface spectrum: ReST, JSON and so forth, but that’s another story and one we all learned from.

As I alluded to earlier, it wasn’t just specifications that clouded the SOA picture. In order to sell software licenses it was key to have something big, expensive and, most importantly, critical to the client once implemented. There is no greater example of this than the ESB. However, I’ve written before about why you don’t need an ESB; needless to say, this also left a bad taste.

For the record, to my mind, neither ESBs nor WS-* are desirable in a modern SOA. Unfortunately, sometimes WS-* is the only way to communicate with legacy applications. The industry did learn from all this, I hope.

Time fades

Finally, time fades and engineers like new things; I know I do. I like to play with new technology, to think about and discuss new ideas. I’m also forgetful; I forget some of the cool things I once knew. As the expression goes, everything old is new again.

Summary

For the most part there isn’t really anything new in microservices over SOA. SOA carries the baggage of ESBs and WS-*; that is, some people confuse those with SOA, and unfortunately vendors encouraged that. For the record, some of us gave them a hard time for it at the time.

Do we need a new term? I don’t think so; I think it would be more useful to tie down what we mean by SOA, clearly remove the bad things and highlight the new thinking. However, by renaming I think we revise history; we lose the discussion, the ‘changelog’.

My humble suggestion would be that it is more useful to highlight the differences, what’s new, and give those things names: think of these changes as microbuzzwords.

Andy Hedges

Service Decomposition, Cohesion & Coupling

Service Oriented Architecture is about making IT look like your organisation — your business. In many companies IT systems are broken down into lumps that aren’t the same lumps that the business understands. You may have systems with mysterious names like Pluto or Genie or, worse still, impenetrable 3-letter acronyms (Steve Jones speaks to this in his book Enterprise SOA Adoption Strategies — chapter 12). How do the business and 3rd parties make sense of these meaningless and cryptic monikers? Usually they don’t. They only serve to isolate IT.

With this in mind, IT and the rest of the business have the daunting task of breaking down the organisation, its data and functions into services. One approach is to look for things that are highly cohesive, that is, things that naturally belong together because of their very nature.

However, deciding what belongs together can be like the proverbial pulling of a thread from a sweater: you pick one thread to pull out and the rest comes along with it. You end up with long chains of dependencies; everything directly or transitively refers to everything else. It’s a similar problem to the one ORM toolkits have, but this isn’t just about data.

Every data entity, every small bit of function or process in your organisation is related to another somehow, either directly or transitively. It’s 6 degrees of separation applied to your software estate and there’s no getting away from that fact. The extremely hard question is where does one service stop and another begin; where do I draw the lines between services?

I used the term “cohesion” earlier; its counterpart, its nemesis, is coupling. Coupling is where something has been put together or joined with something it doesn’t strongly belong with. To use a banking example, you don’t expect your staff payroll system to need modification when you change the way deposits to customer accounts work.

In short, good dependencies represent high cohesion and bad dependencies represent tight coupling. The opposites of these are low cohesion (which is bad, boo) and loose coupling (which is good, yay).

The question remains, though: why is cohesion good and coupling bad? The advantages of cohesion are as follows:

  • Your brain naturally groups cohesive concepts; things that are like each other follow naturally. Cognitively, working on related concepts at the same time makes sense.
  • Changes to one service are less likely to require modifications or have side effects on other services.
  • Services need to interact with each other much less, because for the majority of cases the functionality or behaviour belongs within the service. This means that invocations and data access can occur within the process space of the service, with no need for network calls and data marshalling.
  • It makes it easier to reason about where functionality or data might exist in your services. For example, if I need to change how salaries are calculated, that’ll be in the payroll service; it becomes obvious.

The disadvantages of coupling, somewhat the corollary of the above, are:

  • The service is harder to understand because you have to hold more concepts than necessary in your head when reasoning about the service.
  • Unexpected consequences: you change one piece of functionality and an unrelated one breaks.
  • Can lead to fragmentation of cohesive functions and therefore higher communication overhead.
  • You have to make changes to more code than necessary when adding functionality.

Types of Cohesion and Coupling

There is much written on this, so I’ll try not to rehash it too much; unfortunately I haven’t found anything that unifies cohesion and coupling well. What I’m going to do is refer to the good types as cohesion and the bad types as coupling.

Data cohesion (good)

Where data is often used together.

Example: a dating website would put a customer’s name and email address together because they are used together often. However, they would not put suitable partners together with their credit card details.

Functional cohesion (good)

Where functions are related and act upon the same data.

Example: registering for a website and modifying your username might be two functions on the same service, whereas paying an employee’s salary wouldn’t belong there because functionally it makes no sense.

Categorisation Coupling (bad)

Things are put in the same service because they belong to a certain category of data or function.

Example: used cars have a location and so do used car salesmen, so I’ll create a location service for them both.

Process Coupling (bad)

This is where services are created around long chain business processes. The problem with this is that business processes tend to go across many concepts within an organisation and so pull a lot of stuff with them — forcing you toward a monolith.

Example: a company has a process for the selection, purchase and installation of equipment. This process includes requirements for the equipment, knowing how to contact suppliers, how to receive invoices, how to make payments, engaging with legal, the specification of their property portfolio and so forth; before you know it you’ve got a monolith.

Arbitrary Coupling (bad)

This is where unrelated concepts exist in the same service. Who knows why; perhaps the system’s designers had two projects and couldn’t be bothered to have two separate modules in their IDE.

Example: most enterprise off-the-shelf business software, although things are slowly getting better.

Data Type Coupling (bad)

This is where services use the same definition of a type and when one needs to modify that definition it breaks the other.

Example: a company has a single definition of its customer type, and the properties of that type are defined centrally (think global XSD for customer data). Each service that deals with customers has to be capable of understanding this customer type. One day marketing decide they want to add Twitter username as a new field. This means that all services now need to be updated to include this field when talking to the marketing service, as it is now expecting it.
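
One common way to loosen this kind of coupling is the tolerant reader pattern: each consuming service reads only the fields it actually cares about and ignores everything else, so marketing can add their field without breaking anyone. A minimal sketch using Jackson (the field names here are illustrative, not from any real schema):

import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

public class TolerantReader {

    // This service only cares about name and email; unknown fields such as
    // marketing's new twitterUsername are silently ignored rather than
    // breaking deserialisation.
    public static class Customer {
        public String name;
        public String email;
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper()
                .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
        String json = "{\"name\":\"Jane\",\"email\":\"jane@example.com\","
                + "\"twitterUsername\":\"@jane\"}";
        Customer customer = mapper.readValue(json, Customer.class);
        System.out.println(customer.name + " <" + customer.email + ">");
    }
}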

Temporal Coupling (bad)

When two concepts happen at the same time but are otherwise unrelated. This is similar to process coupling and is often a symptom of it.

Example: every month accounting completes its books and makes sure they balance; it also runs payroll. The accounting service is created to do both of these things.

Andy Hedges

Huffman Coding

I do a lot of programming in the large and spend far too much time, according to some, thinking about programming-in-the-large concepts such as SOA and CEP, but from time to time I like to keep it real and code something detailed and low(er) level. Recently I’ve been doing a little investigation around compression, and so I ended up reading up on Huffman coding. Huffman coding is a way of taking a fixed-length encoding and turning it into a shorter variable-length encoding.

Back to basics

That’s a little vague and academic, so I’m going to assume little knowledge of how computers store data, work my way up and see if I can explain it properly.

On off switch
Figure 1. On off switch

Computer storage is made of switches, lots and lots of on/off switches; a switch in storage terms is called a bit and is the smallest possible unit of storage. Switches can be on or off, and computers represent that on or off as 1 or 0 respectively. That’s why the classic power switch on most devices is a zero with a one overlaid, as in figure 1: 0 and 1 mean off and on to engineers. If I only have 0s and 1s then I’m going to have to find a way of representing numbers bigger than 1 using them.

Getting to second base

Classically we use a numbering system with ten digits, zero through to nine, which mathematicians would call base 10 or decimal. However, as computers only have two digits they use a counting system in base 2, which is referred to as binary. To count in binary we simply increment the digit on the right until it reaches its limit (1), then increment the digit to its left whilst resetting the right-hand digit to zero; if the digit on the left is also at its limit we move again to the left, resetting as we go, until we find one we can increment. This is exactly the algorithm we use for normal counting. Therefore we have:

binary  decimal
     0        0
     1        1
    10        2
    11        3
   100        4
   101        5
   110        6
   111        7
  1000        8
  1001        9
  1010       10
  1011       11
  1100       12 

Interestingly and logically, where in decimal the first digit represents units, the second tens and the third hundreds, in binary the digits represent 1, 2 and 4, doubling for each digit to the left. Thus 101 (binary) is

(1 × 1) + (2 × 0) + (4 × 1) = 5 (decimal)

Anyway, enough about bases, I think we have that covered. What we need to understand is that you can represent numbers using sequences of zeros and ones on computers.
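
If you want to play with this, most languages will do the base conversion for you. A trivial Java example, nothing here beyond the standard library:

public class Binary {
    public static void main(String[] args) {
        // Parse a string of binary digits using the positional rule above:
        // each digit doubles in weight moving left, so 101 is 4 + 0 + 1.
        System.out.println(Integer.parseInt("101", 2)); // prints 5

        // And back the other way.
        System.out.println(Integer.toBinaryString(12)); // prints 1100
    }
}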

Encoding

Computers encode many things to files: text, images, video, CAD models and many more. Let’s choose the simplest thing possible for our example: text. To have a text file we must have some way of mapping numbers to/from letters, what computer scientists call character encodings. Whilst conceptually simple, these things strike fear into programmers who are smart enough to understand what they are and what a mess we’ve created for ourselves. I’m going to dodge the bullet of fully explaining why they are a pain and use the simplest one, US-ASCII (the seemingly tautological United States-American Standard Code for Information Interchange), as an example character encoding.

US-ASCII defines a mapping of numbers to characters. Each character is represented by an 8-binary-digit number known as a byte; remember a binary digit is called a bit, so a byte is an 8-bit number. The reason numbers are stored as 8 bits is so that they can be read off storage easily. Remember, there are only zeros and ones on computers, therefore to know where a number starts and ends there has to be some way of splitting them back apart. A file might contain, for example:

0100111101001011

Which is two bytes of data, namely 01001111 and 01001011, or in decimal 79 and 75, or in US-ASCII encoding “OK”.

ASCII defines which numbers represent which characters; for example ‘A’ is 65 (decimal), that is 01000001 (binary). As it just uses 8-bit numbers there are only 256 characters possible (i.e. 00000000 (0) through to 11111111 (255)). Full mappings of numbers to characters for ASCII are available all over the web.

Let’s split

You might be thinking: why bother with the leading 0s on each byte in the “OK” example above? Well, if I dropped them I would have

10011111001011

and how on earth do I know which digits belong to which number? It could be 10011111, 001011 or it could be 100111, 11001011; indeed there are 13 possibilities assuming I know there are two numbers represented, otherwise it could be anything from 1 to 14 numbers. Basically I’m screwed if I don’t know the length of the numbers in bits. I can’t add anything to separate them because all I have is zeros and ones, and I can’t use a zero or one to separate them because I won’t have anything left to count with; you can’t count in base 1. Therefore in order to put numbers in storage I need to predetermine a length for those numbers in bits and stick to it. The general building block of information in computer science is the 8-bit byte, and that is why.

To summarise, we have numbers represented by 8-bit bytes and we have a mapping of those numbers to characters, assuming we are using US-ASCII. However, forcing all numbers to be 8 bits seems awfully wasteful. Let’s suppose I want to store this sequence of 20 characters:

abababababababababab

Well, in this case I only need two numbers to do so, so I could just store them using a basic character binary encoding of 0=a and 1=b. Therefore rather than using 8 × 20 = 160 bits of storage I could just use 20 bits. To spell it out I could use

01010101010101010101

Rather than the US-ASCII equivalent:

01100001011000100110000101100010
01100001011000100110000101100010
01100001011000100110000101100010
01100001011000100110000101100010
01100001011000100110000101100010

This is a huge storage saving in percentage terms. It’s important to point out that an encoding maps one set of symbols to another: in the case of ASCII it is 8-bit bytes to common western characters; for Huffman it is variable-length binary sequences to 8-bit bytes. Therefore what’s actually happening here is:

a (ASCII) -> 01100001 (8-bit byte) -> 0 (Huffman)

Now what if I have 3 distinct characters in my sequence:

ababababac

This presents a problem for the encoding technique above, because I have no obvious way to encode the ‘c’. If I use the next number in the binary sequence, namely 10, for c, I have a problem: a sequence such as cba becomes

1010

which has ambiguous meaning: it could be baba, cc, bac or cba, not much use. However, and here’s the key insight, suppose I only use zero/one sequences that don’t appear as the beginning of any other sequence to represent the 8-bit numbers. Therefore I pick 0=a and, because I can’t use a number starting with 0 after picking 0=a, I must use something like 10=b, leaving 11=c. I can therefore unambiguously encode cba as:

11100

There is nothing else it can be but cba, and I don’t need to know the bit length of each number to decode it. To work through the example: the first digit is 1; this isn’t a character, nothing is encoded as 1, so I add the next bit to it and get 11. Well, that’s c, and there is no other number starting with 11, so I can unambiguously decode it. Similarly with 10 and 0.
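
Decoding a prefix-free code like this is mechanical: read bits into a buffer until the buffer matches a code word, emit the character, and start again. A small sketch in Java using the 0=a, 10=b, 11=c code from above:

import java.util.Map;

public class PrefixDecode {

    // The prefix-free code from above: no code word is the start of another.
    static final Map<String, Character> CODE =
            Map.of("0", 'a', "10", 'b', "11", 'c');

    public static void main(String[] args) {
        String bits = "11100"; // encodes "cba"
        StringBuilder output = new StringBuilder();
        StringBuilder buffer = new StringBuilder();
        for (char bit : bits.toCharArray()) {
            buffer.append(bit);
            Character symbol = CODE.get(buffer.toString());
            if (symbol != null) {    // unambiguous match: emit the character
                output.append(symbol);
                buffer.setLength(0); // and start reading the next code word
            }
        }
        System.out.println(output); // prints cba
    }
}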

The algorithm

The aim of Huffman coding is to create a shorter encoding than the original fixed-width encoding. Indeed, it is a little more than that: it is to get the shortest possible unambiguous encoding. How do we find this? Well, first a few things about Huffman coding. The genius of the algorithm is that it is simple and will always find the optimum (shortest possible) encoding, and this encoding will always be less than or equal to the length of the equivalent 8-bit encoding.

Tree hugging

Firstly I’ll describe the algorithm, then I’ll do an example. As we know, US-ASCII is simply a sequence of 8-bit bytes, as is every other file on your computer (on the vast majority of modern computers). Therefore we have a sequence of bytes.

Huffman encodings use trees, Huffman trees, to describe their encoding. A Huffman tree is a binary tree, in that each branch gives way to 2 or fewer branches.

So the algorithm:

  1. Count the number of occurrences of each byte in the sequence and put them in a list
  2. Sort that list in ascending order of frequency
  3. Pick the two lowest-frequency items off the top of the list and add them to the tree as two branches on the trunk
  4. Add the frequencies of those two nodes together and add that part of the tree back to the list, then sort the list again in ascending order of frequency
  5. If there is more than one item left in the list then go to step 3, otherwise you are done: the last item in the list is the completed tree
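
In code the algorithm is pleasingly short. Here’s a sketch in Java; a PriorityQueue stands in for the sorted list (it hands back the lowest frequency first), and all the names are mine rather than from any library:

import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class HuffmanTree {

    final Character symbol;        // null for internal nodes
    final int frequency;
    final HuffmanTree left, right; // null for leaves

    HuffmanTree(Character symbol, int frequency,
                HuffmanTree left, HuffmanTree right) {
        this.symbol = symbol;
        this.frequency = frequency;
        this.left = left;
        this.right = right;
    }

    static HuffmanTree build(String input) {
        // Steps 1 and 2: count occurrences, ordered by frequency.
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : input.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        PriorityQueue<HuffmanTree> list =
                new PriorityQueue<>((a, b) -> a.frequency - b.frequency);
        counts.forEach((c, f) -> list.add(new HuffmanTree(c, f, null, null)));

        // Steps 3 to 5: repeatedly join the two lowest-frequency items and
        // put the combined node back, until a single tree remains.
        while (list.size() > 1) {
            HuffmanTree first = list.poll();
            HuffmanTree second = list.poll();
            list.add(new HuffmanTree(null, first.frequency + second.frequency,
                    first, second));
        }
        return list.poll();
    }
}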

Example

In this example we are going to encode the string: “Mississippi hippies”.

Here’s the frequency table:

[_,1]
[M,1]
[e,1]
[h,1]
[p,4]
[s,5]
[i,6]

Note I’ve substituted [space] with the underscore character “_”.

So we take the two smallest values off the top and create the first part of our tree. Hopefully the notation is self-explanatory: the tilde (~) means that it is just a node, the letter before it is simply an identifier, and the leaves have the character they represent and the frequencies:

  z[~,2]
    / \
   /   \
[_,1] [M,1]

We add this back to the list

 [e,1]
 [h,1]
z[~,2]
 [p,4]
 [s,5]
 [i,6]

Then the next two lowest frequency items

  y[~,2]
    / \
   /   \
[e,1] [h,1]

Adding it back in gives

y[~,2]
z[~,2]
 [p,4]
 [s,5]
 [i,6]

Then the next two (which are both sub-trees) gives

        x[~,4]
          / \
       ---   ---
      /         \
   y[~,2]     z[~,2]
    / \         / \
   /   \       /   \
[e,1] [h,1] [_,1] [M,1]

Adding it back in gives:

x[~,4]
 [p,4]
 [s,5]
 [i,6]

Next two:

           w[~,8]
             / \
            /   \
         x[~,4][p,4]
          / \
       ---   ---
      /         \
   y[~,2]     z[~,2]
    / \         / \
   /   \       /   \
[e,1] [h,1] [_,1] [M,1]

Back to the list:

  [s,5]
  [i,6]
 w[~,8]

Almost there:

  v[~,11]
    / \
   /   \
[s,5] [i,6]

Back:

w[~,8]
v[~,11]

And the final tree:

                   u[~,19]
                   / \
                ---   ---
               /         \
            w[~,8]     v[~,11]
             / \         / \
            /   \       /   \
         x[~,4][p,4] [s,5] [i,6]
          / \
       ---   ---
      /         \
   y[~,2]     z[~,2]
    / \         / \
   /   \       /   \
[e,1] [h,1] [_,1] [M,1]

There we have it, our final Huffman tree. How do we use it? Well, in order to find the encoding for each letter we travel down the tree until we get to it. When we go left we add a 0, when we go right we add a 1. For example, to encode ‘e’, which is a rare character in our input string, we get the following:

0000

Or for ‘i’, which is very common in the input sequence, we get

11

a much shorter encoding. From this tree we can build an encoding table as follows:

_  0010
M  0011
e  0000
h  0001
p  01
s  10
i  11
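
If you’d rather derive that table in code, a short walk of the tree does it, assuming the HuffmanTree sketch from earlier. One caveat: depending on how ties in frequency are broken, the exact 0s and 1s can differ from the table above, but the code lengths, and therefore the compression, will be the same:

import java.util.HashMap;
import java.util.Map;

public class HuffmanCodes {

    // Walk the tree: going left appends a 0, going right appends a 1,
    // and each leaf yields the code word for its character.
    static Map<Character, String> codeTable(HuffmanTree node) {
        Map<Character, String> table = new HashMap<>();
        walk(node, "", table);
        return table;
    }

    private static void walk(HuffmanTree node, String prefix,
                             Map<Character, String> table) {
        if (node.symbol != null) {  // a leaf: record its code word
            table.put(node.symbol, prefix);
        } else {
            walk(node.left, prefix + "0", table);
            walk(node.right, prefix + "1", table);
        }
    }

    public static void main(String[] args) {
        HuffmanTree tree = HuffmanTree.build("Mississippi hippies");
        codeTable(tree).forEach((c, code) -> System.out.println(c + "  " + code));
    }
}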

Thus we can encode the original string “Mississippi hippies” into:

00111110
10111010
11010111
00100001
11010111
000010

Which is 46 bits rather than the (19 × 8) 152 of the original. We’ve compressed 19 bytes into 6 bytes (the last byte is zero-padded).

A few notes:

  1. For input that uses almost every possible byte with roughly equal frequency, compression will be very limited — large movie files for instance.
  2. For some inputs certain bytes will get codes longer than 8 bits, but the shorter codes will at least offset those
  3. A Huffman tree can be stored very efficiently using a fixed-length format: take the left branch then the right, moving down and repeating until the tree is traversed. When you hit nodes that are leaves, output 1 followed by the value (not the frequency); when they are nodes that are simply parents of others, output 0 followed by nothing.
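
Note 3 is easier to see in code. A sketch, again assuming the HuffmanTree type from earlier, with a StringBuilder of '0' and '1' characters standing in for a real bit stream:

public class HuffmanSerialise {

    // The scheme from note 3: walk the tree depth-first, left before right.
    // A leaf is written as a 1 followed by its 8-bit value; an internal
    // node is written as a single 0 and nothing else.
    static void write(HuffmanTree node, StringBuilder bits) {
        if (node.symbol != null) {
            bits.append('1');
            String value = Integer.toBinaryString(node.symbol);
            bits.append("0".repeat(8 - value.length())).append(value);
        } else {
            bits.append('0');
            write(node.left, bits);
            write(node.right, bits);
        }
    }
}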

This was a pretty fiddly blog post; please let me know in the comments or by email if I’ve made mistakes.

That’s it, I think.

Andy Hedges

Events And Service Oriented Architecture (SOA)

When people start to service-orient their organisation they often focus on exposing APIs, and those APIs invariably solely or mostly focus on method calls, what I and others often refer to as RPC. This is great and brings huge benefit, but it does miss a huge opportunity: being able to observe and react to what’s happening in your organisation.

In order to be able to observe, and therefore react to, what’s happening in the services that make up your organisation you need to add events to your services. What do I mean by events? To start with, let’s leave technology aside and think of the business problem you might be trying to solve. As an example let’s take a retail bank that offers current (checking) accounts. To model this account appropriately there are things that should be modelled as RPC and things that should be modelled as events. If a customer uses an ATM to check their balance this should be RPC: the ATM will call the account service to get the current balance and display it to the customer. There is little point in doing this as a business event because you need the output of the customer asking for their balance to continue.

Now suppose the customer wants to make a withdrawal. This would cause an RPC-type invocation (i.e. debit £100) and an event (i.e. a withdrawal occurred on account id 5123). The RPC call allows us to perform a blocking operation to check there are sufficient funds, make the deduction and inform the ATM that it can dispense; the event will be published so that interested parties can be informed that something they might be interested in has happened. Who might be interested in this event? Well, it could be an analytics package that wants to keep track of which ATMs are popular, or maybe a complex anti-fraud system figuring out suspicious patterns of withdrawals.

The great thing about events is that the systems raising them don’t need to understand how they are used; they can simply raise them and go about their business. In the example above, suppose you were asked to add a feature where customers could have details of their withdrawals emailed to them. Rather than go in and change the mission-critical code around financial transactions, you could set up a service that listens to these events and, when it sees one, emails the customer. The team that looks after the account service need not even know.
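
To make that concrete, here’s a sketch of such a listener. The event bus API here is hypothetical, a stand-in for whatever broker you use; the point is that the account service knows nothing about it:

import java.util.function.Consumer;

public class WithdrawalEmailer {

    // An event describes something that happened; it is not a command.
    record WithdrawalEvent(String accountId, long amountPence, String atmId) {}

    // Hypothetical event bus interface; a real broker client goes here.
    interface EventBus {
        void subscribe(String topic, Consumer<WithdrawalEvent> handler);
    }

    interface EmailGateway {
        void send(String accountId, String body);
    }

    static void register(EventBus bus, EmailGateway email) {
        // The account service publishes and forgets; this service reacts.
        bus.subscribe("account.withdrawal", event ->
                email.send(event.accountId(),
                        "You withdrew £" + event.amountPence() / 100.0
                                + " at ATM " + event.atmId()));
    }
}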

Events start to get really interesting when you combine them, what some folks call Complex Event Processing (CEP) but I prefer to consider a fairly logical part of Event Driven SOA. The ‘complex’ in CEP refers to the fact that multiple events are combined to infer or derive that something more interesting has happened. This is all a bit theoretical, so let’s revisit the anti-fraud example from earlier. A security analyst has identified that when a customer makes withdrawals from ATMs in two different countries within the space of a day this is suspicious, but not impossible, so it might raise a ‘Customer Crossed Border’ event. If the customer goes on to make high-value transactions in the first country then the matter needs investigating, as another ‘Customer Crossed Border’ event occurred on the same day as the first. The fact that this has happened would raise a ‘Suspicious Occurrence’ event, which the account system listens to and locks the account. When the account is locked an ‘Account Locked’ event is raised with a reason code; the customer support centre service listens on this event. When one is received a task is added to one of the call centre operatives’ lists of work (i.e. call customer and verify transactions), and so on.
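
The core of a rule like that is just correlating events per customer over a time window. A toy sketch (real CEP engines let you express this declaratively, but the idea is the same):

import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

public class BorderCrossingRule {

    private final Map<String, Instant> lastCrossing = new HashMap<>();

    // Returns true when a second "Customer Crossed Border" event arrives
    // for the same customer within a day, i.e. when a higher-level
    // "Suspicious Occurrence" event should be raised.
    public boolean onCustomerCrossedBorder(String customerId, Instant when) {
        Instant previous = lastCrossing.put(customerId, when);
        return previous != null
                && Duration.between(previous, when)
                           .compareTo(Duration.ofDays(1)) < 0;
    }
}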

Please don’t get me wrong: you shouldn’t expose all data and behaviour solely via events. That would be ridiculous, but I’ve seen horror stories of people doing so; indeed I’ve worked to put them right. What I’m advocating is using a balance of RPC and events to best represent your organisation in an SOA fashion.

I’ll talk more in future blog posts about patterns for adding and utilising events in your SOA.

Andy Hedges

Why you don't need an Enterprise Service Bus (ESB)

ESBs irk me. Not the technology in and of itself, that can be useful; it’s the way they are used. Mostly because every architect and his mentor seems to think you can’t have an architecture without one. I swear sometimes they must have the ESB icon pre-painted on their whiteboards because, splat, there it is, a hulking great rectangle in the middle of every systems diagram. They aren’t always called ESBs; sometimes they sneak through as ‘Orchestration’ or ‘Integration Hub’ or a vendor product name.

I first came across the term ESB about 10 years ago. A colleague mentioned them to me and we discussed the concept over lunch; by the end of the lunch break we’d come to the conclusion they weren’t necessary, or indeed desirable, in an SOA. I’ll put forward some of the classic reasons architects give for using an ESB and explain a better way to achieve each desired outcome, or in some cases why that outcome isn’t desirable.

Before I go on I should explain the difference between the central ESB (bad) and using ESB technology in a more appropriate manner.

The big central ESB
Figure 1. An ESB used in an inappropriate way

Figure 1 shows an ESB through which all services communicate; they are essentially unaware of each other and may be in the dark over what each other’s interfaces are. Now, in the worst case the central ESB grows a team to support it. This is, after all, the obvious thing to do: every service in the organisation wants changes all the time to what they use/expose from/to each other, and therefore, in order to prevent everyone hacking away at the ESB and general chaos, the ESB team is created. All requests for change to the ESB go to the ESB team.

The ESB team are now furiously busy and considered heroes for helping everyone communicate, and are the first port of call for each service team to get what they need from the rest of the organisation. Soon they are getting requests for new functionality. However, they don’t know how to change the services, and perhaps it’s a little tricky to work out which service should handle that functionality, so they slip a little bit of business logic into the ESB and call it orchestration; after all, orchestration is what ESBs are for. This continues for a number of years until it dawns on everyone that the ESB contains most, or at least significant portions, of the business logic and the services have become no more than CRUD layers over databases.

ESB technology used in a sensible way
Figure 2. ESB technology used in a sensible way

Figure 2 shows ESB technology used in a more appropriate way. Of course, it is no longer a true ESB because it isn’t ‘Enterprise’; it isn’t just one service bus for the entire enterprise. Each service uses ESB technology within itself to ensure its interface can be stable, and can use it to maintain multiple versions of the interface simultaneously, but that is within the service. The service team now has complete autonomy over what happens with their service. In many cases they don’t need expensive software to do this; they can simply code it in the programming language of their choice or use a simple library to achieve things like mediation, routing, location, security etc. Once each service has this capability the requirement for a central ESB team falls away entirely. As the organisation has been sensible and assigned a team to each service, those teams speak to each other directly to get new functionality. The act of the teams talking directly to each other, face to face, person to person, about their requirements also improves overall understanding of the organisation’s software assets.

Below I address some of the common reasons people give for using central ESBs and suggest more appropriate patterns for achieving the desired outcome.

The ESB Protects Me From Change

The argument here is that if you want Service A to call Service B you’d better go through the ESB in case Service B’s interface changes.

What should happen is that Service B publishes an interface and guarantees it won’t change for a period of time (say 12 months); if changes are required, another version of just the interface is created and the old version is mediated onto that new version.

That’s quite a bit to take in, so I’ll give a simple example. A company builds a service with an operation that exposes the price of various commodities; it returns a MoneyValue response:

MoneyValue getPriceOfGold()

This works perfectly until a functional requirement gets added to return the price of silver too. The architects, being smart, realise this is probably not the last metal they will be asked to add, and so they create a method that’s more flexible and looks like

MoneyValue getPriceOfMetal(Metal metal)

However, there are several consumers of the old getPriceOfGold method. In order to support these clients they leave the old service interface in place but redirect calls from getPriceOfGold to getPriceOfMetal(gold) within the service, and everyone is happy. Eventually they will ask consumers of getPriceOfGold() to upgrade to getPriceOfMetal(Metal metal).
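
The mediation itself can be as mundane as one method delegating to another. A sketch using the types from the example; PriceFeed is an invented stand-in for the real price lookup:

public class MetalPriceService {

    enum Metal { GOLD, SILVER }

    record MoneyValue(java.math.BigDecimal amount, String currency) {}

    interface PriceFeed {
        MoneyValue currentPrice(Metal metal);
    }

    private final PriceFeed priceFeed;

    MetalPriceService(PriceFeed priceFeed) {
        this.priceFeed = priceFeed;
    }

    // The old operation stays on the published interface for its guaranteed
    // lifetime, but simply mediates on to the new, more general one.
    @Deprecated
    public MoneyValue getPriceOfGold() {
        return getPriceOfMetal(Metal.GOLD);
    }

    public MoneyValue getPriceOfMetal(Metal metal) {
        return priceFeed.currentPrice(metal);
    }
}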

Therefore you don’t need a central ESB to achieve this objective; you don’t need anything outside the service but a bit of mediation logic within it.

Extra Level of Protection

Just to quickly address the oft-heard retort to the above, that a central ESB gives another layer of protection: yes it does, but it comes with all the drawbacks of the central ESB, adding logic in the wrong places, a functional enhancement bottleneck and so forth.

The ESB Allows Me To Orchestrate Services

Orchestration is either business logic, which belongs in the service (this is the point of services after all: they contain your logic and data), or it is simply a case of moving a human through some process or other, which belongs in the UI code (e.g. register for a website and then add something to your basket).

The ESB Can Mediate My Data

Yes it can, but see the section ‘The ESB Protects Me From Change’ above; the mediation can be achieved much more sensibly in your service.

The ESB Can Locate My Services

Each service should have a mechanism for locating any other service for the purposes of RPC calls. I prefer to use DNS in most cases (e.g. metal-exchange.example.com would resolve to the metal exchange service interface or API); DNS has huge power but can be used very simply too. Bonjour (aka Zeroconf) is often cited as a solution too, and it’s a good answer, but it is merely some extensions to DNS at the end of the day. Others suggest things like UDDI, but I have never found the need myself.
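
Locating a service this way needs nothing more than a standard library resolver. For example, in Java:

import java.net.InetAddress;

public class ServiceLocation {
    public static void main(String[] args) throws Exception {
        // The DNS record, not the client, decides which hosts currently
        // serve the metal exchange interface.
        InetAddress address = InetAddress.getByName("metal-exchange.example.com");
        System.out.println(address.getHostAddress());
    }
}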

The ESB Can Do My Routing

For services that subscribe to events from other services, the same process can be used for locating those topics/queues: DNS to find the broker, and well-named, documented topics/queues on those brokers. Most message brokers provide means of routing messages from location to location sensibly; if yours doesn’t, get another.

The ESB Can Monitor My Services

Your services should expose monitoring information over any number of standard technologies, which can then be watched by any number of monitoring tools. Examples of technologies that can enable your services to be monitored are syslog, JMX, SNMP, Windows Events and simple log files — one or more of these are available in just about every language. Examples of tools that can monitor them are Nagios, OpenNMS and any number of commercial systems. You don’t need an ESB to do this.

The ESB Provides Extra Security

No it doesn’t; it removes security by terminating it prematurely, opening you up to either deliberate or mistaken man-in-the-middle attacks.

Andy Hedges

The Simplest Blog That Might Work

I’m aiming for a simple, fast and minimalist blog. A blog where writing posts is all I have to think about, not themes, fancy backgrounds, AJAX, hosting services, cloud APIs and, well, one starts to lose the will to live.

Raspberry Pi in a takeaway container on top of random network equipment
Figure 1. Raspberry Pi in a takeaway container

The design goals are:

  • simple
  • cheap
  • fast

I’m quite pleased with the way it works. It’s probably not for technophobes, but then nor is blogging. In order to publish a post you create a new text file with your post in it, in a simple directory structure; if you have any images, video etc. then you drop them in a “resources” folder. If you’d like some formatting in your post you can use markdown. After that, run a simple script and everything is taken care of: it’s published to the web. So far my limited posts look like this in the directory structure:

$ tree --charset US-ASCII posts/
posts/
|-- 20120216
|   |-- article.yaml
|   `-- resources
|       |-- First-Project-Syndrome-Figure1.png
|       `-- First-Project-Syndrome-Figure2.png
`-- 20131230
    `-- article.yaml

3 directories, 4 files

How it works

The markdown from the YAML file is converted to HTML, which is minified (optimised to remove redundant spaces etc.) and then put into a folder which btsync is watching; once a change is noticed it is synced to all computers with btsync installed, including, most importantly, the web server.
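
For a flavour of the publishing script, here’s a minimal sketch using SnakeYaml and FreeMarker. The file layout matches the tree above, but the YAML field names, template name and output path are my guesses rather than the actual code:

import freemarker.template.Configuration;
import freemarker.template.Template;
import org.yaml.snakeyaml.Yaml;

import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.util.Map;

public class Publish {
    public static void main(String[] args) throws Exception {
        // Load the post; markdown-to-HTML conversion and minification
        // would slot in between loading and templating.
        Map<String, Object> article;
        try (FileReader in = new FileReader("posts/20131230/article.yaml")) {
            article = new Yaml().load(in);
        }

        // Render through a template into the folder btsync is watching.
        Configuration cfg = new Configuration(Configuration.VERSION_2_3_31);
        cfg.setDirectoryForTemplateLoading(new File("templates"));
        Template template = cfg.getTemplate("post.ftl");
        try (FileWriter out = new FileWriter("site/20131230.html")) {
            template.process(article, out);
        }
    }
}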

A Note On DNS

As I’m hosting this on my home broadband connection, which doesn’t have a static IP address, I needed a way to update my DNS record quickly every time my IP address changed. To do this I used a free service, DNSdynamic, which gives you a subdomain on one of their domains (e.g. example.dnsdynamic.com); I chose andyhedges.http01.com but anything would work. You then install a client which regularly checks your IP address and updates the DNS if need be. This is great, but I wanted to use my vanity domain name hedges.net, so I configured a CNAME with my DNS registrar to point blog.hedges.net to andyhedges.http01.com and I was in business, DNS-wise at least.
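
In zone-file terms the registrar ends up serving something like this (illustrative, not a dump of my actual zone):

; blog.hedges.net is an alias for the Dynamic DNS name, which in turn
; tracks whatever IP address my ISP has handed out
blog.hedges.net.    IN    CNAME    andyhedges.http01.com.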

Costs

The cost of my blogging platform breaks down as below:

  • software - £0 (all open source or freeware)
  • hosting - £0 (using my home fibre connection)
  • hardware
    • Raspberry Pi - £28.99
    • Power Supply - £0 (free with phone)
    • Network Cable - ~£1
    • SD Card - ~£4
  • DNS - £0 (using my DNS registrar and Dynamic DNS provider)

There we have it, a blogging platform for thirty four quid that doesn’t require 3rd party hosting. It remains to be seen if my ISP gets cross.

Benchmarks

On the little Raspberry Pi a quick benchmark with no optimisation shows it can handle 250 requests per second with a response time of 3ms (across my home gigabit network).

Todo

There are still a few more to-dos:

  • preview mode
  • comments (I think I’ll use disqus or maybe G+)
  • RSS
  • optimising Nginx (e.g. gzip or sdch, SPDY, threads and so forth)
  • smartypants-like substitution
  • Open Source it (tidy code, add licenses, put it on github)
  • Set up a 301 Moved Permanently on a virtual host for the Dynamic DNS name

Full details

For those interested, here are the full details of the software, hardware and network.

The tools I’ve chosen for the client side are:

  • Notepad (although sublime, textpad, gedit or vi would do), this is for editing the posts
  • btsync this enables me to keep the web servers and any computer I use in sync with all of the generated content, the source information and templates, more on that later

The development tools:

The software libraries:

  • SnakeYaml a YAML binding for Java; YAML is basically a human-friendly information format, similar to XML or JSON but, unlike those two, easy for us humans to read and write.
  • FreeMarker is a templating language similar to JSP or Razor and provides an easy library to integrate it with your projects
  • Actuarius a markdown to HTML converter
  • htmlcompressor an HTML minification library
  • YUI Compressor a CSS minification library

The server side software:

  • Nginx a nice, light, fast HTTP server
  • btsync see above
  • ddclient the Dynamic DNS client that keeps my DNS record updated when my ISP changes my IP address
  • Raspbian A Debian Linux variant for the Raspberry Pi

The hardware:

  • Raspberry Pi a circuit board sized computer
  • Indian takeaway container; no, this isn’t the wacky name of some kickstarter project, I used one of the plastic tubs that takeaway curry comes in to provide a case for the Raspberry Pi, and it keeps dust and/or water off it
  • Mini USB charger plug and cable from my Nexus 5 (I have so many of these and so I chose a smallish one)
  • Network cable from my man drawer

The network

  • Existing Fibre connection and associated routers and access points
  • Netgear Gigabit Switch
Andy Hedges

First Project Syndrome

I hold the opinion that services (as in SOA) should, wherever possible, be delivered as separately budgeted and planned work from functional enhancement projects, to avoid First Project Syndrome.

To understand what First Project Syndrome is let’s take a look at some graphs (bear with me…).

Capital Cost

Capital cost vs number of projects
Figure 1. Capital cost vs number of projects

As you can see from figure 1, the assertion is that the cost of delivering the first project with a service is higher than for the first project using a point solution (by point solution I mean grabbing data from source data stores, replicating data into your database, or any number of data-sync techniques). The reasons why this costs more for the first project are:

  • there are overheads in creating a service, such as following the prescribed best practices
  • creation of infrastructure to run the service
  • some expert knowledge in SOA practices

However, from the second project onward the savings are realised. The reasons why point solutions cost more from the second project onward are:

  • a point solution often starts from scratch each time, the work has to be redone and redone slightly differently for each specific scenario
  • point solutions layer complexity upon complexity (e.g. tables accessed by many unknown systems, various extract files created for many systems, data shunted to and from multiple undocumented systems)
  • it’s very hard not to make mistakes when syncing data
  • syncing data causes all sorts of edge cases when trying to modify it

Operational Cost

Operational cost vs number of projects
Figure 2. Operational cost vs number of projects

Point solutions are almost always more expensive to manage; with a service built correctly and to specification the operational costs are lower from day one. Services are easier to manage, monitor, fail over and so on. The significant reason why they have these qualities is that they conform to a set of good practices that specifically give these qualities. They leverage the wealth of investment made in previous service developments within the organisation. Each service then stands as a container for future enhancement in its particular business domain, allowing functionality to be added while still providing the operational characteristics demanded.

More often than not the first project does not consider, in detail, the operational cost of managing the solution on an ongoing basis, or may not have the budget/resource/time to seriously take this into account. Typically the projects have the more immediate concern of getting the solution shipped to the business.

Other Common Pitfalls

There are a number of other pitfalls of delivering services as part of the first project. In no particular order:

  • the first project’s scope determines the scope of the service, making it less suitable for other consumers
  • freezing of the project causes the service to also be frozen, even where that service would have had wider benefits to the business; the case for the project didn’t stack up and so the assumption is that the service’s case doesn’t either
  • compromises in the design of the project solution force compromises in the service design

To all these points I would posit that the first business project that requires a service should not be the project that delivers that service. This does, of course, mean that the business case should stack up across more than one business-facing project (and if it doesn’t then it probably isn’t worth building).

You wouldn’t try to create a power station as part of the build of a house, but every house needs one to function.

Andy Hedges