Wednesday, 24 May 2006

Domain Driven Design: A Quickstart (Part 1)


Some time ago, I bought the book Domain-Driven Design: Tackling Complexity in the Heart of Software, by Eric Evans.
Since reading it, I have become very interested in the Domain Driven Design paradigm. For enterprise applications, it would be ideal if you could express the core of the application (the domain layer; the part of the program that contains the business logic) in a good model.
Object-oriented programming provides a good way to express that model in a computer program.

So, although the behaviour can be expressed in an OO fashion, the data needs to be persisted as well. In most cases, a relational database is used to persist the data. Combining OO with an RDBMS gives us the problem of the object-relational impedance mismatch. You can of course solve this mismatch yourself by writing a data access layer (DAL) that maps the classes of your domain model to the tables of your relational database, but in most cases this means that you'll have to write a lot of code. Instead of implementing this functionality yourself, you could also opt for one of the many existing O/R mapping tools, like NHibernate or LLBLGen.
As Frans Bouma once explained in one of his blog posts, there are different types of O/R mappers. NHibernate fits in a different category than LLBLGen: in Frans' categorization, NHibernate follows the 'Domain approach', while LLBLGen follows the 'Entity approach'.
Since I'm interested in the Domain Driven approach, I've taken a look at NHibernate, and, while it's not perfect, it has a lot of advantages. It frees you from some boring tasks (like mapping - hey, that's why it's called an O/R mapper), and takes care of some more complex tasks (caching, state-tracking, ...).

The idea of this blogpost is to provide a little quickstart in Domain Driven Design and NHibernate, by creating a piece of software for a particular use case.

The Case

My idea was to create a simple application for a shop/manufacturer. A customer can order multiple goods at a time, and, when a customer has ordered for over 2500 euro in the past 3 months, this customer is a gold customer.
When the order is shipped, an invoice has to be created for that Order. Gold customers receive a discount of 5% on their invoice. On the other hand, customers that are known as 'bad payers' cannot place orders with an order total that exceeds 250 euro. A customer is tagged as a 'bad paying customer' when one third of his invoices have been overdue.
Let's say that a customer can place an order by phone and via the website of the shop.
Pretty simple, no ? :) This is of course not a real-world example, but it should be sufficient for the purpose of this article.
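To make the rules of the case a bit more concrete, here is a minimal sketch of them as plain functions. The post itself targets .NET and NHibernate; Python is used here only to keep the sketch short. All names are made up for illustration, 'the past 3 months' is approximated as 90 days, and 'one third of his invoices' is read as 'at least one third'; both are assumptions on top of the description above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from decimal import Decimal

GOLD_THRESHOLD = Decimal("2500")        # euro ordered within the window
GOLD_WINDOW = timedelta(days=90)        # assumption: "3 months" = 90 days
GOLD_DISCOUNT = Decimal("0.05")         # 5% off the invoice
BAD_PAYER_ORDER_LIMIT = Decimal("250")  # euro


@dataclass(frozen=True)
class PastOrder:
    """Hypothetical summary of an earlier order: when it was placed, and its total."""
    placed_on: date
    total: Decimal


def is_gold_customer(orders, today):
    """True when the orders placed within the window add up to more than 2500 euro."""
    since = today - GOLD_WINDOW
    recent = sum((o.total for o in orders if o.placed_on >= since), Decimal("0"))
    return recent > GOLD_THRESHOLD


def is_bad_payer(invoice_count, overdue_count):
    """True when at least one third of the invoices have been overdue (assumed reading)."""
    return invoice_count > 0 and overdue_count * 3 >= invoice_count


def may_place_order(order_total, bad_payer):
    """Bad payers cannot place orders with a total above 250 euro."""
    return not bad_payer or order_total <= BAD_PAYER_ORDER_LIMIT


def invoice_total(order_total, gold):
    """Gold customers get 5% off the invoice; everybody else pays the order total."""
    discount = order_total * GOLD_DISCOUNT if gold else Decimal("0")
    return order_total - discount
```

In a real model these rules would probably live on the Customer, Order and Invoice classes rather than in free functions; they are kept separate here only so that each rule can be read on its own.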

Modelling the domain

Following the Domain Driven Design principle, a model consists of entities, value objects and services. We can already extract some entities out of the given text:

  • Customer

  • Article

  • Order

  • Invoice

Another entity, one that is not so obvious, is the OrderLine entity. It is needed because a Customer can order more than one Article at a time, so we need to know which Articles have been ordered, and how many of each.
For the Invoice entity, it's the same story: there must be an InvoiceLine entity that represents each 'line' on the invoice.
This means that, at this time, our model consists of 6 entities. There are no Value objects and Services defined yet.

The entities that we've defined can be drawn in a first schema:

As you can see, a customer can have 0, 1 or more Orders, an Order contains one or more OrderLines, and every OrderLine must contain exactly one Article.
For each Order, there can be one Invoice.
If this were a database schema, this would be perfect. However, this is a (concise) UML diagram, and the classes in this diagram should not describe how our data must be persisted, but how our application should behave.

Now, there are some things in this ‘design’ that can be improved. If you look at the Customer and Order classes in the schema, you see that a Customer has a collection of Orders. This is in fact correct, but, I wonder if this is necessary to express in our domain-model.
In this case, we're more interested in knowing to which Customer a specific Order belongs, rather than in getting all the Orders of a specific Customer. To get a list of all the Orders of a specific Customer, we can always add a method to a Repository that gives us that list, instead of giving the Customer class a collection of Orders. (I will come back to the Repository part later.) This simplifies things a bit. It also means that, for customers that have placed a lot of Orders, the Customer object doesn't have to hold a large collection of Order objects.
For the relationship between the Order and OrderLine classes, things are a bit different. I do not think we can give a direction to this relationship. We do want to know the OrderLines of an Order, because they are coupled to each other: an Order exists only because of its OrderLines. And for each OrderLine, we do want to know to which Order it belongs. So, this association has to be kept bidirectional.
Then again, the relationship between Order and Invoice, doesn't have to be bidirectional. I do not even know if we should have a 'coded' relationship between these 2 entities, because I don't think that it will often occur that we need to see the invoice that is linked to an order, or, the related order of an invoice. If we do need that, we can always get them by calling a method on the repository. However, I will keep the link between Order and Invoice on the schema, since, they're in a way linked to each other.

This gives us the following schema:

In this schema, you can see the directions of the associations.

The next step is to define the aggregates in the model. An aggregate 'clusters' the entities and value objects that belong together.
In this case, we can define 4 aggregates: Customer, Order, Invoice and Product (the Article entity from the list above).
The Customer and Product aggregates each contain only 1 entity, while the Order aggregate and the Invoice aggregate each contain 2 entities. The Order and the OrderLine entity make up the Order aggregate, and the Order entity is the 'aggregate root'. The aggregate root is the only object in the aggregate that objects outside of the aggregate may hold references to.
The Invoice aggregate is very similar: it's made up of the Invoice and the InvoiceLine entity, and the Invoice entity is the aggregate root.
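As a rough illustration of the association directions and the Order aggregate described above, here is a minimal sketch. It is written in Python only for brevity (the post itself is about .NET and NHibernate), and every attribute and method name is an assumption made for the example, not something taken from the schema.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Customer:
    name: str  # deliberately no collection of Orders on the Customer


@dataclass
class Article:
    name: str
    price: Decimal


class OrderLine:
    """Part of the Order aggregate; only ever created through its Order."""

    def __init__(self, order, article, quantity):
        self.order = order        # OrderLine -> Order (bidirectional association)
        self.article = article    # every line refers to exactly one Article
        self.quantity = quantity

    @property
    def total(self):
        return self.article.price * self.quantity


class Order:
    """Aggregate root: outside code holds references to the Order, not to its lines."""

    def __init__(self, customer):
        self.customer = customer  # Order -> Customer (unidirectional association)
        self._lines = []

    def add_line(self, article, quantity):
        line = OrderLine(self, article, quantity)
        self._lines.append(line)
        return line

    @property
    def lines(self):
        # Hand out a read-only copy so that lines can only be added via the root.
        return tuple(self._lines)

    @property
    def total(self):
        return sum((line.total for line in self._lines), Decimal("0"))
```

Going through `add_line` on the root is one possible way to keep the aggregate consistent: the back-reference from the line to its Order is always set, and nobody outside the aggregate needs to construct an OrderLine directly.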

Once we know the aggregates, we can define the repositories for our domain model. A repository is an abstraction which gives us references to our aggregates, and allows us to persist those aggregates. The underlying infrastructure can be a relational database, a file, … but our model doesn’t need to know that. We just have to be able to get aggregates, and save them back, so the repository provides us this abstraction.
We should not create a repository for every class in our model; we should create one repository per aggregate. In our example, it makes no sense to be able to retrieve OrderLine objects without retrieving the corresponding Order object.
Knowing all this, we can extend our schema:

Here, you can see the 4 repositories (I've added some example operations to them), and the 4 aggregates. I've also drawn the aggregate boundaries of the Order and the Invoice aggregate. Since the other 2 aggregates (Customer and Product) consist of only 1 entity, it is not necessary to draw their boundaries as well.
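To give an idea of what a repository per aggregate could look like, here is a minimal sketch of an Order repository with an in-memory implementation. It is written in Python only for brevity. The operations on the schema are not reproduced here, so the method names (`get`, `save`, `find_by_customer`) and the simplified `Order` class are assumptions made for the example; with NHibernate, the implementation behind the same interface would talk to the database instead.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Order:
    """Simplified stand-in for the Order aggregate root."""
    order_id: int
    customer_id: int
    lines: list = field(default_factory=list)  # (article name, quantity) pairs


class OrderRepository(ABC):
    """The abstraction the domain model sees; it hides how Orders are stored."""

    @abstractmethod
    def get(self, order_id):
        """Return the Order (with its lines) or None when it does not exist."""

    @abstractmethod
    def save(self, order):
        """Persist the whole aggregate, root and lines together."""

    @abstractmethod
    def find_by_customer(self, customer_id):
        """Replace a Customer.orders collection with a query."""


class InMemoryOrderRepository(OrderRepository):
    """Toy implementation backed by a dict, handy for tests."""

    def __init__(self):
        self._orders = {}

    def get(self, order_id):
        return self._orders.get(order_id)

    def save(self, order):
        self._orders[order.order_id] = order

    def find_by_customer(self, customer_id):
        return [o for o in self._orders.values() if o.customer_id == customer_id]
```

Note that there is no OrderLine repository: lines are loaded and saved only as part of their Order.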

There is one thing that we'll need to keep in the back of our mind: we have to be able to create Invoices for Orders that are shipped and that have no Invoice yet. It would be a good idea to create a batch process that runs every night and creates those Invoices. In other words: this would ideally be implemented as a service.
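A minimal sketch of what such a service could look like is given below, again in Python purely for brevity. The class and attribute names are hypothetical, the Orders are passed in as a plain list where a real implementation would ask a repository for them, and invoice lines and the gold discount are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    """Simplified Order: only the fields the service needs."""
    order_id: int
    shipped: bool
    invoice_id: Optional[int] = None  # None means: not invoiced yet


@dataclass
class Invoice:
    invoice_id: int
    order_id: int


class InvoicingService:
    """Domain service: the operation does not belong to one single entity."""

    def __init__(self, orders, next_invoice_id=1):
        self._orders = orders
        self._next_id = next_invoice_id

    def create_missing_invoices(self):
        """Create an Invoice for every shipped Order that has none yet."""
        created = []
        for order in self._orders:
            if order.shipped and order.invoice_id is None:
                invoice = Invoice(self._next_id, order.order_id)
                order.invoice_id = invoice.invoice_id
                self._next_id += 1
                created.append(invoice)
        return created
```

Because an invoiced Order is skipped on the next run, the nightly batch can be run repeatedly without creating duplicate Invoices.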

Now that we have identified the entities, aggregates and repositories that make up our domain model, we could start to put the model into code, but I'll keep that for another post that I hope to finish soon. :)

Monday, 1 May 2006

Is Software Development too hard, or too easy ?

I've come across some rather interesting blog-articles, like this one from Scott Bellware and this one from Jeffrey Palermo.

Both articles are a reaction to this article from Rockford Lhotka. Rockford Lhotka says in his article that software development is too hard, that we -developers- have to spend too much time doing 'plumbing work', instead of concentrating on delivering business value.
In a way, he has a point: the main goal in writing a business application is to create an application that solves the business problem, and, since time is money and business is constantly changing, it should be done as quickly as possible. Isn't this what we're all striving for ?

However, this should not be done at all costs. I mean, there are RAD tools available that will allow you to 'develop' an application quickly, but, if those tools are used in an inappropriate way, you'll end up with an 'application' that is a hell to maintain and to extend.
I think we've all seen that kind of application: in a first phase, those apps do what they have to do, but, as requirements change, the code gets messier and harder to understand, and it gets even harder to implement new functionality or change existing functionality.

This is where the opinion of Jeffrey Palermo comes in: the RAD tools allow you to build an application without needing to know what's going on under the hood, and they make it possible for somebody who is not trained in software engineering to create an application. However, the quality of that application will most likely be poor.
To put it in his words:

It’s too easy for an unskilled person to throw a screen together and deploy it. It’s too easy for Joe blow to create a database application that pulls over entire tables to the client for modifying one record (but it works – initially). It’s too easy for a newbie to get excited about a new technology and completely screw up an application with web service calls to itself and overdo sending XML to Sql Server 2000. It’s too easy to a database guy to throw tons of business logic in stored procedures, call them from ASP and call it an application (until a skilled programmer looks at it later and has a heart attack).

The problem with RAD tools, is that everybody can now create an application that does what it should do. It allows people that are not trained in software engineering to create an application that does what it has to do, and it's possible that the user of the application doesn't notice that the application is actually a piece of crap. And I believe this happens all too often.

My opinion is that RAD tools can reduce the workload, but they should be used with care.

When a RAD tool is used inappropriately to create a business application, chances are that the developer ends up with the 'Smart UI' antipattern. This means that all the business logic of the application is put directly into the user interface. This is of course problematic when the requirements change. Since the business functionality is scattered throughout the user interface, the programmer who has to maintain this application will have to delve into the UI to find all the code, and the related code, that has to be changed. When the application is large, it's easy to forget or overlook something that has to be changed, and the result will be a buggy application.
Or, imagine that you've built a Windows application, and once it's delivered, your customer or boss wants to have a web interface for this application as well. When all business logic is implemented in the user interface, you will not be able to just reuse that code in the web application. The result will most likely be duplicated business logic.
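The alternative can be sketched in a few lines. In the example below, a business rule lives in one place and two thin front ends call it. Everything in it is hypothetical: the rule (certain customers may not order for more than 250 euro), the function names, and the way the two front ends report the result. Python is used only to keep the sketch short.

```python
from decimal import Decimal

ORDER_LIMIT = Decimal("250")  # euro; hypothetical limit for restricted customers


class OrderRejected(Exception):
    """Raised by the domain rule; carries no UI concerns."""


def check_order(order_total, customer_is_restricted):
    """The business rule, kept in one place and unaware of any user interface."""
    if customer_is_restricted and order_total > ORDER_LIMIT:
        raise OrderRejected("order total exceeds the limit for this customer")


def windows_form_submit(order_total, customer_is_restricted):
    """Stand-in for a desktop event handler: only translates the outcome to text."""
    try:
        check_order(order_total, customer_is_restricted)
    except OrderRejected as exc:
        return f"MessageBox: {exc}"
    return "MessageBox: order accepted"


def web_page_submit(order_total, customer_is_restricted):
    """Stand-in for a web handler: same rule, different presentation."""
    try:
        check_order(order_total, customer_is_restricted)
    except OrderRejected as exc:
        return {"status": 400, "error": str(exc)}
    return {"status": 200}
```

When the limit changes, only `check_order` has to be touched, and the rule can be unit-tested without starting either user interface.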

That's why I believe that a RAD tool should be used with care. In my opinion, you should use the RAD features of your development tool to build the User Interface, and that's about it.
Developing software is more than just dragging some components on a form, gluing them together by setting some properties and using wizards to get the data from the database and bind it to some kind of control.
Scott Bellware is right when he says that the RAD functionality of Microsoft's development tools encourages one to create badly designed software. In demos, Microsoft shows how one can create an application very fast, with a very small number of lines of code, using their RAD tools. The sad thing is that there are developers attending these demos who think afterwards that this must be the way to develop applications. And this is not only true for developers. Managers seeing these demos may think that software development isn't that hard at all, and then they do not understand why it takes so much time to develop an application.
This is not a good way of building software. Those RAD tools and wizards are very good for giving demo sessions (and selling the tools), but they're definitely not showing a correct way to build software.
What about the maintainability and flexibility of software created in this way ? What about the ability to create unit tests for the implemented business rules ? None of this is possible with applications that are developed in this way.

For the core of the application (the business functionality), the development team should create a domain model that expresses the business problem that the application must tackle. This means of course that the initial development cost of the application will be higher, but this cost should be seen as an investment. The model will be easier to maintain, extend and reuse, and, by using Agile development, the customer can be involved in the development process. By using small development iterations and having customer input after each iteration, the customer knows that the development of the application is going forward, and he can ring the alarm when he sees that the functionality of the application or the business logic is wrong.

To conclude: RAD tools provide a way to make software development easier, and because of that, one could be tempted to create an application in a quick and dirty way. However, building high-quality software still requires skilled and trained developers. They are needed not only to create a good domain model; they also have to know what RAD tools should and shouldn't be used for.