Wednesday, 24 May 2006

Domain Driven Design: A Quickstart (Part 1)


Some time ago, I bought Eric Evans' book Domain-Driven Design: Tackling Complexity in the Heart of Software.
Since reading it, I have become very interested in the Domain Driven Design paradigm. For enterprise applications, it would be ideal if you could express the core of the application (the domain layer; the part of the program that contains the business logic) in a good model.
The Object Oriented Programming paradigm provides a good way to express the model in a computer program.

So, although the behaviour can be expressed in an OO fashion, the data needs to be persisted as well. In most cases, a relational database is used for that. Combining OO and an RDBMS confronts us with the object-relational impedance mismatch. You can of course solve this mismatch yourself by writing a DAL (data access layer) that maps the classes of your domain model to the tables of your relational database, but in most cases this means that you'll have to write a lot of code. Instead of implementing this functionality yourself, you could also opt for one of the many existing O/R mapping tools, like NHibernate or LLBLGen.
As Frans Bouma once explained in one of his blogposts, there are different types of O/R mappers. NHibernate fits in a different category than LLBLGen: in Frans' categorization, NHibernate follows the 'Domain approach', while LLBLGen follows the 'Entity approach'.
Since I'm interested in the Domain Driven approach, I've taken a look at NHibernate, and, while it's not perfect, it has a lot of advantages. It relieves you of some boring tasks (like mapping; hey, that's why it's called an O/R mapper), and takes care of some more complex tasks (caching, state-tracking, ...).

The idea of this blogpost is to provide a little quickstart in Domain Driven Design and NHibernate, by creating a piece of software for a particular use case.

The Case

My idea was to create a simple application for a shop/manufacturer. A customer can order multiple goods at a time, and a customer who has ordered for more than 2500 euro in the past 3 months is a gold customer.
When the order is shipped, an invoice has to be created for that Order. Gold customers receive a discount of 5% on their invoice. On the other hand, customers that are known as 'bad payers' cannot place orders with an order total that exceeds 250 euro. A customer is tagged as a 'bad payer' when one third of his invoices have been overdue.
Let's say that a customer can place an order by phone and via the website of the shop.
Pretty simple, no? :) This is of course not a real-world example, but it should be sufficient for the purpose of this article.
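
To make the rules of the case concrete, here is a minimal sketch of them as plain functions. It is written in Python purely for brevity; it is not how the rules will end up in the domain model. All function and constant names are hypothetical, and two readings are assumptions on my part: "3 months" is approximated as 90 days, and "one third of his invoices" is read as "at least one third".

```python
from datetime import date, timedelta
from decimal import Decimal

GOLD_THRESHOLD = Decimal("2500")        # euro ordered within the look-back window
GOLD_WINDOW = timedelta(days=90)        # assumption: "3 months" is roughly 90 days
GOLD_DISCOUNT = Decimal("0.05")         # 5% discount on the invoice
BAD_PAYER_ORDER_LIMIT = Decimal("250")  # maximum order total for a bad payer


def is_gold_customer(orders, today):
    """orders is an iterable of (order_date, order_total) pairs."""
    cutoff = today - GOLD_WINDOW
    recent_total = sum(
        (total for order_date, total in orders if order_date >= cutoff),
        Decimal("0"),
    )
    return recent_total > GOLD_THRESHOLD


def is_bad_payer(invoice_count, overdue_count):
    # Assumption: "one third of his invoices" means "at least one third".
    if invoice_count == 0:
        return False
    return overdue_count * 3 >= invoice_count


def invoice_total(order_total, gold):
    """Apply the gold customer discount, if any."""
    return order_total * (1 - GOLD_DISCOUNT) if gold else order_total


def may_place_order(order_total, bad_payer):
    """Bad payers cannot place orders whose total exceeds the limit."""
    return not (bad_payer and order_total > BAD_PAYER_ORDER_LIMIT)
```

Amounts are kept as `Decimal` values rather than floats, so that the 5% discount does not introduce rounding surprises.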

Modelling the domain

Following the Domain Driven Design principle, a model consists of entities, value objects and services. We can already extract some entities out of the given text:

  • Customer

  • Article

  • Order

  • Invoice

Another entity that is not so obvious is the OrderLine entity. This one is needed because a Customer can order more than one article at a time, so we need to know which Articles have been ordered, and how many of each.
For the Invoice entity, it's the same story: there must be an InvoiceLine entity that represents each 'line' on the invoice.
This means that, at this time, our model consists of 6 entities. There are no value objects or services defined yet.

The entities that we've defined can be drawn in a first schema:

As you can see, a customer can have 0, 1 or more Orders, an Order contains one or more OrderLines, and every OrderLine must contain exactly one Article.
For each Order, there can be one Invoice.
If this were a database schema, this would be perfect. However, this is a (concise) UML diagram, and the classes in this diagram should not describe how our data must be persisted, but how our application should behave.

Now, there are some things in this ‘design’ that can be improved. If you look at the Customer and Order classes in the schema, you see that a Customer has a collection of Orders. This is in fact correct, but I wonder whether it is necessary to express this in our domain model.
In this case, we’re more interested in knowing to which Customer a specific Order belongs than in getting all the Orders of a specific Customer. To get a list of all the Orders of a specific Customer, we can always add a method to a Repository that gives us that list, instead of giving the Customer class a collection of Orders. (I will come back to the Repository part later.) This simplifies things a bit. It also means that, for customers who have placed a lot of Orders, the Customer object doesn’t have to hold a large collection of Order objects.
For the relationship between the Order and OrderLine classes, things are a bit different. I do not think we can give a direction to this relationship. We do want to know the OrderLines of an Order, since they are coupled to each other: an Order exists only because of its OrderLines. And for each OrderLine, we do want to know to which Order it belongs. So, this association has to stay bidirectional.
Then again, the relationship between Order and Invoice doesn't have to be bidirectional. I do not even know if we should have a 'coded' relationship between these 2 entities, because I don't think we will often need to see the invoice that is linked to an order, or the related order of an invoice. If we do need that, we can always get them by calling a method on the repository. However, I will keep the link between Order and Invoice in the schema, since they are in a way linked to each other.
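
The association directions discussed above can be sketched in a few lines. This is only an illustration, in Python for brevity; the constructor arguments and the `add_line` method are hypothetical names. The Order knows its Customer but not the other way around, while Order and OrderLine know each other.

```python
class Customer:
    def __init__(self, name):
        self.name = name
        # Deliberately no collection of orders here: the association
        # is navigable from Order to Customer only.


class OrderLine:
    def __init__(self, order, article, quantity):
        self.order = order        # back-reference: the line knows its Order
        self.article = article
        self.quantity = quantity


class Order:
    def __init__(self, customer):
        self.customer = customer  # Order -> Customer (unidirectional)
        self._lines = []          # Order <-> OrderLine (bidirectional)

    def add_line(self, article, quantity):
        # Creating the line through the Order keeps both ends in sync.
        line = OrderLine(self, article, quantity)
        self._lines.append(line)
        return line

    @property
    def lines(self):
        # Return a copy so callers cannot bypass add_line.
        return list(self._lines)
```

Creating each OrderLine through the Order means the two ends of the bidirectional association cannot drift apart.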

This gives us the following schema:

In this schema, you can see the directions of the associations.

The next step is to define the aggregates in the model. An aggregate ‘clusters’ the entities and value objects that belong together.
In this case, we can define 4 aggregates: Customer, Order, Invoice and Product (the entity listed as Article above).
The Customer and Product aggregates each contain only 1 entity, while the Order and Invoice aggregates each contain 2 entities. The Order and the OrderLine entity make up the Order aggregate, and the Order entity is the ‘aggregate root’. The aggregate root is the only object in the aggregate that objects outside of that aggregate may hold references to.
The Invoice aggregate is very similar: it's made up of the Invoice and the InvoiceLine entity, and the Invoice entity is the aggregate root.
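
As a rough illustration of what 'aggregate root' means in code, here is a sketch of the Invoice aggregate, in Python for brevity. The `number`, `description` and `amount` fields and the `total` method are assumptions made up for this example; the case description does not prescribe them. Code outside the aggregate only ever talks to the Invoice, and the InvoiceLines are created and read through it.

```python
from decimal import Decimal


class InvoiceLine:
    """Lives inside the Invoice aggregate; not referenced from outside it."""

    def __init__(self, invoice, description, amount):
        self.invoice = invoice
        self.description = description  # hypothetical field
        self.amount = amount            # hypothetical field


class Invoice:
    """Aggregate root: the only entry point into the aggregate."""

    def __init__(self, number):
        self.number = number            # hypothetical field
        self._lines = []

    def add_line(self, description, amount):
        # Lines are only ever created through the root.
        line = InvoiceLine(self, description, amount)
        self._lines.append(line)
        return line

    @property
    def lines(self):
        # Read-only view, so the root stays in control of its lines.
        return tuple(self._lines)

    def total(self):
        return sum((line.amount for line in self._lines), Decimal("0"))
```

Because every change goes through the root, rules that span the whole aggregate (such as the invoice total) have a single place to live.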

Once we know the aggregates, we can define the repositories for our domain model. A repository is an abstraction that gives us references to our aggregates, and allows us to persist those aggregates. The underlying infrastructure can be a relational database, a file, … but our model doesn’t need to know that. We just have to be able to get aggregates and save them back, and the repository provides this abstraction.
We should not create a repository for every class in our model; we should create one repository per aggregate. In our example, it makes no sense to be able to retrieve OrderLine objects without retrieving the corresponding Order object.
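
To give an idea of what such a repository could look like, here is a minimal sketch in Python. The method names (`get`, `save`, `orders_for_customer`) are hypothetical, and the in-memory implementation is only a stand-in for whatever the real infrastructure turns out to be. The point is that the domain code depends on the abstract `OrderRepository` only.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: int
    customer_id: str
    lines: list = field(default_factory=list)  # OrderLines travel with their Order


class OrderRepository(ABC):
    """One repository per aggregate: it hands out whole Order aggregates."""

    @abstractmethod
    def get(self, order_id): ...

    @abstractmethod
    def save(self, order): ...

    @abstractmethod
    def orders_for_customer(self, customer_id): ...


class InMemoryOrderRepository(OrderRepository):
    """Stand-in implementation; a real one could sit on top of an O/R mapper."""

    def __init__(self):
        self._orders = {}

    def get(self, order_id):
        return self._orders.get(order_id)

    def save(self, order):
        self._orders[order.order_id] = order

    def orders_for_customer(self, customer_id):
        return [o for o in self._orders.values() if o.customer_id == customer_id]
```

Note that there is no OrderLine repository: the lines are only reachable through the Order they belong to.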
Knowing all this, we can extend our schema:

Here, you can see the 4 repositories (I've added some example operations to them), and the 4 aggregates. I've also drawn the aggregate boundaries of the Order and the Invoice aggregate. Since the other 2 aggregates (Customer and Product) consist of only 1 entity, it is not necessary to draw their boundaries as well.

There is one thing that we'll need to keep in the back of our mind: we have to be able to create Invoices for Orders that are shipped and that have no Invoice yet. It would be a good idea to create a batch process that runs every night and creates Invoices for those Orders. In other words: this would ideally be implemented as a service.
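
A service like this could look roughly as follows. This is a sketch under assumptions, in Python for brevity: `InvoicingService` and `create_missing_invoices` are hypothetical names, and a plain list and dictionary stand in for the Order and Invoice repositories.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: int
    shipped: bool


@dataclass
class Invoice:
    order_id: int


class InvoicingService:
    """Creates Invoices for shipped Orders that have no Invoice yet."""

    def __init__(self, orders, invoices):
        self._orders = orders      # stand-in for an Order repository
        self._invoices = invoices  # stand-in for an Invoice repository, keyed by order id

    def create_missing_invoices(self):
        created = []
        for order in self._orders:
            if order.shipped and order.order_id not in self._invoices:
                invoice = Invoice(order_id=order.order_id)
                self._invoices[order.order_id] = invoice
                created.append(invoice)
        return created
```

Running the service twice in a row creates nothing the second time, which is what you want from a job that runs every night.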

Now that we have identified the entities, aggregates and repositories that make up our domain model, we could start to put the model into code, but I'll keep that for another post that I hope to finish soon. :)

8 comments:

Anonymous said

This is an interesting (long) post. I had no notion of Domain Driven Design until I read your post.

It seems that DDD can provide a very clean solution to some complex domain because you do not have to take care of storing/synchronizing/... the collections in your domain model. I think this is something the repositories have to take care of.

P.J. van de Sande said

Since there is such a big mismatch between the domain objects and your RDBMS, why don't you use or suggest an ODBMS?

Normally you lose a lot of time developing the O/R mapping, and it is always the same story: create the UML of your domain model, create a relational database model, and then create a mapping between your RDBMS and your domain model. This mapping is always a lot of code, costs a lot of time, and raises a lot of problems and questions.

I know there are some good reasons why an ODBMS would not fit in a project, but I wondered why you talk a lot about O/R in this post but only use the most mismatching type, an RDBMS?

Frederik Gheysels said

Why I do not use an ODBMS in this example:
- ODBMSs are, imho, not widespread. They're not used a lot, and personally, I do not know of any project that uses an ODBMS.
An RDBMS is a proven technology, and imho still the best way to store critical data.
- Relational databases are a great way to store data efficiently, and storing data relationally is also great if you want to report on the data. In a business system, reporting is a very important issue. Creating reports from an ODBMS is, imho, not as easy.
- When you use an ODBMS, you'll have to put your domain model in your DB as well. Then, what if another application needs to work on the DB too, but has another perspective on the data?
- What if you have to build a new application on an existing (relational) database?

You're right that developing the O/R mapping is very expensive; it takes a lot of development if you do it yourself.
However, if you use an existing O/R tool like NHibernate, WilsonOR, ... it will greatly reduce your development time.
Using an ODBMS in this example would surely be easier for me. However, since I think most people still use an RDBMS, I think it is better to discuss DDD in combination with an RDBMS in this article.
Another thing to take into consideration: ODBMSs have been around for quite some time now, but they're still not widely used. At this time, I'm not convinced that ODBMSs will eventually replace RDBMSs.

PJ. van de Sande said

Indeed Frederik, there are some very, very good reasons why an ODBMS might not fit in a project! I can't deny that.
But there are a lot of projects where it could fit, imho. Because O/R mapping is always such a big bottleneck and overhead in DDD, it staggers me every time I read an article. It is always about the problems with O/R, and only a handful of projects really choose an ODBMS.
If you only need one data source and you are not going to share it with third-party tools or software that has to use the data source directly, then there is no excuse, imho.

Give it a try in a free weekend ;)

Frederik Gheysels said

The problem is that it is not always the developer (or dev team) who gets to choose which tools to use (RDBMS / ODBMS).
In a lot of cases, you'll have to write an application that has to work with data that is already available in an RDBMS. In those cases, it is not a good idea to migrate the RDBMS to an ODBMS.
Next to that, I haven't met a lot of people who have experience with ODBMSs, so finding people who are proficient with them is not an easy task.

Also, I do not think that you should sacrifice the advantages of an RDBMS just because it is easier to use an ODBMS when you're developing an OO application. A good O/R mapper will bridge the gap between the OO model and the RDBMS, which means that the cost of bridging the O/R mismatch can be greatly reduced.

And, in DDD, the way of persisting the data is actually an implementation detail. The domain should not be aware of how the objects are persisted, and where they come from.
The repositories should abstract that away. So, in DDD, it really doesn't matter how the data is persisted.

P.J. van de Sande said

I can't agree more with your last argument.

But I still think there are too many developers who stick to the RDBMS and don't try alternatives, except for that short XML hype ;)

Anonymous said

Where's the follow-up? ;)

Frederik Gheysels said

I'm working on it... However, because of the heat, it isn't progressing very quickly. :)
A first draft is already done, and I hope to be able to post the final article by the end of this week, or otherwise in the first week of August.