woensdag 28 december 2005

Avoiding SQL injection and date problems with queries in .NET

I've already written a little piece about this subject a while ago on a dutch board.
Since I've seen that a lot of people are still writing embedded queries in their applications using string-concatenation, I've decided to write a little article about it on my blog.

First things first:
Writing your embedded queries using string-concatenation like this:
string sqlQuery = "SELECT column FROM table WHERE column = \'" + 
aVariable +"\'";
is bad practice.

It is not easy to read, it's hard to maintain, it does not offer optimal performance but, most of all, it creates a security-leak; your query is now vulnerable to SQL-injection.

The variable 'aVariable' can contain anything, and, if you do not check the contents of the variable, some unpleasant things can happen.
Suppose that aVariable contains this:
a';drop table tablename;--

Now, if you haven't checked the contents of the variable, and took some appropriate action, the query that will be executed will look like this:
SELECT column FROM table WHERE column = 'a';drop table tablename;--'

As you can see, 2 queries will be executed:
first, a select will take place, and then, a drop table statement is executed. As you can imagine, you -or your employer- will not be happy with this.

Another problem with string-concatenation comes to the surface when you want to filter on a date.
There are numerous ways of representing a date: dd/mm/yyyy, yyyy-mm-dd, mm/dd/yyyy, etc...
So, if you use string concatenation to create a query that filters on a date, you'll have to make sure that you use the same date-notation in your query as the date-notation that your DBMS uses.

So, to avoid these date and SQL injection problems, you can ofcourse write a bunch of code that tackles these issues but...
Why don't we let the database take care of these problems ?

This can easily be achieved by using parametrized queries instead of writing the query using string contatenation.

So, instead of writing your query like this:
string sqlQuery = "INSERT INTO customer " +
"( custName, city ) " +
"VALUES " +
"(\'" + name + "\', \'" + city + "\')";
you can write it like this in .NET (when you make use of the SqlClient)
string sqlQuery = "INSERT INTO customer " +
"( custName, birthdate, city ) " +
"VALUES " +
"(@p_name, @p_birthdate, @p_city")";
Or, if you use the OleDb namespace, you'll have to write it like this:
string sqlQuery = "INSERT INTO customer " +
"( custName, birthdate, city ) " +
"VALUES " +
"(?, ?, ?)";

As you can see, instead of 'inserting' the variables directly into the query, we use some kind of a 'placeholder' (a parameter).

Now, we still can't execute the query, because we need to give the DbCommand that we will use some additional information about the parameters. (We need to specify the values for instance).

This is the code that is needed to execute
string sqlQuery = "INSERT INTO customer " +
"( custName, birthdate, city ) " +
"VALUES " +
"(@p_name, @p_birthdate, @p_city")";

SqlCommand cmd = new SqlCommand(theConnectionObj);
cmd.Parameters.Add ("@p_name", SqlDbType.Varchar);
cmd.Parameters.Add ("@p_city", SqlDbType.Varchar);
cmd.Parameters.Add ("@p_birthdate, SqlDbType.DateTime);

cmd.Parameters["@p_name"].Value = custName;
cmd.Parameters["@p_birthdate"].Value = dateOfBirth;
cmd.Parameters["@p_city"].Value = cityName;

cmd.ExecuteNonQuery();

As you can see, the Parameters property of the DbCommand is used to give some information about the parameters that our query contains. For each parameter, we specify the datatype and we give it a value.
Now, we do not have to worry about escaping quotes that may appear in the customers' name, we do not have to worry about the date-notation of the date of birth, we do not have to worry about possible sql statements that are injected into the query, etc...

(Please, check the MSDN for more information about the Parameters collection, and the overloaded 'Add' methods. It is possible to specify output-parameters, etc..).

If you use the OleDb classes instead of the SqlClient classes, you might wonder how you will be able to make the distinction between the different parameters in one query, since you're not able to use named parameters. (You'll use a question mark as a place-holder).
Well, with the OleDb classes, you'll have to add the parameters in the correct order.
So, the first parameter in your query, must also be the first one that you'll add to your 'Parameter' collection.

Parametrized queries are not only available in .NET, they're available in any serious language. How to use them, depends from language to language, so, refer to the documentation / manual of the language you use to see how to use them.

1 opmerking:

PJ. van de Sande zei

When you use the OleDb namespace you can use @ParameterName, but the name doesn't do anything. It is the order of adding the parameters to your command that make a different.

So when you have this query:

SELECT * FROM Employees WHERE Age>@Age1 AND Age<@Age2

You must first provide the Age1 parameter value and then Age2.

OleDb doesn't look at the parameter names, but at the order of the paremeters.