Monday, 24 March 2008

FileStream (Sql Server 2008)

The Filestream attribute is new to Sql Server 2008 (Katmai) and allows large binary objects (blobs) to be stored on the file system but to be managed by Sql Server. Since I first started looking into Katmai this has been the feature that I was most looking forward to because it solves a number of design challenges that I have faced in the past.

Before the FileStream attribute was available if you wanted to store large binary objects you had two choices:

  • store in Sql Server using the varbinary(max) type;
  • store in the file system.

Most sources I have come across (such as this) have generally advocated using the file system because of its performance benefits. But what if you needed to also:

  • persist metadata about the file;
  • apply a common security model;
  • backup files with their associated data;
  • replicate across multiple servers;
  • manage CRUD operations within transactions;
  • just be confident that the files stored in the file system hadn't been changed outside of intended code.

Whilst none of these issues are insurmountable, they all present challenges that mean a lot of effort needs to be put into infrastructure code.

If the decision is made to use the database then not only is it necessary to deal with the performance issues, but then there is also the 2GB limit that comes with the varbinary(max) type (and if using the Express editions then the database size limit can also be quickly reached).

With the FileStream attribute a lot of these issues get solved in Sql Server 2008 because if this attribute is applied to a varbinary(max) column when the table is created then the data stored in this column will get stored in the file system and can be accessed (inserted, updated, retrieved) directly from the file system using Win32 API calls, but always within the context of a SqlServer transaction. Not only this but there is no limit to the size of the file that is to be stored (excepting disk space) and the Express editions, as I understand it, will not include these columns in their database size restrictions.

So of the issues which are solved? Well:

  • persisting metadata is now simple as it can go in the same table as the file, or in related tables - just treat the column like any other.
  • because access is made through Sql Server, the usual sql security model can be applied. Just be aware that if a windows user has access to the folder where the FileStream filegroup is then they can open, delete, etc... the files!
  • FileStream data gets backed up as normal - if you need to exclude it then just configure a partial backup accordingly
  • replication is not a problem (see this page for more details) but mirroring is not an option.
  • working using the Win32 API calls (which is the performant option for inserting, retrieving, and updating) you don't even get an option - all operations take place within a sql transaction.
  • like I said above - be aware that if a user has rights to the part of the file system where the FileStream filegroup is being stored then they can do to the files stored whatever their permissions allow them to.

I'll probably blog some more on this later, and post some code, but in the meantime I'd definitely recommend anyone interested in learning more looking at any or all of the following:

1 comment:

Kapil said...

Hi,
I am working on filestream DB and has file groups for that DB. Facing issue an issue while taking the back up of DB since DB size is large.
To counter this issue I am looking for taking partial back up of the group files.
Please could you suggest any solution for this.
Thanks in advance.