09 August 2013

Using Isilon as a Backup Target for TSM


# update 01.04.2013: based on even more existing installations we typically see more than 400 MB/s throughput with TSM on actual hardware (NL 400 nodes). That's in line with our throughput sizing for other workloads. /update

These days many customers look for efficient ways to leverage relatively new features of their backup software, such as de-duplication, node replication and backup of virtual environments. Several of these features cannot be used efficiently when backing up and archiving data to tape. We also see more and more use cases that require regular and frequent access to backup or archive data, which tape is not well suited for. I am currently working with some clever guys (see the Contributions section) who implemented Isilon at several customers to overcome the typical limits of traditional backup-to-tape or backup-to-disk solutions, and I found their ideas worth sharing here. Although this article discusses the challenges and solutions in the context of Tivoli Storage Manager (TSM), similar principles apply to backup solutions from other vendors such as Commvault, Symantec, EMC, CA and others. Over time, disk solutions have become more and more cost effective, and with Isilon we have an easy-to-manage and very cost-effective solution that scales up to approximately 15 PB of uncompressed, non-de-duplicated capacity.


Why backup to disk

As stated above, several use cases do not allow tape to be used as the backup medium. Backup-to-disk strategies come with several advantages:

  • Faster backup, especially for unstructured data
  • Much faster access to and restore of data
  • Less secondary workload (tape migrations)
  • Improved SLAs (have you ever really estimated how long it would take to restore the majority or all of your data from tape?)
  • Recovery times not dependent on the number of available tape drives
  • No SAN infrastructure is required (with Isilon)
  • Lower TCO in many cases. This depends of course on several factors like capacity, frequency of access, de-duplication ratio and others.


Issues with traditional disk arrays

Backup to disk is not new, but the management overhead of traditional disk arrays is well known to TSM administrators:

  • Traditional filesystems cannot easily be shared among TSM servers
  • Many filesystems have a limited size (e.g. 32 TB), which is far too small for a backup environment
  • Filesystems cannot grow to accommodate any size without manually reshuffling and re-balancing tons of data through TSM.  
  • The management can be quite complex. Some examples are:
       - Storage array management
       - Dedicated SAN adapters
       - Device Drivers
       - Array definitions
       - LUN definitions for each array
       - LUN masking
       - SAN zoning
       - Volume Groups
       - Logical Volumes  
       - Filesystems
       - Device class definitions
       - etc.
  • Performance monitoring and management are typically required.

All these issues are avoided when using Isilon as a backup target. All your TSM servers can mount a single scalable filesystem through NFS. The simplified infrastructure is shown in the following picture.


Figure 1: Isilon and TSM Servers/Clients using 10 Gigabit Ethernet


But hasn’t NFS been known to be a slow solution for TSM?

Well, that has been true for traditional NAS arrays, but not so with Isilon. As you may know, Isilon’s development started over a decade ago for multimedia streaming, and serving as a sequential file pool for TSM is a workload that suits it well. Test results provided by Concat and General Storage have shown a throughput of approximately 1,200 MB/s using three TSM instances on a single Linux server and 1,000 local clients that back up data to an Isilon cluster with just four NL400 nodes. The setup was not tuned; the throughput results are shown in the following figure.


Figure 2: TSM Backup throughput using local clients
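
To put these numbers in perspective, here is a minimal sizing sketch. It derives the per-node rate from the measured result (about 1,200 MB/s on four NL400 nodes) and projects it to another cluster size under the assumption of near-linear scaling. The function names, the eight-node cluster and the 100 TB data volume are hypothetical inputs chosen for illustration, not measurements.

```python
def per_node_throughput(total_mb_s: float, nodes: int) -> float:
    """Average throughput contributed by each node, in MB/s."""
    return total_mb_s / nodes


def projected_throughput(per_node_mb_s: float, nodes: int) -> float:
    """Cluster throughput in MB/s, ASSUMING near-linear scaling with node count."""
    return per_node_mb_s * nodes


def transfer_hours(data_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to move data_tb (decimal TB) at a sustained rate in MB/s."""
    data_mb = data_tb * 1_000_000  # 1 TB = 1,000,000 MB in decimal units
    return data_mb / throughput_mb_s / 3600


# Measured in the test: four nodes, approximately 1,200 MB/s in total.
measured = per_node_throughput(1200, 4)

# Hypothetical inputs: an eight-node cluster and a 100 TB data set.
eight_nodes = projected_throughput(measured, 8)
hours = transfer_hours(100, 1200)

print(f"per node: {measured:.0f} MB/s")
print(f"8 nodes (projected): {eight_nodes:.0f} MB/s")
print(f"100 TB at 1,200 MB/s: {hours:.1f} h")
```

Real deployments will deviate from the projection depending on network, TSM server and client limits, so treat it as a first estimate only.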


Standard TSM Setup

The setup of the test environment was quite straightforward, with no specific tuning. Here are some of the main steps used to configure TSM:

...
mkdir /tsmisilona/tsm1
mkdir /tsmisilona/tsm1/instance
mkdir /tsmisilona/tsm1/storage
...
dsmicfgx ... instance=/tsmisilon/tsm1/instance ...
...
(TSM) def devcl file devt=file maxcap=10000m dir=/tsmisilon/tsm1/storage
(TSM) def stg backuppool file maxscr=1000000
...
...
(TSM) upd devcl file dir=/tsmisilona/tsm1/storage,/tsmisilonb/tsm1/storage,/tsmisilonc/tsm1/storage
Listing 1: TSM Instance setup steps
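
The device class and storage pool settings in Listing 1 imply an upper bound for the pool: the maximum volume size (maxcap) multiplied by the maximum number of scratch volumes (maxscr). The following sketch just spells out that arithmetic. The helper name is made up, and treating the unit M as 1024-based when converting to T is an assumption.

```python
def pool_ceiling_m(maxcap_m: int, maxscr: int) -> int:
    """Upper bound of a FILE storage pool in M: volume size times scratch volumes."""
    return maxcap_m * maxscr


# Values from Listing 1: maxcap=10000m, maxscr=1000000
ceiling_m = pool_ceiling_m(10_000, 1_000_000)

# Conversion to T assumes 1 T = 1024 * 1024 M (an assumption about the unit).
ceiling_t = ceiling_m / 1024**2

print(f"{ceiling_m:,} M (~{ceiling_t:,.0f} T)")
```

With limits this generous, the capacity of the cluster rather than the TSM definitions is the practical bound.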


Generate massive data on ‘virtual’ TSM Clients and NFS Mount options

Backup tests at scale typically require a heavy I/O workload. To avoid setting up a very large TSM client infrastructure to generate massive throughput, about 1,000 clients were run on the server itself, fed via the TSM client API by scripts that generate data with an average file size of 5 MB. This method eliminates the need to read sufficient data from disk and prevents any bottleneck on the client side. The client processes wrote their data via the loopback interface to the local TSM server on the same system.
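
The scripts used in the tests are not part of this article, and they fed data to the TSM client API directly. Purely as an illustration of generating a data set with a given average file size, here is a hypothetical sketch that writes random files to a directory; the function name, file naming and size distribution are all assumptions.

```python
import os
import random
import tempfile


def generate_files(target_dir: str, count: int,
                   mean_size: int = 5_000_000, seed: int = 0) -> int:
    """Write `count` files of random bytes whose sizes average about `mean_size`.

    Returns the total number of bytes written.
    """
    rng = random.Random(seed)
    total = 0
    for i in range(count):
        # Uniform between 50% and 150% of the mean, so the expected size is the mean.
        size = rng.randint(mean_size // 2, mean_size * 3 // 2)
        path = os.path.join(target_dir, f"file_{i:06d}.bin")
        with open(path, "wb") as fh:
            fh.write(os.urandom(size))  # random bytes are effectively incompressible
        total += size
    return total


# Small demonstration run: 10 files of about 1 KB each in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    written = generate_files(tmp, count=10, mean_size=1_000)
    names = os.listdir(tmp)
    file_count = len(names)
    on_disk = sum(os.path.getsize(os.path.join(tmp, n)) for n in names)

print(f"{file_count} files, {written} bytes written")
```

For a run resembling the test, one would keep the default mean of 5 MB and raise the file count, bearing in mind that this variant does touch the local disk, which the API-based approach avoids.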

Using three different mount points (rather than one) has proven to be much more efficient. The following mount options were used:

mount -t nfs -o vers=3,tcp,hard,intr,rsize=131072,wsize=131072 isilon01-fast.lab.local:/ifs /tsmisilona
mount -t nfs -o vers=3,tcp,hard,intr,rsize=131072,wsize=131072 isilon01-fast.lab.local:/ifs /tsmisilonb
mount -t nfs -o vers=3,tcp,hard,intr,rsize=131072,wsize=131072 isilon01-fast.lab.local:/ifs /tsmisilonc
Listing 2: NFS Mount options
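
Since the three mounts differ only in the mount point, and the device class needs one storage directory per mount point, both command sets can be derived from a single list. The sketch below only builds the command strings from Listings 1 and 2 and does not execute anything; the helper names are hypothetical.

```python
NFS_OPTIONS = "vers=3,tcp,hard,intr,rsize=131072,wsize=131072"
EXPORT = "isilon01-fast.lab.local:/ifs"
MOUNT_POINTS = ["/tsmisilona", "/tsmisilonb", "/tsmisilonc"]


def mount_commands(export, mount_points, options=NFS_OPTIONS):
    """Return one NFS mount command per mount point."""
    return [f"mount -t nfs -o {options} {export} {mp}" for mp in mount_points]


def devclass_update(mount_points, subdir="tsm1/storage", devclass="file"):
    """Return the TSM command that lists one storage directory per mount point."""
    dirs = ",".join(f"{mp}/{subdir}" for mp in mount_points)
    return f"upd devcl {devclass} dir={dirs}"


commands = mount_commands(EXPORT, MOUNT_POINTS)
update = devclass_update(MOUNT_POINTS)

for command in commands:
    print(command)
print(update)
```

Adding a fourth mount point then means adding one entry to the list rather than editing two places by hand.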

As stated earlier, the setup that provided the results shown in Figure 2 was not tuned. Using multiple servers may improve the results even further. However, the results already show how effective the solution is. You can expect the throughput to scale almost linearly with the number of nodes that you add to the cluster, while the management effort for the storage and infrastructure does not increase.

One thing I need to mention is that you should not consider putting the TSM database on Isilon. The highly random access patterns that we typically see on the TSM database are not well suited to Isilon today. That might change in the future, but today you would ideally use some fast internal disks or an SSD-based array. In the test setup discussed here, two 1 TB SATA disks were used and the TSM database was served from the server’s cache.

Why Isilon provides a more efficient solution

If you are a regular reader of this blog you already know why Isilon helps to address all the issues mentioned above. Isilon comes along with just one filesystem which does not require managing RAID arrays, aggregates, logical or physical volumes, SAN adapters, drivers and the like. The filesystem of Isilon (OneFS) stripes data across all nodes (of a disk pool – see this post for more details), it auto balances data blocks to avoid unbalanced utilization of resources and it helps to avoid future data migrations in case of technology refreshes. The expansion of the cluster just takes a few seconds of management actions (see this video) and once a node has been added, the space for TSM is available immediately. Here is an example that shows how easy and fast new capacity (nodes) can be added to the cluster and the space being available to TSM instantaneously:

# Just show the current time:
tsm: GSWARM01>sh time

Current Date and Time on the Server    
----------------------------------------
04/28/2013 14:48:39                    
UTC (GMT) Date/Time is: 04/28/2013 12:48:39 PM
Daylight Savings Time is in effect: YES

# Now let’s look at the available disk space
tsm: GSWARM01>q dirspace

Device Class   Directory                                Estimated        Estimated
Name                                                     Capacity        Available
------------   ---------------------------------   --------------   --------------
ISIDCNORD      /tsmd1isi/gswarm01/data              427,671,368 M    118,849,927 M

# Now we add another node like shown in the video
# Then display again the capacity and available space for TSM

tsm: GSWARM01>q dirspace

Device Class   Directory                                Estimated        Estimated
Name                                                     Capacity        Available
------------   ---------------------------------   --------------   --------------
ISIDCNORD      /tsmd1isi/gswarm01/data              534,589,210 M    225,767,761 M

# As you can see this took not even two minutes

tsm: GSWARM01>sh time

Current Date and Time on the Server    
04/28/2013 14:50:08                    
UTC (GMT) Date/Time is: 04/28/2013 12:50:08 PM
Daylight Savings Time is in effect: YES
Listing 3: Expansion of an Isilon cluster adds capacity for TSM within two minutes
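
The figures in Listing 3 can be checked with a few lines of arithmetic. This sketch parses the two timestamps and the capacity values exactly as printed above; the helper name is hypothetical and the unit M is kept as reported by TSM.

```python
from datetime import datetime

TIME_FORMAT = "%m/%d/%Y %H:%M:%S"


def parse_m(value: str) -> int:
    """Parse a 'q dirspace' figure such as '427,671,368 M' into an integer (unit M)."""
    return int(value.split()[0].replace(",", ""))


# Timestamps from the two 'sh time' outputs in Listing 3.
start = datetime.strptime("04/28/2013 14:48:39", TIME_FORMAT)
end = datetime.strptime("04/28/2013 14:50:08", TIME_FORMAT)
elapsed_seconds = int((end - start).total_seconds())

# Capacity and available space before and after adding the node.
capacity_added = parse_m("534,589,210 M") - parse_m("427,671,368 M")
available_added = parse_m("225,767,761 M") - parse_m("118,849,927 M")

print(f"elapsed: {elapsed_seconds} s")
print(f"capacity added: {capacity_added:,} M")
print(f"available added: {available_added:,} M")
```

Capacity and available space grow by nearly the same amount, which is what one would expect if the new node's space is usable by TSM right away.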

As you can see, the expansion of an Isilon cluster is very easy and the capacity is available to TSM immediately. The data redistribution is performed by Isilon in the background and should not affect the production workload. This is just one example of Isilon’s ease of use and of the reduced complexity it presents to the application layer. Other features such as remote replication, snapshots and flexible data protection help to protect the data with Mean Time to Data Loss (MTDL) values that reach billions of years. For example, an N+3 data protection setting yields a calculated MTDL of about 3 billion years, while the protection overhead for that level on a 10-node cluster is only 30%.
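
The 30% overhead quoted for N+3 on ten nodes follows from simple arithmetic if each stripe is assumed to span all nodes of the cluster, with three of the ten units holding protection data. That assumption is a simplification, and the function below is a hypothetical helper rather than an Isilon tool.

```python
def protection_overhead_percent(cluster_nodes: int, parity_units: int) -> float:
    """Share of raw capacity used for protection, ASSUMING stripes span all nodes."""
    if not 0 < parity_units < cluster_nodes:
        raise ValueError("parity_units must be between 1 and cluster_nodes - 1")
    return 100 * parity_units / cluster_nodes


# Example from the text: N+3 protection on a 10-node cluster.
overhead = protection_overhead_percent(10, 3)
print(f"N+3 on 10 nodes: {overhead:.0f}% overhead")
```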

Summary

Isilon provides a very efficient infrastructure that allows effective deployments of backup-to-disk scenarios with high performance and a very high level of data protection. Software features such as de-duplication, compression and node replication can be used, while SLAs can be improved dramatically, especially for data that needs to be accessed regularly. Even scenarios where HSM is used on a smaller high-performance filesystem can be deployed with Isilon as an effective external archive tier. You may ask why not use Isilon with its own tiering function (SmartPools) without HSM, and that is a valid question. However, the world is complex, and if an HSM infrastructure is already in place, this can be a good solution with all the advantages of a disk-based tier over (or in addition to) a tape solution, especially if the data is accessed frequently.
We can summarize the advantages of Isilon as a backup target as follows:
  • Reduced complexity for the TSM deployment
  • No SAN components or SAN management required
  • Almost zero management when adding capacity
  • Well suited workload for Isilon with measured throughput of approximately 300MB/s/node on NL400 nodes without any optimization (just one TSM server)
  • Read performance is typically even much better
  • With a shared filesystem and TSM node replication, it is only one step from the monolithic TSM architecture towards an infrastructure that resembles a backup-as-a-service approach.

Contributions

Thanks to Lars Henningsen from General Storage and Stéphane Criachi from Concat for providing input and their test results for this article. These guys have expert knowledge in backup solutions and Isilon and I would advise you to get in contact with them if you consider implementing solutions that I outlined in this article. Also my colleagues Andrej Kienkov and Frank Krämer from IBM provided some useful comments.

7 comments:

  1. Great article on how a TSM environment can be optimized with a scalable Isilon backup-to-disk solution. Isilon is not the only great backup target for TSM; EMC Data Domain is one as well. With Data Domain you get all the B2D advantages mentioned above AND optimized high-speed granular de-duplication. A Data Domain can be connected over FC and/or NFS/CIFS to multiple TSM servers at the same time. TSM pools on a Data Domain can also be replicated across existing IP networks, eliminating the need for TSM to run costly migration and copy storage pool operations.

    Check it out!

  4. As an Isilon/TSM administrator I must say that Isilon is the finest TSM backup target (for TSM) available from EMC. It is so simple, scalable, powerful, no fuss, and Isilon is a perfect fit with TSM deduplication, node replication and TSM TB licensing.
