Tuesday, February 7, 2012

SSIS: An Error Handling approach

Error Handling in SSIS

The most common methods to handle the errors in the SSIS package are

1.Redirecting the Error Row along with the Error Code and Error Column in the DFT
2.Make use of event handlers for logging the errors in the Control Flow

By making use of Event Handlers you can Log custom messages in to custom tables during OnWarning , OnError events on the control flow tasks.This could be done by inserting custom error messages into the table using execute sql task or a Script component.

However while ETL i.e in a DFT you could redirect just the error rows and log error message for them.By Redirecting the Error Row you get the data of all the fields in the Error Row along with couple of additional fields ErrorCode and ErrorColumn.The Error Row itself could be used while logging.Its a shame that SSIS does not provide you the information of the exact column where the Error Occurred.The Error Column should not be mistaken for the column name , if you noticed it before its data type is an integer and it provides the lineageid of the respective column where the Error Occurred.

LineageId is something which is generated by the SSIS itself to identify the fields individually.If you open SSIS package as an XML you could view these lineage Ids assigned for each of the fields .

Now I am looking at the other option of getting the Error Description from the Error Code and using that in the custom Log Table which I created.Again the Error Description can be derived from Error Code using the following C# Script in the Script Component

Row.ErrorDescription = ComponentMetaData.GetErrorDescription(Row.ErrorCode);

The Error Description here again is not meaningful since it is a system generated error message and is of little use to trace down the Error Column.
To get the Error Column there are few Custom Components available online , but due to lack of supporting documents and security reasons I tried not to implement them.



Custom Error Description

In order to get the Error Column I decided to use the lineageId to trace the Field Name where the Error occured.
I created a Lineage Lookup Table with fields such as
PackageName , LineageId , ColumnName
Then I created a Package which would load the PackageName, LineageIds and the ColumnName from all the Packages available in a particular folder.
The Package would do a ForeachLoop on all the .dtsx files in the provided folder , read the SSIS Package as an xml and upload the LineageLookup table.





I used the following Code in C# Script in a Source component in DFT to insert rows to the destination lookuptable

/*
Add rows by calling the AddRow method on the member variable named "Buffer".
For example, call MyOutputBuffer.AddRow() if your output was named "MyOutput".
*/

XDocument xdoc = XDocument.Load(this.Variables.PackageFolder + "\\" + this.Variables.PackageXMLName + ".dtsx" );

var LineageNodes = from Nodes in xdoc.Descendants()
where Nodes.Attribute("lineageId") != null &&
Nodes.Attribute("lineageId").Value != String.Empty &&
Nodes.Attribute("name") != null &&
Nodes.Attribute("name").Value != String.Empty
select new
{
LineageId = Convert.ToInt64(Nodes.Attribute("lineageId").Value),
ColumnName = Nodes.Attribute("name").Value
};

foreach (var Item in LineageNodes)
{
MyOutputBuffer.AddRow();
MyOutputBuffer.LineageId = Item.LineageId;
MyOutputBuffer.PackageName = this.Variables.PackageXMLName;
}



After Successfully populating the LineageLookup table , we could use this look up table in our package to look up for Field names against the Error Column(LineageId).
Remember while doing a lookup we need to lookup on the package name and then the LineageId to get the Field name.Since lineage id is not a unique number and will be repeated for the packages.

After picking up the FieldName we will use this for forming our custom error message in either a Derived column or script component.Say “Error Occurred in the Column:”+ ColumnName


Where to use Error Row Redirect?

We could pretty much redirect Error Rows in most of the source, transformation and destination components.We need to make sure we configure the value "Redirect Error Rows" in the Error Output.
When we redirect error rows in all the transformation then it is worth combining them using a Union All and then do an insert in to the error log table.Union All is a performance constraint so it comes with a cost.

While redirecting Error Rows in destination make sure you are not doing a Fast Load in to the table which would not let you get the error row .

7 comments:

  1. Good explanation kannan. According to my experience using script task may affect the performace because row by row processing.We can use Error output flow and execute SQL task.

    ReplyDelete
  2. I agree dhinesh. This is just a means to find the error column name , which i was able to get from the package xml.

    ReplyDelete
  3. hi Kannan, can you please share the sample package?
    Regards,
    Viswanathan M

    ReplyDelete
  4. very useful congrats...

    ReplyDelete
  5. thank u for sharing this great information it's useful to me.
    msbi ONLINE TRAINING

    ReplyDelete
  6. This comment has been removed by the author.

    ReplyDelete
  7. This comment has been removed by a blog administrator.

    ReplyDelete

Creating a Meta Data Driven SSIS Solution with Biml

Biml Biml ( Business Intelligence Markup Language ) is a markup language that enables you to quickly represent a variety of database ...