How to Query Multi Value fields in Active Directory using SSIS

Apparently what’s even more difficult than importing data from AD is figuring out how to import multi-value objects from Active Directory.  “Description” is an example of a standard AD multi-value field.  My client had many custom multi-value fields added to AD and needed to import the data from these fields into tables in a database.  You can accomplish this easily this by adding a bit of code to the C# code importing the single value attributes as outlined in my previous post Getting Around AD Paging on SSIS Import

This C# code is much simpler than trying to import each multi-value field using a Data Flow task.  Using Data Flow tasks can be done but it has some tricky problems like importing only those records with values in the multi-value field, working around paging, and how to deal with apparently empty objects that your query returns even though you specified that it only return those objects with values.  It’s also quite a bit slower as you need to populate variables and pass those variables to loops to iterate thru the multi-values for one account at a time.

Here is the code for importing one multi-value attribute into a table.  This code should be placed at an appropriate spot within the  “foreach (SearchResults” loop outlined in the Getting Around AD Paging on SSIS Import post.

 

string propertyName = “Description”; //or whichever multi-value field you are importing

ResultPropertyValueCollection valueCollection = results.Properties[propertyName];

//Iterate thru the collection for the user and insert each value from the multi-value field into a table

foreach (String sMultiValueField in valueCollection)

{

string sValue = sMultiValueField.Replace(“‘”, “””); //Replace any single quotes with double quotes

sqlConnection1.Open();

cmd.CommandText =

“INSERT INTO User_Descriptions (sAMAccountName,Description) VALUES (‘” + sAMAccountName + “‘,'” + sValue + “‘)”;

reader = cmd.ExecuteReader();

sqlConnection1.Close();

}

The nice thing about this code is that you can iterate through any records, even if the multi-value field is empty.  It won’t fail, it just won’t return a record.  This means you can add this same chunk of code multiple times edited for several different multi-value fields within the same script task, and have all your tables updated using the same script.  The package is very easy to maintain, with no package variables, no complex package logic, just a simple script.  Very elegant!

Get around Active Directory Paging on SSIS import

I have a client who is importing certain users from Active Directory.  The paging on their AD is set to 20,000 records.  When trying to pull data using a SQL statement, the query fails because it hits the maximum number of records and is unable to return more.   You could work around a problem like this by editing your query filter to ensure that you always retrieve fewer than 20,000 records at a time, for example using the whenCreated field.  However, there is no guarantee that whatever filter you use will always limit your return value to a maximum of 20,000 records.  And you now need to build a loop construct to retrieve all the records since you want more than 20,000 records.

This is much easier to solve than you might think, judging from the number of forum questions out there on the subject (and how long it took me to piece it together).   Here are the steps.

Create an SSIS package.

Add a string variable, scoped to the package, called SqlConn.  Populate it with the connection string to the database you want to populate with the AD records.

Add a script task to your package.  Open the script task, making sure that the ScriptLanguage is C# and not VB.

image

Click on the Edit Script button.  On the right hand side you should see the Project Explorer window.  Right click on the name of the Project File at the top of the tree and select Add Reference.

SNAGHTML6315ef17

On the .NET tab scroll down and find System.DirectoryServices. Select it and click OK.

image

Make sure you see the reference appear in the References folder in the Project Explorer window.

image

Add these statements at the beginning of your script.

using System.DirectoryServices;

using System.Data.SqlClient;

Paste this script to replace the public void Main().  Edit the ds.Filter and Insert string values to meet your table requirements.  Be sure to only select single value attributes of the object.   If you try to use this method to import multi-value attributes such as “Description” from AD it won’t work.  I’ll be writing about that next.

public void Main()

{

//Set up the AD connection;

using (DirectorySearcher ds = new DirectorySearcher())

{

//Edit the filter for your purposes;

ds.Filter = “(&(objectClass=user)(|(sAMAccountName=A*)(sAMAccountName=D0*)))”;

ds.SearchScope = SearchScope.Subtree;

ds.PageSize = 1000;

//This will page through the records 1000 at a time;

//Set up SQL Connection

string sSqlConn = Dts.Variables[“SqlConn”].Value.ToString();

SqlConnection sqlConnection1 = new SqlConnection(sSqlConn);

SqlCommand cmd = new SqlCommand();

SqlDataReader reader;

cmd.CommandType = CommandType.Text;

cmd.Connection = sqlConnection1;

//Read all records in AD that meet the search criteria into a Collection

using (SearchResultCollection src = ds.FindAll())

{

//For each record object in the Collection, insert a record into the SQL table

foreach (SearchResult results in src)

{

string sAMAccountName = results.Properties[“sAMAccountName”][0].ToString();

string objectCategory = results.Properties[“objectCategory”][0].ToString();

//Replace any single quotes in the string with two single quotes for sql INSERT statement

objectCategory = objectCategory.Replace(“‘”, “””);

sqlConnection1.Open();

cmd.CommandText = “INSERT INTO Users (sAMAccountName, objectCategory) VALUES (‘” + sAMAccountName + “‘,'” + objectCategory + “‘)”;

reader = cmd.ExecuteReader();

sqlConnection1.Close();

} } } }

 

That’s it.  This will iterate through all of the objects in Active Directory, regardless of paging size set on Active Directory.

To learn how to import multi-value fields from AD, read this post:

How to Query Multi-Value Fields from Active Directory using SSIS

Log not truncating on Checkpoint

I have a client who was running out of disk space due to a database log which was growing exponentially.  The database was set to Simple recovery, and the log was set to Truncate on Checkpoint.  The log was growing hugely and had to be manually shrunk every few days.

To find out the reason for the log not truncating I ran this query.

SELECT name,log_reuse_wait , log_reuse_wait_desc FROM sys.databases

I found out that the database was not truncating due to an Active Transaction. This can be caused by a long running transaction or by a deferred transaction.  http://technet.microsoft.com/en-us/library/ms345414(v=sql.105)

In the meantime the database went into Recovery mode and a number of things happened of which I am not aware.  Somehow the active transaction got cleared and the client manually shrunk the log file. 

We changed a few settings to minimize the damage should this happen again. The Autogrowth was set to 20% with no maximum.  We change the Autogrow settings to something more reasonable, with the initial size of 5GB, and a maximum of 10GB to at least protect the other files from getting stalled due to no space on the drive.

Then I looked at capturing the Active Transaction information.  Unfortunately, unless the Active Transaction was running, I was unable to find out what the transaction was.  I built a very simple SSIS package which I ran every 5 minutes to discover if an Active Transaction was holding up the log file, and to capture some information about what transaction was running and who was running it.

image

This package populated a new table in the data warehouse with the attributes of a transaction.  To avoid filling the table with information I didn’t need, I wrote the query so it would only populate the table if there were any transactions holding the log open.  Here is the source query for the Data Flow.  Should the IF EXISTS be negative, the Data Flow task would fail causing the Send Mail Task not to be triggered.

IF EXISTS

–IF log is held up due to Active Transaction
(SELECT  name, database_ID, log_reuse_wait , log_reuse_wait_desc
FROM sys.databases 
WHERE log_reuse_wait_desc =  ‘ACTIVE_TRANSACTION’ AND database_id = <your database id>)

–Find out the user and which query is holding it open
SELECT  s.SPID,S.OPEN_TRAN,TEXT,s.Hostname,s.nt_domain,nt_username,net_address,s.loginame,
s.login_time, s.last_batch, s.status, s.sid, s.hostname as [Host_Name],
s.program_name, s.cmd, s.net_library,  GETDATE() as InputDate
FROM SYS.SYSPROCESSES S CROSS APPLY SYS.DM_EXEC_SQL_TEXT(S.SQL_HANDLE)AS A
WHERE OPEN_TRAN<>0

The really nice thing about this query is that it returns the Text fo the actual query which is holding the log open, as well as program_name which is running it and the Host_Name.

image

I set up a SQL Server Agent job to run the package every 5 minutes, since I only wanted to capture items which were long running. 

Using this package I was able to very quickly find out that there was an automated process on SharePoint which was running every hour but for whatever reason was never able to complete (that’s another story). Because the transaction didn’t successfully complete it remained flagged as Active, the Checkpoint for the log was not removed and the log just kept growing and was never truncated.  Once the issue with the SharePoint job was resolved the log file was able to resume normal behaviour and Truncate on Checkpoint as expected.