
Azure Data Factory Cookbook
By :

In this recipe, we shall create a pipeline that fetches some metadata from an Azure storage container and stores it in an Azure SQL database table. You will work with two frequently used activities, the Metadata activity and the Stored Procedure activity.
AzureSqlDatabase
and OrchestrationAzureBlobStorage
linked services in this recipe as well, so if you did not create them before, please go through the necessary steps in the previous recipe.AzureSQLDatabase
. If you haven’t done so already, create the FileMetadata
table and the stored procedure to insert the data as described in the Technical requirements section of this chapter.pl_orchestration_recipe_2
.CsvDataFolder
, pointing to the Azure Storage container (adforchestrationstorage
) we specified in the Technical requirements section. Use the delimited text file format. This time, do not specify the filename; leave it pointing to the data container itself. Use the same linked service for Azure Blob Storage as we used in the previous recipe.CsvDataFolder Metadata
.CsvDataFolder
dataset. In the same tab, under Field list, use the New button to add two fields, and select Item Name and Last Modified as the values for those fields:
Figure 2.16: Get Metadata activity configuration
Insert Metadata
.AzureSqlDatabase
) and the name of the stored procedure: [dbo].[InsertFileMetadata]
.@activity('CsvDataFolder Metadata').output.itemName
@convertFromUtc(activity('CsvDataFolder Metadata').output.lastModified, 'Pacific Standard Time')
@convertFromUtc(utcnow(), 'Pacific Standard Time')
:
Figure 2.17: Stored Procedure activity configuration
.csv
files.In this simple recipe, we introduced two new activities. In step 2, we have used the Metadata activity, with the dataset representing a folder in our container. In this step, we were only interested in the item name and the last-modified date of the folder. In step 3, we added a Stored Procedure activity, which allows us to directly invoke a stored procedure in the remote database. In order to configure the Stored Procedure activity, we needed to obtain the parameters (itemName
, lastModified
, and UpdatedAt
). The formulas used in step 5 (such as @activity('CsvDataFolder Metadata').output.itemName
) define which activity the value is coming from (the CsvDataFolder
Metadata activity) and which parts of the output are required (output.itemName
). We have used the built-in convertFromUtc
conversion function in order to present the time in a specific time zone (Pacific Standard Time, in our case).
In this recipe, we only specified the itemName
and lastModified
fields as the metadata outputs. However, the Metadata activity supports many more options. Here is the list of currently supported options from the Data Factory documentation at https://learn.microsoft.com/en-us/azure/data-factory/control-flow-get-metadata-activity#capabilities:
Figure 2.18: Metadata activity options
The Metadata type options that are available to you will depend on the dataset: for example, the contentMD5
option is only available for files, while childItems
is only available for folders.