-
Book Overview & Buying
-
Table Of Contents
-
Feedback & Rating

Amazon Redshift Cookbook
By :

Scalar UDF functions in Amazon Redshift are routines that are able to take parameters, perform calculations, and return the results. UDFs are handy when performing complex calculations that can be stored and reused in a SQL statement. Amazon Redshift supports UDFs that can be authored using either Python or SQL. In addition, Amazon Redshift also supports AWS Lambda UDFs that open up further possibilities to invoke other AWS services. For example, let's say the latest customer address information is stored in AWS DynamoDB—you can invoke an AWS Lambda UDF to retrieve this using a SQL statement in Amazon Redshift.
To complete this recipe, you will need the following:
In this recipe, we will start with a scalar Python-based UDF that will be used to parse an XML input:
f_parse_xml
function:CREATE OR REPLACE FUNCTION f_parse_xml (xml VARCHAR(MAX), input_rank int) RETURNS varchar(max) STABLE AS $$ import xml.etree.ElementTree as ET root = ET.fromstring(xml) res = '' for country in root.findall('country'): rank = country.find('rank').text if rank == input_rank: res = name = country.get('name') + ':' + rank break return res $$ LANGUAGE plpythonu;
Important note
The preceding Python-based UDF takes in the XML data and uses the xml.etree.ElementTree
library to parse it to locate an element, using the input rank. See https://docs.python.org/3/library/xml.etree.elementtree.html for more options that are available with this XML library.
f_parse_xml
function using the following statement, by locating the country name that has the rank 2:select f_parse_xml('<data> <country name="Liechtenstein"> <rank>2</rank> <year>2008</year> <gdppc>141100</gdppc> <neighbor name="Austria" direction="E"/> <neighbor name="Switzerland" direction="W"/> </country></data>', '2') as col1
This is the expected output:
col1 Liechtenstein:2
Figure 2.1 – Creating a Lambda function using the AWS Management Console
rs_lambda
under Function name, choose a Python 3.6 runtime, and click on Create function. import json def lambda_handler(event, context): ret = dict() ret['success'] = True ret['results'] = ["bar"] ret['error_msg'] = "none" ret['num_records'] = 1 return json.dumps(ret)
In the preceding Python-based Lambda function, a sample result is returned. This function can further be integrated to call any other AWS service—for example, you can invoke AWS Key Management Service (KMS) to encrypt input data.
RSInvokeLambda
, using the following policy statement by replacing [Your_AWS_Account_Number], [Your_AWS_Region]
with your AWS account number/region and attaching the role to the Amazon Redshift cluster:{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "lambda:InvokeFunction", "Resource": "arn:aws:lambda:[Your_AWS_Region]: [Your_AWS_Account_Number]:function:rs_lambda" } ] }
f_redshift_lambda
function that links the AWS Lambda rs_lambda
function:CREATE OR REPLACE EXTERNAL FUNCTION f_redshift_lambda (bar varchar) RETURNS varchar STABLE LAMBDA 'rs_lambda' IAM_ROLE 'arn:aws:iam::[Your_AWS_Account_Number]:role/RSInvokeLambda';
f_redshift_lambda
function by using the following SQL statement:select f_redshift_lambda ('input_str') as col1 --output col1 bar
Amazon Redshift is now able to invoke the AWS Lambda function using a SQL statement.
Amazon Redshift allows you to create a scalar UDF using either a SQL SELECT
clause or a Python program in addition to the AWS Lambda UDF illustrated in this recipe. The scalar UDFs are stored with Amazon Redshift and are available to any user when granted the required access. You can find a collection of several ready-to-use UDFs that can be used to implement some of the complex reusable logic within a SQL statement at the following link: https://github.com/aws-samples/amazon-redshift-udfs.