Sign In Start Free Trial
Account

Add to playlist

Create a Playlist

Modal Close icon
You need to login to use this feature.
  • Book Overview & Buying Amazon Redshift Cookbook
  • Table Of Contents Toc
  • Feedback & Rating feedback
Amazon Redshift Cookbook

Amazon Redshift Cookbook

By : Shruti Worlikar, Arumugam, Patel
4.8 (9)
close
close
Amazon Redshift Cookbook

Amazon Redshift Cookbook

4.8 (9)
By: Shruti Worlikar, Arumugam, Patel

Overview of this book

Amazon Redshift is a fully managed, petabyte-scale AWS cloud data warehousing service. It enables you to build new data warehouse workloads on AWS and migrate on-premises traditional data warehousing platforms to Redshift. This book on Amazon Redshift starts by focusing on Redshift architecture, showing you how to perform database administration tasks on Redshift. You'll then learn how to optimize your data warehouse to quickly execute complex analytic queries against very large datasets. Because of the massive amount of data involved in data warehousing, designing your database for analytical processing lets you take full advantage of Redshift's columnar architecture and managed services. As you advance, you’ll discover how to deploy fully automated and highly scalable extract, transform, and load (ETL) processes, which help minimize the operational efforts that you have to invest in managing regular ETL pipelines and ensure the timely and accurate refreshing of your data warehouse. Finally, you'll gain a clear understanding of Redshift use cases, data ingestion, data management, security, and scaling so that you can build a scalable data warehouse platform. By the end of this Redshift book, you'll be able to implement a Redshift-based data analytics solution and have understood the best practice solutions to commonly faced problems.
Table of Contents (13 chapters)
close
close

Managing UDFs

Scalar UDF functions in Amazon Redshift are routines that are able to take parameters, perform calculations, and return the results. UDFs are handy when performing complex calculations that can be stored and reused in a SQL statement. Amazon Redshift supports UDFs that can be authored using either Python or SQL. In addition, Amazon Redshift also supports AWS Lambda UDFs that open up further possibilities to invoke other AWS services. For example, let's say the latest customer address information is stored in AWS DynamoDB—you can invoke an AWS Lambda UDF to retrieve this using a SQL statement in Amazon Redshift.

Getting ready

To complete this recipe, you will need the following:

  • Access to the AWS console
  • Access to any SQL interface such as a SQL client or query editor
  • Access to create an AWS Lambda function
  • Access to create an Identity and Access Management (IAM) role that can invoke AWS Lambda and attach it to Amazon Redshift

How to do it…

In this recipe, we will start with a scalar Python-based UDF that will be used to parse an XML input:

  1. Connect to Amazon Redshift using the SQL client, and copy and paste the following code to create an f_parse_xml function:
    CREATE OR REPLACE FUNCTION f_parse_xml
    (xml VARCHAR(MAX), input_rank int)
    RETURNS varchar(max)
    STABLE
    AS $$
        import xml.etree.ElementTree as ET
        root = ET.fromstring(xml)
        res = ''
        for country in root.findall('country'):
            rank = country.find('rank').text
            if rank == input_rank:
               res =  name = country.get('name') + ':' + rank
               break
        return res 
    $$ LANGUAGE plpythonu;

    Important note

    The preceding Python-based UDF takes in the XML data and uses the xml.etree.ElementTree library to parse it to locate an element, using the input rank. See https://docs.python.org/3/library/xml.etree.elementtree.html for more options that are available with this XML library.

  2. Now, let's validate the f_parse_xml function using the following statement, by locating the country name that has the rank 2:
    select 
    f_parse_xml('<data>      <country name="Liechtenstein">        <rank>2</rank>         <year>2008</year>         <gdppc>141100</gdppc>         <neighbor name="Austria" direction="E"/>        <neighbor name="Switzerland" direction="W"/> </country></data>', '2') as col1

    This is the expected output:

    col1
    Liechtenstein:2
  3. We will now create another AWS Lambda-based UDF. Navigate to the AWS Management Console and pick the AWS Lambda service and click on Create function, as shown in the following screenshot:
    Figure 2.1 – Creating a Lambda function using the AWS Management Console

    Figure 2.1 – Creating a Lambda function using the AWS Management Console

  4. In the Create function screen, enter rs_lambda under Function name, choose a Python 3.6 runtime, and click on Create function.
  5. Under the Function code textbox, copy and paste the following code and press the Deploy button:
    import json
    def lambda_handler(event, context):
        ret = dict()
        ret['success'] = True
        ret['results'] = ["bar"]
        ret['error_msg'] = "none"
        ret['num_records'] = 1
        return json.dumps(ret)

    In the preceding Python-based Lambda function, a sample result is returned. This function can further be integrated to call any other AWS service—for example, you can invoke AWS Key Management Service (KMS) to encrypt input data.

  6. Navigate to AWS IAM in the AWS Management Console and create a new role, RSInvokeLambda, using the following policy statement by replacing [Your_AWS_Account_Number], [Your_AWS_Region] with your AWS account number/region and attaching the role to the Amazon Redshift cluster:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "lambda:InvokeFunction",
                "Resource": "arn:aws:lambda:[Your_AWS_Region]: [Your_AWS_Account_Number]:function:rs_lambda"
            }
        ]
    }
  7. Connect to Amazon Redshift using the SQL client, and copy and paste the following code to create a f_redshift_lambda function that links the AWS Lambda rs_lambda function:
    CREATE OR REPLACE EXTERNAL FUNCTION f_redshift_lambda (bar varchar)
    RETURNS varchar STABLE
    LAMBDA 'rs_lambda'
    IAM_ROLE 'arn:aws:iam::[Your_AWS_Account_Number]:role/RSInvokeLambda';
  8. You can validate the f_redshift_lambda function by using the following SQL statement:
    select f_redshift_lambda ('input_str') as col1
    --output
    col1
    bar

Amazon Redshift is now able to invoke the AWS Lambda function using a SQL statement.

How it works…

Amazon Redshift allows you to create a scalar UDF using either a SQL SELECT clause or a Python program in addition to the AWS Lambda UDF illustrated in this recipe. The scalar UDFs are stored with Amazon Redshift and are available to any user when granted the required access. You can find a collection of several ready-to-use UDFs that can be used to implement some of the complex reusable logic within a SQL statement at the following link: https://github.com/aws-samples/amazon-redshift-udfs.

bookmark search playlist download font-size

Change the font size

margin-width

Change margin width

day-mode

Change background colour

Close icon Search
Country selected

Close icon Your notes and bookmarks

Delete Bookmark

Modal Close icon
Are you sure you want to delete it?
Cancel
Yes, Delete

Confirmation

Modal Close icon
claim successful

Buy this book with your credits?

Modal Close icon
Are you sure you want to buy this book with one of your credits?
Close
YES, BUY