Load data from AWS S3 to AWS RDS SQL Server databases using AWS Glue
This article explains how to develop ETL (Extract, Transform, Load) jobs using AWS Glue to load data from AWS S3 into AWS RDS SQL Server database objects.
Introduction
ETL is one of the widely-used methods for data integration in any enterprise IT landscape. Data is transported from source to destination data repositories using ETL jobs. Enterprises host production workloads on AWS RDS SQL Server instances in the cloud. Data is often loaded in and out of these instances using different types of ETL tools. One of the AWS services that provides ETL functionality is AWS Glue. AWS S3 is the primary storage layer for AWS Data Lake. Often, semi-structured data in the form of CSV, JSON, AVRO, Parquet and other file formats hosted on S3 is loaded into Amazon RDS SQL Server database instances. In this article, we will explore the process of creating ETL jobs using AWS Glue to load data from Amazon S3 to an Amazon RDS SQL Server database instance.
AWS RDS SQL Server Instance
It's assumed that an operational instance of AWS RDS SQL Server is already in place. Once the instance is available, it would look as shown below. For the ETL job that we will develop in this article, we need a source and a target data repository. Amazon S3 will act as the source and the SQL Server database instance will act as the destination. Even a SQL Server Express edition hosted on a SQL Server instance will work. Do ensure that you have the required permissions to manage an AWS S3 bucket as well as the SQL Server database instance.
Setting up an AWS S3 Bucket with sample data
Navigate to the AWS S3 home page by typing S3 on the AWS Console home page and then open the selected service. From the Amazon S3 home page, click on the Create bucket button to create a new AWS S3 bucket. Provide a relevant name and create the bucket in the same region where you have hosted your AWS RDS SQL Server instance. Create a sample CSV file as shown below and add some sample data to it. In our example, we have a sample file named employees that has two fields and a few records as shown below.
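For reference, a minimal sketch of generating such a sample file with Python is shown below. The file name (employees.csv), the two column names (id, name) and the sample rows are assumptions made for illustration only; the file used in this walkthrough has ten records, so adjust the data to your needs.

```python
# A minimal sketch of creating the sample CSV locally with Python.
# File name, column names and rows below are illustrative assumptions.
import csv

rows = [(1, "John"), (2, "Jane"), (3, "Mary"), (4, "Peter")]

with open("employees.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "name"])  # header row with the two fields
    writer.writerows(rows)           # a few sample records
```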
Once the file has been created, upload it to the newly created S3 bucket by clicking on the Upload button in the AWS S3 bucket interface. Once you have uploaded the file successfully, it would look as shown below. This completes the creation of our source data setup.
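If you prefer to script the upload instead of using the console, a minimal sketch with boto3 could look like the following. The bucket name and object key below are hypothetical placeholders; use the bucket you created above.

```python
# A minimal sketch of uploading the sample file with boto3
# (bucket name and key are hypothetical placeholders).
import boto3

s3 = boto3.client("s3")  # picks up credentials from your AWS CLI profile or environment
s3.upload_file(
    Filename="employees.csv",       # local file created earlier
    Bucket="my-glue-demo-bucket",   # hypothetical bucket name
    Key="employees/employees.csv",  # this prefix becomes the crawler include path later
)
```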
AWS RDS SQL Server database objects setup
Log on to the AWS RDS SQL Server database instance using the editor of your choice. Once you have connected to the instance, create a table that matches the schema of the CSV file that we just created. The schema can be different as well; in that case, we will have to perform a transformation on the source data to load it into the target table. To keep the transformation complexity to a minimum, so that we can focus on the configuration of the ETL job, here a table is created with a schema identical to the CSV file, as shown below.
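As an illustration, such a table could be created as sketched below. The connection details, database name, credentials and the two-column schema (id, name) are assumptions made to match the sample CSV; you can equally run the CREATE TABLE statement directly from your SQL editor.

```python
# A minimal sketch of creating the target table via pyodbc
# (all connection details below are hypothetical placeholders).
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<your-rds-endpoint>,1433;"
    "DATABASE=etldemo;UID=admin;PWD=<password>"
)
cursor = conn.cursor()
cursor.execute("""
    CREATE TABLE dbo.employees (
        id   INT,
        name VARCHAR(100)
    )
""")
conn.commit()
conn.close()
```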
Crawling AWS S3 files and AWS RDS SQL Server tables
We learned how to crawl SQL Server tables using AWS Glue in my last article. In the same way, we need to catalog our employee table as well as the CSV file in the AWS S3 bucket. The only difference when crawling files hosted in Amazon S3 is that the data store type is S3 and the include path is the path to the Amazon S3 bucket which hosts all the files. After the Amazon S3 hosted file and the table hosted in SQL Server are crawled and cataloged using AWS Glue, they would look as shown below.
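For readers who prefer the API over the console, a rough sketch of defining and starting the S3 crawler with boto3 is shown below. The crawler name, IAM role, catalog database and bucket path are assumptions; the JDBC crawler for the SQL Server table is configured in the same way with a JdbcTargets entry and a Glue connection, as covered in the earlier article.

```python
# A minimal sketch of creating and running an S3 crawler with boto3
# (names, role and path are hypothetical placeholders).
import boto3

glue = boto3.client("glue")

glue.create_crawler(
    Name="s3-employees-crawler",
    Role="AWSGlueServiceRole-demo",   # role with S3 read and Glue catalog permissions
    DatabaseName="etl-demo-catalog",  # Glue catalog database that will hold the metadata tables
    Targets={"S3Targets": [{"Path": "s3://my-glue-demo-bucket/employees/"}]},
)
glue.start_crawler(Name="s3-employees-crawler")
```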
We learned how a cataloged AWS RDS SQL Server table would look in my last article, How to catalog AWS RDS SQL Server databases. For reference, a cataloged AWS S3 bucket-based table, for a file having a schema like the CSV that we created earlier, would look as shown below.
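If you want to inspect the cataloged schema programmatically rather than in the console, a small sketch with boto3 could look like this, assuming the hypothetical catalog database and table names used in this walkthrough.

```python
# A minimal sketch of reading the cataloged table definition with boto3
# (database and table names are hypothetical placeholders).
import boto3

glue = boto3.client("glue")
table = glue.get_table(DatabaseName="etl-demo-catalog", Name="employees_csv")["Table"]
for column in table["StorageDescriptor"]["Columns"]:
    print(column["Name"], column["Type"])  # column names and types inferred by the crawler
```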
Developing an AWS Glue ETL Job
Now that the AWS Glue catalog is updated with the source and destination metadata tables, we can create the ETL job. Navigate to the ETL Jobs section from the left pane, and it would look as shown below.
Click on the Add job button to start creating a new ETL job. A new wizard would start, and the first step would look as shown below. Provide a relevant name and an IAM role (with privileges to read and write on the metadata catalog as well as AWS S3) for the job. The type of job provides options to either create a Spark-based job or a Python shell job. Spark-based jobs are more feature-rich and provide more options to perform sophisticated ETL programming compared to Python shell jobs, and they also support AWS Glue features that Python-based jobs do not. After we are done specifying all the options, the output of this job would be a script generated by AWS Glue. Alternatively, you can configure the job to execute a script that you already have in place or wish to author. We are going to leave the job type and script related settings at their defaults.
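For context, this step of the wizard is roughly equivalent to the Glue create_job API call sketched below; it is included only to show where the job type and IAM role fit in. The names, script location and Glue version are assumptions, and in this walkthrough the console generates and stores the script for us.

```python
# A rough sketch of the API-level equivalent of the wizard step
# (all names and the script location are hypothetical placeholders).
import boto3

glue = boto3.client("glue")

glue.create_job(
    Name="s3-to-rds-sqlserver-job",
    Role="AWSGlueServiceRole-demo",  # needs read/write on the metadata catalog and AWS S3
    Command={
        "Name": "glueetl",           # "glueetl" = Spark job, "pythonshell" = Python shell job
        "ScriptLocation": "s3://my-glue-demo-bucket/scripts/s3-to-rds.py",
        "PythonVersion": "3",
    },
    GlueVersion="2.0",
)
```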
The other options on this wizard page will look as shown below. These options are mainly used to configure any custom scripts or libraries, security, monitoring and logging, which are not required for the purpose of this demonstration. We can proceed with the next step.
In this step, we need to select the data source, which is the table from the metadata catalog that points to the S3 bucket. Select the relevant table as shown below and click Next.
We need to select a transformation type in this step. We can even choose not to apply any transformation in the next step, but at this step we need to select either changing the schema or finding matching records (for deduplication) as the transform type. Select Change schema and click Next.
Now we need to select the destination metadata table that points to our AWS RDS SQL Server table. Select the relevant table as shown below and click Next.
In this step, we can make changes to the mapping and schema if required. As we do not need to make any changes to the mapping, we can click on the Save job and edit script button.
This would take us to the Python script generated by this job for the ETL as per the specifications that we provided; a trimmed sketch of such a script is shown below. Now that our job is ready, we can click on the Run job button to execute the ETL job.
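For reference, the following is a trimmed sketch of what such a generated PySpark script typically looks like. The catalog database and table names are assumptions based on this walkthrough; the script AWS Glue generates for you will use the names from your own catalog and may differ in detail.

```python
# A trimmed sketch of a typical Glue-generated ETL script
# (catalog database and table names are hypothetical placeholders).
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext())
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read the cataloged CSV data from the S3-backed metadata table
source = glueContext.create_dynamic_frame.from_catalog(
    database="etl-demo-catalog",
    table_name="employees_csv",
    transformation_ctx="source",
)

# "Change schema" transform: map source columns to the target columns and types
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[("id", "long", "id", "int"), ("name", "string", "name", "string")],
    transformation_ctx="mapped",
)

# Write the mapped records into the cataloged AWS RDS SQL Server table
glueContext.write_dynamic_frame.from_catalog(
    frame=mapped,
    database="etl-demo-catalog",
    table_name="etldemo_dbo_employees",
    transformation_ctx="sink",
)

job.commit()
```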
A prompt is shown before executing the job, where we can modify runtime parameters if required. We do not need to change any parameters in our case, so click on the Run job button. This would start the execution of our ETL job.
Once the job execution starts, you can select the job and it would show the status of the job in the bottom pane as shown below. It can take a few minutes for the job to start, as it warms up the Spark environment in the background. Once the job execution is complete, the details of the job execution are shown in the bottom pane as shown below.
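Clicking the Run job button is equivalent to starting a job run through the API; a minimal sketch of triggering the job and polling its status with boto3 is shown below, assuming the hypothetical job name used earlier.

```python
# A minimal sketch of starting the job and polling its state with boto3.
import time
import boto3

glue = boto3.client("glue")

run_id = glue.start_job_run(JobName="s3-to-rds-sqlserver-job")["JobRunId"]

while True:
    state = glue.get_job_run(JobName="s3-to-rds-sqlserver-job", RunId=run_id)["JobRun"]["JobRunState"]
    print(state)  # typically STARTING -> RUNNING -> SUCCEEDED (or FAILED)
    if state in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"):
        break
    time.sleep(30)  # the Spark environment can take a few minutes to warm up
```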
Now that the job has completed execution, if it worked as expected, all ten records from the CSV file hosted on AWS S3 should have been loaded into the AWS RDS SQL Server table that we created earlier. Navigate to a query editor and query the SQL Server table. You should be able to see all those records in the table as shown below.
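A minimal sketch of this verification with pyodbc is shown below, assuming the same hypothetical connection details and table name used earlier; running the SELECT directly in your SQL editor works just as well.

```python
# A minimal sketch of verifying the loaded records with pyodbc
# (connection details are hypothetical placeholders).
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<your-rds-endpoint>,1433;"
    "DATABASE=etldemo;UID=admin;PWD=<password>"
)
for row in conn.cursor().execute("SELECT id, name FROM dbo.employees"):
    print(row.id, row.name)  # should list the records loaded from the CSV
conn.close()
```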
In this way, we can use AWS Glue ETL jobs to load data into Amazon RDS SQL Server database tables.
Conclusion
In this article, we learned how to use AWS Glue ETL jobs to extract data from file-based data sources hosted in AWS S3, and to transform and load that data into an AWS RDS SQL Server database. We also learned the details of configuring the ETL job as well as its pre-requisites, like the metadata tables in the AWS Glue metadata catalog.
Table of contents
Getting started with AWS Redshift |
Access AWS Redshift from a locally installed IDE |
How to connect AWS RDS SQL Server with AWS Glue |
How to catalog AWS RDS SQL Server databases |
Backing up AWS RDS SQL Server databases with AWS Backup |
Load data from AWS S3 to AWS RDS SQL Server databases using AWS Glue |
Load data into AWS Redshift from AWS S3 |
Managing snapshots in AWS Redshift clusters |
Share AWS Redshift data across accounts |
Export data from AWS Redshift to AWS S3 |
Restore tables in AWS Redshift clusters |
Getting started with AWS RDS Aurora DB Clusters |
Saving AWS Redshift costs with scheduled pause and resume actions |
How to create an AWS SageMaker Instance |
Import data into Azure SQL database from AWS Redshift |