AWS Data Pipeline is used to move data between different storage services. In this article, we discuss how to deploy data pipelines in different regions and how the settings differ from region to region.

The use case for the data pipeline in this article is to retrieve data from a SQL server and store it in AWS DynamoDB tables. There is no direct path for this, so it is done in two steps:

1. SQL to S3 bucket: CopyActivity
2. S3 bucket to DynamoDB: HiveActivity

First, the data is exported from the SQL server to an S3 bucket in CSV format, using a CopyActivity. Second, the data is loaded from the S3 bucket into our DynamoDB tables, using a HiveActivity. Each of the two activities needs a resource to run on. For the HiveActivity, the resource must be configured for the target region of the DynamoDB tables.

[Figure: overall architecture]

Let's see how the cross-region setup works in each scenario below.

Scenario 1: Deploy S3, DynamoDB and the pipeline in the same region (Ireland)

After deployment, the pipeline worked well with no region specified for the DynamoDB tables or the EMR cluster. Since the data pipeline runs in the same region as S3 and DynamoDB, the default values apply and the pipeline works smoothly.

Scenario 2: Deploy S3 and DynamoDB in N. Virginia and the data pipeline in Ireland

When data is transferred across regions, S3 is not an issue even if no region is specified for the EC2 instance that is created, because S3 bucket names are globally unique. For DynamoDB, however, the region needs to be specified for the tables, and the EMR cluster region needs to be specified as well.

For the data pipeline's EMR resource, m1.medium is defined as the core instance type and the EMR release label is 4.4.0, with the region set to the region of the DynamoDB tables (N. Virginia).
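The EMR and DynamoDB settings for this scenario can be sketched as pipeline definition objects. This is a minimal illustration rather than the pipeline used in the article: the helper name, the object ids and the table name are hypothetical, the field names (`region`, `releaseLabel`, `coreInstanceType`, `tableName`) follow the Data Pipeline definition syntax as I understand it, and writing the release label as `emr-4.4.0` is an assumption, so check them against the AWS reference before use.

```python
import json


def build_cross_region_objects(target_region, table_name):
    """Return sketch definitions for the EMR resource and the DynamoDB node.

    Both objects carry the same explicit region, so the HiveActivity writes
    to the tables in the target region instead of the pipeline's own region.
    """
    emr_cluster = {
        "id": "HiveEmrCluster",           # hypothetical id
        "type": "EmrCluster",
        "region": target_region,          # region of the DynamoDB tables
        "releaseLabel": "emr-4.4.0",      # release label 4.4.0 from the article
        "coreInstanceType": "m1.medium",  # core instance type from the article
    }
    dynamodb_node = {
        "id": "TargetDynamoDBTable",      # hypothetical id
        "type": "DynamoDBDataNode",
        "region": target_region,
        "tableName": table_name,          # hypothetical table name
    }
    return [emr_cluster, dynamodb_node]


if __name__ == "__main__":
    # N. Virginia is us-east-1; the pipeline itself runs in Ireland (eu-west-1).
    objects = build_cross_region_objects("us-east-1", "example-table")
    print(json.dumps({"objects": objects}, indent=2))
```

Keeping the region in a single parameter makes it harder to set it on the DynamoDB node and forget it on the EMR cluster.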
If the region is not specified for the EMR cluster, the data pipeline will run successfully, but no data is sent to DynamoDB in the N. Virginia region.

Scenario 3: Deploy S3 and DynamoDB in Frankfurt and the data pipeline in Ireland

After deploying S3 and DynamoDB to the Frankfurt region, a few things need to be considered. Because Frankfurt is a newer region (launched in 2014), it does not support some of the older configurations. The following settings need attention if you are moving to a region that supports only newer technologies:

1. Instance type for EMR
2. AMI version
3. Reading logs and data from S3

As for the instance type, previous generation instances are not supported in the region, so the versions needed to be updated. From the EMR-supported instance types in the region, m4.large is selected in this article. The AMI version needs to be 5.13.0 or later.

Reading the logs from the data pipeline does not work because of an AWS Signature Version 4 error (AWS4-HMAC-SHA256). To avoid the issue, create another S3 bucket in a different region, such as Ireland, and direct the logs there. Furthermore, if the region is not defined for the EMR cluster, the HiveActivity fails with the AWS4-HMAC-SHA256 error.

If you run into any errors, please drop a comment, and thank you for reading.
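As a rough aid, the Scenario 3 points can be turned into a small pre-deployment check. This is only a sketch under stated assumptions: the function names are hypothetical, the list of previous generation families is illustrative and not exhaustive, and the release label is assumed to be written as `emr-5.13.0`.

```python
# Illustrative, non-exhaustive list of previous generation instance families.
PREVIOUS_GEN_FAMILIES = ("m1", "m2", "c1")
MIN_RELEASE = (5, 13, 0)        # "5.13.0 or later", as stated in the article
TARGET_REGION = "eu-central-1"  # Frankfurt


def parse_release(label):
    """Turn 'emr-5.13.0' (or '5.13.0') into a comparable tuple like (5, 13, 0)."""
    version = label[4:] if label.startswith("emr-") else label
    return tuple(int(part) for part in version.split("."))


def check_frankfurt_settings(cluster, log_bucket_region):
    """Return a list of problems found in an EMR resource definition (a dict)."""
    problems = []
    family = cluster.get("coreInstanceType", "").split(".")[0]
    if family in PREVIOUS_GEN_FAMILIES:
        problems.append("previous generation instance type: use e.g. m4.large")
    if parse_release(cluster.get("releaseLabel", "emr-0.0.0")) < MIN_RELEASE:
        problems.append("release label older than 5.13.0")
    if "region" not in cluster:
        problems.append("EMR region not set: the HiveActivity is expected to fail")
    if log_bucket_region == TARGET_REGION:
        problems.append("log bucket is in Frankfurt: use a bucket in e.g. Ireland")
    return problems


if __name__ == "__main__":
    cluster = {
        "type": "EmrCluster",
        "region": TARGET_REGION,
        "releaseLabel": "emr-5.13.0",
        "coreInstanceType": "m4.large",
    }
    # Logs go to a bucket in Ireland (eu-west-1), as the article suggests.
    print(check_frankfurt_settings(cluster, "eu-west-1"))
```

A definition carried over unchanged from Scenario 2 (m1.medium, release 4.4.0, no region) with a Frankfurt log bucket would trip all four checks.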

How AWS data pipeline configurations are different from each region
