federated queries from postgres to redshift without enhanced vpc routing
RedShift boasts the abililty to create an external schema that serves as a connection to RDS instances. I'm briefly going to cover resources that exist for creating Postgres -> RedShift federated queries without using enhanced VPC routing.
Resources
- AWS Documentation on Federated Queries
- Creating external schemas in RedShift
- AWS Federated Query examples
- Blog post with similar architecture discussion
- AWS Documentation on VPC Security Groups
- AWS Documentation on Enhanced VPC Routing
- AWS Documentation on Secrets Manager
- AWS Documentation on managing RedShift security groups
General overview
Our architecture will require either:
- RDS and RedShift to exist within the same VPC
- There to be a NAT that exists between the separate VPCs
We'll be discussing same VPC instances. Our architecture looks like the following:
Implementation of the federated query
The implementation of this service is broken out below.
Prerequisites:
- An existing user in Postgres with access to what you want in RedShift
- An IAM role that RedShift can use to access the instance
- A secret containing the Postgres user/password in the AWS Secrets Manager
Implementation
- Adding our external schema to the RedShift cluster will provide us access to the instance.
This query is directly pulled from the AWS Docs
-- Runs in RedShift
CREATE EXTERNAL SCHEMA my_redshift_schema_name
FROM POSTGRES
DATABASE 'my_postgres_database_name' SCHEMA 'my_postgres_schema'
URI 'endpoint to aurora postgres hostname'
IAM_ROLE 'arn:aws:iam::123456789012:role/my/redshift/user/role/'
SECRET_ARN 'arn:aws:secretsmanager:us-west-2:123456789012:secret:federation/test/dataplane-apg-creds-YbVKQw';
- Verify that the schema now exists in RedShift
- Likely we'll run into an issue with there being timeouts if we attempt to connect to a table like below
Check the table returns values from Postgres:
select *
from my_redshift_schema_name.some_table
Producing something like:
SQL Error: Invalid Operation: Failed to incorporate external table
into local catalog. Error= Unable to connect to endpoint with address xxxxxxx"
- Add RedShift to the security group of our instance, this will now change our architecture to the following
- The query we ran to check the RedShift returns from Postgres should now work
Check the table returns values from Postgres:
select *
from my_redshift_schema_name.some_table