CLI tool to Query the AWS Aurora Serverless Data API
The AWS Aurora Serverless Data API allows you to query an Aurora Serverless database via an HTTP interface, and specifically, to do so outside the VPC of the database. This is great for Lambda functions and also avoids DB connection management. Additionally, it makes for an easy way to do simple queries of your DB from the command line or similar.
I recently converted a Lambda function (written in Go) from making direct Aurora Postgres SQL calls, to using the AWS Data API. The idea is that this avoids database connection management, and allows moving your Lambda function outside the VPC. The advantage of that is now you don’t need a NAT gateway in the VPC to allow the lambda to make outside network calls (e.g. this lambda also calls out to a third party HTTP API). NAT gateways have an approximately $30/month base cost (plus the cost of data transfer). Furthermore, this reduces the complexity of your configuration (assuming you don’t need the NAT gateway, etc. for other things). For example, this avoids around 140 lines of CloudFormation in my serverless.yml to set up the VPC subnets and NAT gateway to support a Lambda in a VPC.
The Data API itself is an HTTP interface to your database, and returns JSON that contains the result from your SQL call, as well as a various metadata (which you can exclude), such as column names, and data types, etc. Using the AWS SDK’s for any of their supported languages, it is easy to use via the execute_statement() function among others. Transactions are possible as well.
CLI Tool (quick & dirty Python script :)
I found that I wanted a way to do ad-hoc queries to my databases, and the Data API actually made this quite easy compared to some of the security setup, or use of an EC2 box, or tunnel, etc. Below is a simple Python script that performs these queries from the command line. It reads a YAML config file that defines each of the database(s) you have, AWS profile to use, and the required auth info. This allows you to then do something like:
querydataapi.py mydevdb “SELECT * FROM sometable LIMIT 5”
All the script does is make the SQL call, parse the metadata for the column names, and then shove the results into a Pandas data frame to make the output look nice. This is obviously not fancy, but for quick little ad-hoc queries, it has served my needs:
Finally, two notes.
- if you don’t use profiles for AWS, you could remove the line that creates the Session (then it will read your default profile/AWS key from the ~/.aws/credentials file). Or, you could remove that, and then prefix the command with AWS_PROFILE=someprofile to specify that on the command line, etc.
- If the results of your SQL call have a lot of columns (i.e. wider than your terminal window), then the data will be elided. You could print the dataframe other ways (see the tabulate package for example), or use the to_csv method on the dataframe to convert it to CSV, etc.