In “How SQL can unify access to APIs” I made the case for SQL as a common environment in which to reason about data flowing from many different APIs. The key enabler of that scenario is Steampipe, a Postgres-based tool with a growing suite of API plugins that map APIs to foreign tables in Postgres.
The Steampipe ecosystem then expanded with plugins for many other services including GitHub, Google Workspace, IMAP, Jira, LDAP, Shodan, Slack, Stripe, and Zendesk. Joining across these APIs is a superpower best proven by this example that joins Amazon EC2 endpoints with Shodan vulnerabilities in just 10 lines of very basic SQL.
select a.instance_id, s.ports s.vulns from aws_ec2_instance a left join shodan_host s on a.public_ip_address = s.ip where a.public_ip_address is not null; +---------------------+----------+--------------------+ | instance_id | ports | vulns | +---------------------+----------+--------------------+ | i-0dc60dd191cb84239 | null | null | | i-042a51a815773780d | [80,22] | null | | i-00cf426db9b8a58b6 |  | null | | i-0e97f373db42dfa3f | [22,111] | ["CVE-2018-15919"] | +---------------------+----------+--------------------+
Files are APIs too
But what is an API, really? Must it always entail HTTP requests to service endpoints? More broadly APIs are data sources that come in other flavors too. Web pages are often, still, de facto APIs. I’ve done more web scraping than I care to think about over the years and the skill remains useful.
Files are also data sources: configuration files (INI, YAML, JSON), infrastructure-as-code files (Terraform, CloudFormation), data files (CSV). When plugins for these sources began to join the mix, Steampipe became even more powerful.
First came the CSV plugin, which unlocked all sorts of useful queries. Consider, for example, how we often pretend spreadsheets are databases. In doing so we can assume there’s referential integrity when really there isn’t. If you export spreadsheet data to CSV, you can use SQL to find those flawed assumptions. And that’s just one of the endless ways I can imagine using SQL to query the world’s leading file format for data exchange.
Then came the Terraform plugin, which queries Terraform files to ask and answer questions like: “Which trails are not encrypted?”
select name, path from terraform_resource where type="aws_cloudtrail" and arguments -> 'kms_key_id' is null;
Using the AWS plugin’s aws_cloudtrail_trail table, we can ask and answer the same question for deployed infrastructure, and return a result set that you could UNION with the first one.
select name, arn as path from aws_cloudtrail_trail where kms_key_id is null;
Ideally the answers will always be the same. What you said should be deployed, using Terraform, should match what’s actually deployed if you query AWS APIs. In the real world, of course, maintenance and/or incident response can result in configuration drift. Given a common way to reason over defined and deployed infrastructure, we can manage such drift programmatically.
Belt and suspenders
For deployed infrastucture, Steampipe has long provided a suite of mods that layer security and compliance checks onto API-derived foreign tables. The AWS Compliance mod, for example, provides benchmarks and controls to check deployed infrastructure against eleven standards and frameworks including CIS, GDPR, HIPAA, NIST 800-53, and SOC 2.
With the advent of the Terraform plugin it became possible to create complementary mods, like Terraform AWS Compliance, that provide the same kinds of checks for defined infrastructure.
Does what you defined last month match what you deployed yesterday? A satisfactory answer requires the ability to reason over defined and deployed infrastructure in a common and frictionless way. SQL can’t remove all the friction but it’s a powerful solvent.
Copyright © 2022 IDG Communications, Inc.