Netography AWS Onboarding Guide for Cloud Automation Engineers
Introduction
If you have not yet reviewed the options for onboarding AWS VPC flow logs to Fusion, see: Automating AWS Cloud Onboarding. If you have done so and determined that you will be integrating the Netography Fusion AWS configuration requirements and VPC flow log onboarding process into your existing automation, this page provides the technical details and working examples a cloud automation engineer needs to complete that integration.
This guide covers VPC flow logs configured to write to S3. Netography also supports using Kinesis to deliver flow logs to Fusion. Kinesis provides a lower-latency delivery path, but at a significantly higher cost from AWS. As a result, most Netography customers use the S3 method, which is what is documented here.
To skip right to the CloudFormation deployment: https://docs.netography.com/docs/netography-aws-cloudformation-automation
AWS VPC Flow Log Configuration Steps (via S3)
1. Create an S3 Bucket to Write VPC Flow Logs to
S3 Bucket Region Recommendations
We recommend creating an S3 bucket for each region you have VPCs in, and directing the flow logs for each VPC to an S3 bucket in the same region. This minimizes write latency and cost.
If you want to use a single S3 bucket instead for simplicity, we recommend creating the S3 bucket in us-east-1. This minimizes read latency and cost, at the expense of write latency.
The configuration to avoid is having a single S3 bucket in a region that is NOT us-east-1. This will double your cross-region data transfer cost. It is a supported configuration, just not optimized for cost.
Why regional alignment of S3 bucket(s) matters
Although you can write VPC flow logs to a bucket in any region, co-locating the bucket with either the VPC or Netography’s ingest endpoint (us-east-1) minimizes both latency and data-transfer fees:
- Latency
  - Same-region writes (bucket co-located with the VPC) or same-region reads (bucket in us-east-1) avoid extra network hops.
- Cost
  - Intra-region data transfers are free.
  - Cross-region writes (e.g. eu-west-1 → us-east-1) incur standard AWS cross-region data transfer fees.
  - Cross-region reads (S3 → us-east-1) likewise incur cross-region data transfer fees unless the bucket lives in us-east-1, where Netography ingest reads from.
Deployment patterns
| Bucket Location | Cross-Region Write | Cross-Region Read | Notes |
| --- | --- | --- | --- |
| Same as VPC | No | Yes, unless VPC is in us-east-1 | Best for low-latency writes; one cross-region hop. |
| us-east-1 (Fusion ingest) | Yes, unless VPC is in us-east-1 | No | One cross-region hop on write; zero on read. |
| Neither same as VPC nor us-east-1 | Yes | Yes | Two cross-region hops (write and read), doubling the cross-region data transfer fees. |
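If you are scripting bucket creation, a minimal boto3 sketch might look like the following. The bucket name and region are placeholder values, and the bucket policy required for VPC Flow Logs delivery is not shown here (see the linked AWS setup documentation in the next step).

import boto3

# Placeholder values - use your own bucket name and the region your VPCs are in.
REGION = "eu-west-1"
BUCKET = "acme-myflowlogs-bucket-eu-west-1"

s3 = boto3.client("s3", region_name=REGION)

if REGION == "us-east-1":
    # us-east-1 does not accept a LocationConstraint
    s3.create_bucket(Bucket=BUCKET)
else:
    s3.create_bucket(
        Bucket=BUCKET,
        CreateBucketConfiguration={"LocationConstraint": REGION},
    )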
2. IAM Policy and Custom Role for Netography to read flow logs from your S3 bucket
In order for Netography to automatically ingest flow logs from AWS, it needs to have permissions to fetch objects from S3.
Option 1: CloudFormation Stack
To make this process easier, we've provided the netography-base.yaml CloudFormation template, which can be found along with detailed instructions here: https://docs.netography.com/docs/netography-aws-cloudformation-automation#1-iam-policy-and-custom-role-for-netography. To deploy only the IAM roles, follow step 1, deploying only the roles with no Lambdas.
Option 2: Manual Deployment
If you would rather navigate through this process manually, see the documentation below.
AWS VPC via S3 Setup (AWS Console method)
AWS VPC via S3 Setup (CloudFormation method)
Additionally, if you would like to use an SNS topic or SQS queue, these steps from the AWS Quickstart Guide provide additional step-by-step instructions:
Note: These are optional. You can omit the optional SQS/SNS-related permissions. Using SQS/SNS gives Netography a trigger-based mechanism to read new flow log files as they are written to S3. However, the latency difference may not be critical to your use case, so if you are building your own automation you can simplify things by not using the optional SQS/SNS triggers. The examples shown below also omit them for simplicity.
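If you are building the manual path into your own scripts, the sketch below shows the general shape of what is being created: an IAM role that Netography can assume, with a trust policy naming the Netography principal and an inline policy granting read access to the flow log bucket. The principal, bucket, and names here are placeholders; the linked documentation and the netography-base.yaml template remain the source of truth for the exact principal, conditions, and permissions.

import json
import boto3

iam = boto3.client("iam")

# Placeholder values - take the real Netography principal from the linked docs.
NETOGRAPHY_PRINCIPAL = "arn:aws:iam::NETOGRAPHY_ACCOUNT_ID:root"
BUCKET = "acme-myflowlogs-bucket"

# Trust policy: who is allowed to assume this role.
# Add any external ID or other conditions the linked documentation specifies.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": NETOGRAPHY_PRINCIPAL},
        "Action": "sts:AssumeRole",
    }],
}

# Read-only access to the flow log bucket (the optional SQS/SNS permissions are omitted).
read_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
    }],
}

iam.create_role(
    RoleName="NetoFlowLogReader",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
iam.put_role_policy(
    RoleName="NetoFlowLogReader",
    PolicyName="NetoFlowLogReaderS3Read",
    PolicyDocument=json.dumps(read_policy),
)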
3. Enabling VPC Flow Logs and Onboarding to Fusion
There are 2 steps for ingesting VPC flow logs to Fusion that must be completed for each VPC:
- Enable VPC flow logs in AWS for the VPC, specifying a log destination in S3 to write the flow logs to. Refer to the next section for details on supported S3 log destinations.
- Ensure Fusion has a traffic source configured to read the flow logs:
- If each VPC flow log is configured with a unique S3 log destination (e.g. a folder or prefix is added to the S3 bucket path to differentiate it from other VPC flow logs in the same S3 bucket at the top-level directory), create an AWS VPC S3 traffic source in Fusion for each VPC.
- If the same S3 log destination is used for all the VPC flow log configurations, create an AWS VPC S3 traffic source in Fusion for each unique account and region that is writing flow logs to that destination.
AWS Flow Log Configuration Requirements
To see the exact configuration required for a single AWS VPC flow log configuration, refer to one of these documentation links:
- Quickstart: AWS
- AWS VPC via S3 Setup (AWS Console method)
- AWS VPC via S3 Setup (CloudFormation method)
S3 Log Destination
Using a single S3 bucket for multiple VPC flow log configurations across multiple VPCs and accounts IS SUPPORTED by Fusion. When using a single bucket, there is a design choice to make between specifying a unique top-level folder in the S3 bucket for each flow log configuration (also called a prefix, as it is added before the AWSLogs path that AWS writes), or using the exact same log destination for all VPCs.
Explaining the relationship between Fusion traffic sources and AWS VPC flow log configurations
The Fusion traffic source configuration contains the S3 bucket, folder/prefix name, account ID, and region. From these fields, Fusion constructs the exact S3 path where flow logs are being written for that traffic source.
When using the same log destination across all flow log configurations, that path will contain all the flow logs for the account and region in the traffic source configuration. Therefore, you must create 1 traffic source per account and region that is writing flow logs to the S3 bucket.
When using a unique log destination for each flow log configuration (whether that's a unique S3 bucket or a single S3 bucket with a unique folder/prefix), that path will contain the flow logs for ONLY that one flow log configuration. Therefore, you must create 1 traffic source per flow log configuration.
S3 Log Destination Option 1: Differentiate flow logs by folder/prefix in the same S3 bucket
You can differentiate each flow log configuration by adding a unique folder name after the S3 bucket ARN in the flow log configuration. If you do so, then you add a Fusion traffic source to correspond to each VPC (technically each VPC flow log configuration), each in its own folder.
This approach has the benefit of allowing you to uniquely configure settings on a per VPC basis within both AWS and Fusion, and maintain a 1:1 mapping between VPCs with flow logs and Fusion traffic sources.
Example: Assume you have vpc-1 and vpc-2 and are configuring flow logs for both VPCs to write to S3 bucket arn:aws:s3:::acme-myflowlogs-bucket
In the AWS flow log configuration for vpc-1, set the S3 ARN to: arn:aws:s3:::acme-myflowlogs-bucket/flowlogs-for-vpc-1
In the AWS flow log configuration for vpc-2, set the S3 ARN to: arn:aws:s3:::acme-myflowlogs-bucket/flowlogs-for-vpc-2
AWS will then write flow logs for vpc-1 to acme-myflowlogs-bucket/flowlogs-for-vpc-1/AWSLogs/123456789012/vpcflowlogs/us-east-1/ and for vpc-2 to acme-myflowlogs-bucket/flowlogs-for-vpc-2/AWSLogs/123456789012/vpcflowlogs/us-east-1/
You will configure 2 Fusion traffic sources, one for vpc-1 and one for vpc-2.
This differentiates the flow logs by path BEFORE the AWSLogs directory that AWS writes automatically. The directory structure under AWSLogs changes depending on the hive prefix and hourly partitioning settings.
You do not need to use the VPC ID or name for the folder name as is done in this example - it can be any string that is unique across all the VPC flow logs pointing to that bucket.
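If you are enabling flow logs programmatically rather than in the console, a minimal boto3 sketch of this per-VPC-prefix pattern might look like the following. The VPC IDs, bucket, and prefixes are the example values from above; log format and destination options are left at their defaults here and must still match the linked configuration requirements.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Write each VPC's flow logs under its own prefix in the shared bucket,
# giving Fusion one traffic source per VPC.
for vpc_id, prefix in [("vpc-1", "flowlogs-for-vpc-1"), ("vpc-2", "flowlogs-for-vpc-2")]:
    ec2.create_flow_logs(
        ResourceType="VPC",
        ResourceIds=[vpc_id],
        TrafficType="ALL",
        LogDestinationType="s3",
        LogDestination=f"arn:aws:s3:::acme-myflowlogs-bucket/{prefix}",
    )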
S3 Log Destination Option 2: Use the same S3 log destination across accounts and VPCs
You can use the same S3 log destination across accounts and VPCs. If you do so, then you add a Fusion traffic source to correspond to each unique account and region combination writing flow logs to this log destination.
This has the benefit of being able to maintain a consistent S3 log destination across an AWS Organization.
Example: Assume you have vpc-1 and vpc-2 and are configuring flow logs for both VPCs to write to S3 bucket arn:aws:s3:::acme-myflowlogs-bucket
In the AWS flow log configuration for vpc-1 AND vpc-2, set the S3 ARN to: arn:aws:s3:::acme-myflowlogs-bucket
AWS will then write flow logs for vpc-1 to acme-myflowlogs-bucket/AWSLogs/123456789012/vpcflowlogs/us-east-1/ and for vpc-2 to acme-myflowlogs-bucket/AWSLogs/123456789012/vpcflowlogs/us-east-1/ (assuming the VPCs are in the same account and region; if not, the account ID and region in the path will differ).
You will configure a Fusion traffic source for account 123456789012 and region us-east-1.
This differentiates the flow logs in different accounts and regions by the path AFTER the AWSLogs directory that AWS writes automatically. The directory structure under AWSLogs changes depending on the hive prefix and hourly partitioning settings.
Limitations when using the same S3 log destination for multiple VPCs
Consistent VPC flow log configuration settings
If you use the same S3 log destination, then you must ensure that all VPC flow logs configured to write to that log destination use the same setting for each of these fields (you can choose any setting you want, it just must be the same across all of them; see the sketch after this list):
- Log file format
- Hive-compatible S3 prefix setting
- Per-hour partition setting
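As a sketch of keeping these settings consistent (assuming boto3; the values shown are just one valid combination, not a requirement), define the shared options once and reuse them for every flow log that targets the shared destination:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Every flow log writing to the shared destination must use the same values here.
SHARED_DESTINATION_OPTIONS = {
    "FileFormat": "plain-text",          # log file format
    "HiveCompatiblePartitions": False,   # hive-compatible S3 prefix setting
    "PerHourPartition": False,           # per-hour partition setting
}

for vpc_id in ["vpc-1", "vpc-2"]:
    ec2.create_flow_logs(
        ResourceType="VPC",
        ResourceIds=[vpc_id],
        TrafficType="ALL",
        LogDestinationType="s3",
        LogDestination="arn:aws:s3:::acme-myflowlogs-bucket",
        DestinationOptions=SHARED_DESTINATION_OPTIONS,
    )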
Fusion configuration is per traffic source (per unique account/region), not per VPC
Fusion configuration settings, such as the sample rate and tags, operate at the traffic source level. When you use this approach, you will have 1 traffic source per unique account and region combination, so configuration operates at that level. If you need more granular control, such as setting a unique sample rate for an individual VPC, you need to use option 1 above (adding a prefix/folder and creating a traffic source per VPC) instead.
Scalability limitations for high volumes
If you will be delivering over 10,000 flow records per second to Netography across VPCs in a single account and region (pre-sampling), please reach out to Netography Support to discuss and agree on the right design for your environment, as there may be scalability reasons to distribute the ingest across traffic sources per VPC rather than per account/region.
4. AWS Context Integrations
In addition to ingesting VPC flow logs, Netography Fusion provides context enrichment for AWS through the use of Context Integrations.
Refer to AWS Context Integration for the additional permission requirements and options for context enrichment.
5. AWS Route 53 DNS Ingest
In addition to ingesting VPC flow logs, Netography Fusion also ingests Route 53 DNS resolver logs. Configuration and ingest for these logs follows a similar pattern to VPC flow logs. See AWS Route 53 DNS Logs via S3 Setup (Console) for more details on this support.
Automating Fusion Traffic Source Creation
The single-VPC steps linked above show how to create a new traffic source for a VPC in the Fusion Portal. However, if you are ingesting numerous VPCs, or flow log configuration occurs as part of an automated workflow, you can also automate this step.
Traffic Source and Flow Source mean the same thing in Fusion
When Fusion only ingested flow logs, the term flow source was used, but since DNS was added to the product, not all sources are for flow, so the more generic term traffic source is now used. Some code and documentation may still refer to a flow source. They are the same thing and use the same API endpoints.
Automation for creating a single Fusion traffic source
Option 1: Use the Netography Fusion REST API
The API endpoints for creating, updating, and deleting a traffic source are documented here:
To create a new traffic source, construct a POST request with type aws.
In the Fusion API, the S3 ARN you specified earlier in the AWS flow log configuration is broken out into 2 separate fields: bucket, the name of the S3 bucket itself, and prefix, the folder name you added to the end of the S3 bucket ARN to separate the directories for this flow log configuration from others.
In the previous example, we configured the S3 ARN for vpc-1 to be arn:aws:s3:::acme-myflowlogs-bucket/flowlogs-for-vpc-1. In the API call, set: "bucket": "acme-myflowlogs-bucket" and "prefix": "flowlogs-for-vpc-1"
Authenticating to the API
Before you call the vpc API endpoint to create a traffic source, you must authenticate to the API, which will return a bearer token that you include in the authorization header in subsequent calls. For details, see: Authentication via API Key. You can use the shell script provided in this recipe to perform the authentication and get the bearer token to use:
🔑 curl: Authenticate to API using NETOSECRET
curl --request POST \
--url https://api.netography.com/api/v1/vpc \
--header 'accept: application/json' \
--header 'content-type: application/json' \
--header 'authorization: Bearer INSERT_JWT_BEARER_TOKEN' \
--data '
{
"flowtype": "aws",
"flowresource": "s3",
"enabled": true,
"awsauthtype": "RoleARN",
"role": {
"arn": "ROLE_ARN"
},
"name": "FLOW_SOURCE_NAME",
"traffictype": "flow",
"bucket": "S3_BUCKET_NAME",
"bucketregion": "S3_BUCKET_REGION",
"prefix": "FOLDER_NAME_IF_APPLICABLE",
"region": "VPC_REGION",
"accountid": "VPC_ACCOUNT_ID",
"tags": [
"VPC_ID"
]
}'
Here is the curl command with example values for the fields:
curl --location 'https://api.netography.com/api/v1/vpc' \
--header 'accept: application/json' \
--header 'authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCIsImtpZCI6IlE1V1Z6Uk1CM1JxR0sxOEJDU2k3Yk1Gb0pNcUlHZTZCQlU0In0.eyJleHAiOjE3NTMwMjUyMDAsImlhdCI6MTc1MjkxODgwMCwianRpIjoiZTRlZTVjMzMtOTUzNS00YTk0LWE1ZDYtMmMwYjczNmI1MzFkIiwiaXNzIjoiaHR0cHM6Ly9hdXRoLmZha2UtcHJvZy5jb20vYXV0aC9yZWFsbXMiLCJhdWQiOiJhcGktY2xpZW50Iiwic3ViIjoiYWRtaW4tNzk4N2U2NzAtNzM1Ni00ODFhLTk5NzUtMTYyOTg5OGVkODJhIiwidHlwIjoiQmVhcmVy' \
--header 'content-type: application/json' \
--data '
{
"flowtype": "aws",
"flowresource": "s3",
"enabled": true,
"awsauthtype": "RoleARN",
"role": {
"arn": "arn:aws:iam::1234567890123:role/NetoFlowLogReader"
},
"name": "vpc-01",
"samplerate": 1,
"bucket": "neto-myflowlogs-bucket-example",
"bucketregion": "us-east-1",
"prefix": "myflowlogs-for-vpc-01",
"region": "us-east-1",
"accountid": "1234567890123",
"tags": [
"vpc-01"
]
}'
Option 2: Use the neto CLI tool
neto is a Python-based CLI tool that serves as a front-end to the REST API. It can also be useful as a reference if you are developing your own Python code, as it has working code for interacting with these API endpoints encapsulated in a Python class. This tool is in development and is currently available from TestPyPI as a Python pip package.
neto documentation and links to install the Python package are available at: https://test.pypi.org/project/neto/
Here are examples of how you can use neto from a shell:
> neto aws traffic create -h
usage: neto aws traffic create [-h] [--prefix PREFIX] [--name-prefix NAME_PREFIX] --vpcid VPCID --region REGION --accountid ACCOUNTID --rolearn ROLEARN --logbucket LOGBUCKET
options:
-h, --help show this help message and exit
--prefix PREFIX Folder prefix for the S3 logs. This is the path in the bucket where logs will be stored.
--name-prefix NAME_PREFIX
Prefix for the traffic source name
--vpcid VPCID VPC ID
--region REGION AWS region
--accountid ACCOUNTID
AWS account ID
--rolearn ROLEARN AWS role ARN
--logbucket LOGBUCKET
Log bucket
> neto aws traffic create --prefix /vpc-1234/ --name-prefix neto --vpcid vpc-1234 --region us-east-1 --accountid 123456789 --rolearn arn:aws:iam::123456789:role/NetoFlowLogReader --logbucket neto-one-for-all
INFO: neto v1.1.10 - Netography Fusion CLI tool
INFO: Using profile DEFAULT
INFO: Authenticating to Netography Fusion API for qa-customer at https://api.netography.com/api/v1
INFO: Successfully authenticated to Netography Fusion API for qa-customer at https://api.netography.com/api/v1
INFO: ++++ Successfully added flow source neto-123456789-vpc-1234 to Netography Fusion
> neto aws traffic delete -h
usage: neto aws traffic delete [-h] [--name-prefix NAME_PREFIX] --vpcid VPCID --accountid ACCOUNTID --region REGION
options:
-h, --help show this help message and exit
--name-prefix NAME_PREFIX
Traffic source name prefix
--vpcid VPCID VPC ID
--accountid ACCOUNTID
AWS account ID
--region REGION AWS region
> neto aws traffic delete --name-prefix neto --vpcid vpc-1234 --accountid 123456789 --region us-east-1
INFO: neto v1.1.10 - Netography Fusion CLI tool
INFO: Using profile DEFAULT
INFO: Authenticating to Netography Fusion API for qa-customer at https://api.netography.com/api/v1
INFO: Successfully authenticated to Netography Fusion API for qa-customer at https://api.netography.com/api/v1
INFO: Retrieved 2 Netography Fusion traffic sources for account qa-customer
INFO: Deleted Netography Fusion flow source 924227113
INFO: Deleted flow source for VPC vpc-1234
Option 3: Use Python
3a. Minimal Python code to interact with API
This recipe provides basic Python code you can call to authenticate to the API and create a traffic source.
3b. Instantiate a Python class to interact with Fusion API
This recipe provides a subset of the Python class, NetoAPI, that you can use to interact with the API. This is effectively the same thing as 3a, but separates out the API code.
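As a rough sketch of what the minimal approach in 3a looks like (this assumes you already have a bearer token from the authentication recipe above; the requests library, helper function name, and field values here are illustrative rather than the recipe's exact code):

import os
import requests

API_BASE = "https://api.netography.com/api/v1"
# Bearer token obtained via the Authentication via API Key recipe.
TOKEN = os.environ["NETO_BEARER_TOKEN"]

def create_traffic_source(name, account_id, region, bucket, bucket_region, prefix, role_arn, vpc_id):
    """Create an AWS VPC S3 traffic source in Fusion for one flow log configuration."""
    payload = {
        "flowtype": "aws",
        "flowresource": "s3",
        "enabled": True,
        "awsauthtype": "RoleARN",
        "role": {"arn": role_arn},
        "name": name,
        "traffictype": "flow",
        "bucket": bucket,
        "bucketregion": bucket_region,
        "prefix": prefix,
        "region": region,
        "accountid": account_id,
        "tags": [vpc_id],
    }
    resp = requests.post(
        f"{API_BASE}/vpc",
        json=payload,
        headers={"authorization": f"Bearer {TOKEN}", "accept": "application/json"},
    )
    resp.raise_for_status()
    return resp.json()

create_traffic_source(
    name="vpc-01",
    account_id="123456789012",
    region="us-east-1",
    bucket="neto-myflowlogs-bucket-example",
    bucket_region="us-east-1",
    prefix="myflowlogs-for-vpc-01",
    role_arn="arn:aws:iam::123456789012:role/NetoFlowLogReader",
    vpc_id="vpc-01",
)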
Triggering Fusion Traffic Source Creation in AWS
The previous section covered how to create a single Fusion traffic source programmatically. For building your own automation, the next step is to trigger whichever method you choose when a new VPC flow log is created in AWS (and, for the complete lifecycle, when a flow log configuration is modified or deleted).
This example differentiates flow logs by prefix (S3 log destination option 1)
If your preferred design is to use a consistent S3 log destination (S3 log destination option 2 above), you will need to adjust the examples to omit the prefix and create a traffic source per account/region instead of per VPC.
Option 1. Lambda-backed custom resource in CloudFormation Stack
If you are using CloudFormation to create VPCs and/or configure flow logs, you can use a Lambda-backed custom resource to call a Python function as part of the CloudFormation stack. This will then be applied to all newly created VPCs, and will work regardless of how the CloudFormation stack itself is deployed. If you are using AWS Service Catalog to deploy a CloudFormation stack to create new VPCs and configure flow logs already, you can add the final step of creating the Fusion traffic source for the VPC with this approach.
More information on this stack, and detailed instructions for installation, can be found here: https://docs.netography.com/docs/netography-aws-cloudformation-automation. Follow the instructions to install the Flow feature. If you have already installed the base StackSet from the previous step, simply modify it when following the associated prerequisite steps.
The CloudFormation example is not meant to be used without modification
The example contains two CloudFormation templates that demonstrate how you can use CloudFormation to create a VPC, configure VPC flow logs, and onboard them to Netography Fusion. These templates are intended to provide an example for an engineer familiar with CloudFormation to integrate into their existing automation workflows.
If you are looking for a complete end-to-end solution that does not require CloudFormation expertise, consider using the Netography Cloud Onboarding Automation for AWS Organizations instead.
Option 2. Custom Lambda linked to EventBridge VPC Creation event
The example in the previous option includes a Lambda function that can be executed to create a new Fusion traffic source for a VPC. Instead of triggering that Lambda via a CloudFormation Lambda-backed custom resource, it can be triggered by an EventBridge VPC Creation event.
The two EventBridge event patterns to configure these triggers are:
# EventBridge event pattern matching CreateVpc API calls recorded by CloudTrail
vpc_create_event_pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["ec2.amazonaws.com"],
        "eventName": ["CreateVpc"],
    },
}

# EventBridge event pattern matching DeleteVpc API calls recorded by CloudTrail
delete_vpc_event_pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["ec2.amazonaws.com"],
        "eventName": ["DeleteVpc"],
    },
}
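A minimal sketch of wiring these patterns to a Lambda with boto3 might look like the following. The rule names and Lambda ARN are placeholders, these CloudTrail-based events require CloudTrail to be enabled, and the Lambda also needs a resource-based policy allowing EventBridge to invoke it (omitted here):

import json
import boto3

events = boto3.client("events")

# Placeholder - the ARN of the Lambda that creates/deletes the Fusion traffic source.
LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:neto-traffic-source-lifecycle"

for rule_name, pattern in [
    ("neto-on-create-vpc", vpc_create_event_pattern),
    ("neto-on-delete-vpc", delete_vpc_event_pattern),
]:
    events.put_rule(Name=rule_name, EventPattern=json.dumps(pattern))
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "neto-lifecycle-lambda", "Arn": LAMBDA_ARN}],
    )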
Netography's Cloud Onboarding Automation for AWS Organizations uses this method, deploying a CloudFormation StackSet via Terraform. It serves as a fully built-out, production-quality example of how to configure AWS to trigger a Lambda for these events, and of Lambda code that performs the full lifecycle, including deletions.
Option 3. Adapting a custom automation from Netography’s Cloud Onboarding Automation for AWS Organizations
If the overall complexity of Netography's full onboarding automation is a concern, even when it is configured to execute only a subset of its capabilities, it may still serve as a good working example of how to use CloudFormation, EventBridge, Lambda, and the Fusion API to build out your own automation.
Automating Fusion Context Creation
If you would like to automatically add context integrations for all accounts to see asset labels within Netography, you can follow the instructions for installing the Context feature here: https://docs.netography.com/docs/netography-aws-cloudformation-automation. Be sure to modify the base StackSet as described.