Netography AWS Onboarding Guide for Cloud Automation Engineers
Introduction
If you have not yet reviewed the options for onboarding AWS VPC flow logs to Fusion, see: Automating AWS Cloud Onboarding. If you have done so and determined that you will integrate the Netography Fusion AWS configuration requirements and VPC flow log onboarding process into your existing automation, this page provides the technical details and working examples a cloud automation engineer needs to complete that integration.
This guide covers VPC flow logs configured to write to S3. Netography also supports delivering flow logs via Kinesis, which offers lower latency than S3 but comes at a significantly higher cost from AWS. As a result, most Netography customers use the S3 method, and that is what is documented here.
To skip right to the CloudFormation deployment: https://docs.netography.com/docs/netography-aws-cloudformation-automation
AWS VPC Flow Log Configuration Steps (via S3)
1. Create an S3 Bucket to Write VPC Flow Logs To
S3 Bucket Region Recommendations
We recommend creating an S3 bucket in each region where you have VPCs, and directing the flow logs for each VPC to the S3 bucket in the same region. This minimizes write latency and cost.
If you want to use a single S3 bucket for simplicity, we recommend creating it in us-east-1. This minimizes read latency and cost, at the expense of write latency.
The configuration to avoid is a single S3 bucket in a region that is NOT us-east-1, which doubles your cross-region data transfer cost. This is a supported configuration, just not optimized for cost.
ℹ️ Why regional alignment of S3 bucket(s) matters
Although you can write VPC flow logs to a bucket in any region, co-locating the bucket with either the VPC or Netography's ingest endpoint (us-east-1) minimizes both latency and data transfer fees:
Latency: same-region writes (bucket co-located with the VPC) or reads (bucket in us-east-1) avoid extra network hops.
Cost: intra-region data transfers are free. Cross-region writes (e.g. eu-west-1 → us-east-1) incur standard AWS cross-region data transfer fees. Cross-region reads (S3 → us-east-1) likewise incur cross-region data transfer fees unless the bucket lives in us-east-1, where Netography ingest reads from.
Deployment patterns

| Bucket location | Cross-region write? | Cross-region read? | Notes |
| --- | --- | --- | --- |
| Same region as the VPC | No | Yes, unless the VPC is in us-east-1 | Best for low-latency writes; one cross-region hop. |
| us-east-1 (Fusion ingest) | Yes, unless the VPC is in us-east-1 | No | One cross-region hop on write; zero on read. |
| Neither the VPC's region nor us-east-1 | Yes | Yes | Two cross-region hops (write and read), doubling the cross-region data transfer fees. |
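If you are scripting bucket creation, the following is a minimal boto3 sketch of the per-region recommendation above. The bucket naming scheme and region list are hypothetical, and the flow log delivery bucket policy is not shown; depending on how you create the flow logs, your automation may need to attach it explicitly (see the AWS VPC flow log documentation).

```python
# Minimal sketch: one flow log bucket per region that contains VPCs.
import boto3

def create_flowlog_bucket(name: str, region: str) -> None:
    """Create a flow log bucket in the given region."""
    s3 = boto3.client("s3", region_name=region)
    if region == "us-east-1":
        # us-east-1 is the default location and must not set a LocationConstraint
        s3.create_bucket(Bucket=name)
    else:
        s3.create_bucket(
            Bucket=name,
            CreateBucketConfiguration={"LocationConstraint": region},
        )

# Hypothetical naming scheme; bucket names must be globally unique
for region in ("us-east-1", "eu-west-1"):
    create_flowlog_bucket(f"acme-myflowlogs-{region}", region)
```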
2. IAM Policy and Custom Role for Netography to read flow logs from your S3 bucket
In order for Netography to automatically ingest flow logs from AWS, it needs to have permissions to fetch objects from S3.
Option 1: CloudFormation Stack
To make this process easier, we've provided the netography-base.yaml CloudFormation template, which can be found along with detailed instructions here: https://docs.netography.com/docs/netography-aws-cloudformation-automation#1-iam-policy-and-custom-role-for-netography. To deploy only the IAM roles, follow step 1, deploying just the roles and no Lambdas.
Option 2: Manual Deployment
If you would rather navigate through this process manually, see the documentation below.
AWS VPC via S3 Setup (AWS Console method)
AWS VPC via S3 Setup (CloudFormation method)
If you would like to use an SNS topic or SQS queue, these steps from the AWS Quickstart Guide provide additional step-by-step instructions:
Note: These are optional, and you can omit the SQS/SNS-related permissions. SQS/SNS gives Netography a trigger-based mechanism to read new flow log files as they are written to S3, but the latency difference may not be critical to your use case, so if you are building your own automation you can simplify things by skipping the optional SQS/SNS triggers. The examples shown below also omit them for simplicity.
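If you are scripting the manual path, here is a hedged boto3 sketch of the kind of cross-account role and read-only S3 policy described above. It is illustrative only: the netography-base.yaml template and the setup documents linked above are authoritative for the exact trust relationship and permissions, the Netography principal and external ID below are placeholders you would obtain from Fusion, the role and policy names are hypothetical, and the optional SQS/SNS permissions are omitted.

```python
# Hedged sketch: a role Netography can assume with read-only access to the flow
# log bucket. Values marked as placeholders must come from Netography / the
# netography-base.yaml template.
import json
import boto3

iam = boto3.client("iam")

BUCKET = "acme-myflowlogs-bucket"                             # your flow log bucket
NETO_PRINCIPAL = "arn:aws:iam::<NETOGRAPHY_ACCOUNT_ID>:root"  # placeholder
EXTERNAL_ID = "<EXTERNAL_ID_FROM_FUSION>"                     # placeholder

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": NETO_PRINCIPAL},
        "Action": "sts:AssumeRole",
        "Condition": {"StringEquals": {"sts:ExternalId": EXTERNAL_ID}},
    }],
}

read_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
    }],
}

iam.create_role(
    RoleName="netography-flowlog-reader",      # hypothetical name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
iam.put_role_policy(
    RoleName="netography-flowlog-reader",
    PolicyName="netography-flowlog-read",
    PolicyDocument=json.dumps(read_policy),
)
```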
3. Enabling VPC Flow Logs and Onboarding to Fusion
There are two steps for ingesting VPC flow logs into Fusion that must be completed for each VPC:
Enable VPC flow logs in AWS for the VPC, specifying an S3 log destination to write the flow logs to (a boto3 sketch follows this list). Refer to the next section for details on supported S3 log destinations.
Ensure Fusion has a traffic source configured to read the flow logs:
If each VPC flow log is configured with a unique S3 log destination (e.g. a top-level folder or prefix is added to the S3 bucket path to differentiate it from other VPC flow logs in the same S3 bucket), create an AWS VPC S3 traffic source in Fusion for each VPC.
If the same S3 log destination is used for all the VPC flow log configurations, create an AWS VPC S3 traffic source in Fusion for each unique account and region that is writing flow logs to that destination.
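As a minimal sketch of the first step, here is the boto3 call that enables flow logs for a single VPC with an S3 destination. The VPC ID is hypothetical, and the log format and destination options must match the Fusion flow log configuration requirements linked in the next section.

```python
# Sketch: enable VPC flow logs for one VPC, writing to an S3 destination.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.create_flow_logs(
    ResourceIds=["vpc-0123456789abcdef0"],   # hypothetical VPC ID
    ResourceType="VPC",
    TrafficType="ALL",
    LogDestinationType="s3",
    # Optionally append a /folder to differentiate VPCs -- see the next section
    LogDestination="arn:aws:s3:::acme-myflowlogs-bucket",
    # LogFormat and DestinationOptions (file format, hive-compatible prefix,
    # per-hour partition) must match the Fusion flow log requirements
)
```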
AWS Flow Log Configuration Requirements
To see the exact configuration required for a single AWS VPC flow log configuration, refer to one of these documentation links:
S3 Log Destination
Using a single S3 bucket for multiple VPC flow log configurations across multiple VPCs and accounts IS SUPPORTED by Fusion. When using a single bucket, there is a design choice to make between specifying a unique top-level folder in the S3 bucket (also called a prefix, as it is added before the AWSLogs path that AWS writes to), or using the exact same log destination for all VPCs.
S3 Log Destination Option 1: Differentiate flow logs by folder/prefix in the same S3 bucket
You can differentiate each flow log configuration by adding a unique folder name after the S3 bucket ARN in the flow log configuration. If you do so, you add a Fusion traffic source corresponding to each VPC (technically, each VPC flow log configuration), each pointing at its own folder.
This approach has the benefit of allowing you to configure settings on a per-VPC basis within both AWS and Fusion, and maintains a 1:1 mapping between VPCs with flow logs and Fusion traffic sources.
Example: Assume you have vpc-1 and vpc-2 and are configuring flow logs for both VPCs to write to S3 bucket arn:aws:s3:::acme-myflowlogs-bucket
In the AWS flow log configuration for vpc-1 set the S3 ARN to: arn:aws:s3:::acme-myflowlogs-bucket/flowlogs-for-vpc-1
In the AWS flow log configuration for vpc-2 set the S3 ARN to: arn:aws:s3:::acme-myflowlogs-bucket/flowlogs-for-vpc-2
AWS will then write flow logs for vpc-1 to acme-myflowlogs-bucket/flowlogs-for-vpc-1/AWSLogs/123456789012/vpcflowlogs/us-east-1/ and for vpc-2 to acme-myflowlogs-bucket/flowlogs-for-vpc-2/AWSLogs/123456789012/vpcflowlogs/us-east-1/
You will configure two Fusion traffic sources, one for vpc-1 and one for vpc-2.
This differentiates the flow logs by path BEFORE the AWSLogs directory that AWS writes automatically. The directory structure under AWSLogs changes depending on the hive prefix and hourly partitioning settings.
You do not need to include the VPC ID or name in the folder name as is done in this example; it can be any string that is unique across all the VPC flow log configurations pointing to that bucket.
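As a minimal sketch of Option 1 (VPC IDs are hypothetical; the bucket and prefixes are the values from the example above), each VPC gets its own flow log configuration with a unique prefix:

```python
# Sketch of Option 1: one flow log configuration per VPC, each with its own
# folder/prefix in the shared bucket.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
BUCKET_ARN = "arn:aws:s3:::acme-myflowlogs-bucket"

vpc_prefixes = {
    "vpc-11111111111111111": "flowlogs-for-vpc-1",   # hypothetical VPC IDs
    "vpc-22222222222222222": "flowlogs-for-vpc-2",
}

for vpc_id, prefix in vpc_prefixes.items():
    ec2.create_flow_logs(
        ResourceIds=[vpc_id],
        ResourceType="VPC",
        TrafficType="ALL",
        LogDestinationType="s3",
        LogDestination=f"{BUCKET_ARN}/{prefix}",
    )
```

Each prefix then maps 1:1 to a Fusion traffic source, as described above.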
S3 Log Destination Option 2: Use the same S3 log destination across accounts and VPCs
You can use the same S3 log destination across accounts and VPCs. If you do so, you add a Fusion traffic source corresponding to each unique account and region combination writing flow logs to this log destination.
This has the benefit of being able to maintain a consistent S3 log destination across an AWS Organization.
Example: Assume you have vpc-1 and vpc-2 and are configuring flow logs for both VPCs to write to S3 bucket arn:aws:s3:::acme-myflowlogs-bucket
In the AWS flow log configuration for vpc-1 AND vpc-2, set the S3 ARN to: arn:aws:s3:::acme-myflowlogs-bucket
AWS will then write flow logs for vpc-1 to acme-myflowlogs-bucket/AWSLogs/123456789012/vpcflowlogs/us-east-1/ and for vpc-2 to acme-myflowlogs-bucket/AWSLogs/123456789012/vpcflowlogs/us-east-1/ (assuming the VPCs are in the same account and region; if not, the account number and region will differ).
You will configure a Fusion traffic source for account 123456789012 and region us-east-1.
This differentiates the flow logs in different accounts and regions by the path AFTER the AWSLogs directory that AWS writes automatically. The directory structure under AWSLogs changes depending on the hive prefix and hourly partitioning settings.
❗️Limitations when using the same S3 log destination for multiple VPCs
Consistent VPC flow log configuration settings
If you use the same S3 log destination, you must ensure that all VPC flow logs configured to write to that log destination use the same settings for the following fields (you can choose any values you want, they just must be consistent):
Log file format
Hive-compatible S3 prefix setting
Per-hour partition setting
Fusion configuration is per traffic source (per unique account/region), not per VPC
Fusion configuration settings such as the sample rate and tags operate at the traffic source level. When you use this approach, you will have one traffic source per unique account and region combination, so configuration operates at that level. If you need more granular control, such as setting a unique sample rate for an individual VPC, use option 1 above (adding a prefix/folder and creating a traffic source per VPC) instead.
Scalability limitations for high volumes
If you will be delivering over 10,000 flow records per second to Netography across VPCs in a single account and region (pre-sampling), please reach out to Netography Support to discuss and agree on the right design for your environment, as there may be scalability reasons to distribute the ingest across traffic sources per VPC rather than per account/region.
4. AWS Context Integrations
In addition to ingesting VPC flow logs, Netography Fusion provides context enrichment for AWS through the use of Context Integrations.
Refer to AWS Context Integration for the additional permission requirements and options for context enrichment.
5. AWS Route 53 DNS Ingest
In addition to ingesting VPC flow logs, Netography Fusion also ingests Route 53 DNS resolver logs. Configuration and ingest for these logs follow a similar pattern to VPC flow logs. See AWS Route 53 DNS Logs via S3 Setup (Console) for more details on this support.
Automating Fusion Traffic Source Creation
The single-VPC steps linked above show how to create a new traffic source for a VPC in the Fusion Portal. However, if you are ingesting numerous VPCs, or flow log configuration occurs as part of an automation, you can automate this step as well.
Automation for creating a single Fusion traffic source
Option 1: Use the Netography Fusion REST API
The API endpoints for creating, updating, and deleting a traffic source are documented here:
To create a new traffic source, you would construct a POST request with type aws.
In the Fusion API, the S3 ARN you specified earlier in the AWS flow log configuration is broken out into two separate fields: bucket, the name of the S3 bucket itself, and prefix, the folder name you added to the end of the S3 bucket ARN to separate the directories for this flow log configuration from others.
In the previous example, we configured the S3 ARN for vpc-1 to be arn:aws:s3:::acme-myflowlogs-bucket/flowlogs-for-vpc-1. In the API call, set: "bucket": "acme-myflowlogs-bucket" and "prefix": "flowlogs-for-vpc-1"
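If your automation starts from the S3 ARN used in the flow log configuration, a small helper (hypothetical, using the example values above) can derive the two fields:

```python
# Split a flow log S3 destination ARN into the bucket and prefix fields used by
# the Fusion traffic source API.
def split_s3_destination(arn: str) -> tuple[str, str]:
    bucket_and_prefix = arn.removeprefix("arn:aws:s3:::")
    bucket, _, prefix = bucket_and_prefix.partition("/")
    return bucket, prefix

bucket, prefix = split_s3_destination("arn:aws:s3:::acme-myflowlogs-bucket/flowlogs-for-vpc-1")
# bucket == "acme-myflowlogs-bucket", prefix == "flowlogs-for-vpc-1"
```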
Here is the curl command with example values for the fields:
Option 2: Use the neto CLI tool
neto is a Python-based CLI tool that serves as a front-end to the REST API. It can also be a useful reference if you are developing your own Python code, as it has working code for interacting with these API endpoints encapsulated in a Python class. This tool is in development and is currently available from TestPyPI as a Python pip package.
neto documentation and links to install the Python package are available at:
https://test.pypi.org/project/neto/
Here are examples of how you can use neto from a shell:
Option 3: Use Python
3a. Minimal Python code to interact with API
This recipe provides basic Python code you can call to authenticate to the API and create a traffic source.
Recipe: Create a Traffic Source in Python
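For orientation, here is a hedged sketch of what such code can look like with the requests library. The base URL, endpoint path, authentication, and the name and region fields are placeholders and assumptions; the recipe above and the Fusion API documentation are authoritative. Only the type, bucket, and prefix fields come from this guide.

```python
# Hedged sketch only -- the endpoint path and auth below are placeholders, not
# the documented Fusion API; take real values from the API reference / recipe.
import requests

API_BASE = "https://api.netography.com"   # placeholder base URL
API_TOKEN = "<FUSION_API_TOKEN>"          # placeholder; obtain per the API docs

def create_traffic_source(name: str, bucket: str, prefix: str, region: str) -> dict:
    payload = {
        "name": name,          # assumed field
        "type": "aws",         # per this guide
        "bucket": bucket,      # S3 bucket name (not the full ARN)
        "prefix": prefix,      # folder added to the flow log destination, if any
        "region": region,      # assumed field; confirm against the API reference
    }
    resp = requests.post(
        f"{API_BASE}/api/v1/trafficsources",   # placeholder path
        json=payload,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

create_traffic_source("vpc-1-flowlogs", "acme-myflowlogs-bucket", "flowlogs-for-vpc-1", "us-east-1")
```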
3b. Instantiate a Python class to interact with Fusion API
This recipe provides a subset of the Python class, NetoAPI, that you can use to interact with the API. This is effectively the same thing as 3a, but separates out the API code.
Recipe: NetoAPI Python class to create traffic sources in Fusion
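The idea is the same as 3a with the HTTP and authentication plumbing pulled into a class. This is a hedged sketch only; the endpoint path and auth remain placeholders, and the real NetoAPI class is in the recipe above.

```python
# Hedged sketch of the class-based approach; placeholders as in the 3a sketch.
import requests

class NetoAPI:
    def __init__(self, base_url: str, token: str):
        self.base_url = base_url
        self.session = requests.Session()
        self.session.headers.update({"Authorization": f"Bearer {token}"})

    def create_traffic_source(self, **fields) -> dict:
        resp = self.session.post(
            f"{self.base_url}/api/v1/trafficsources",   # placeholder path
            json=fields,
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()

api = NetoAPI("https://api.netography.com", "<FUSION_API_TOKEN>")   # placeholders
api.create_traffic_source(name="vpc-1-flowlogs", type="aws",
                          bucket="acme-myflowlogs-bucket", prefix="flowlogs-for-vpc-1")
```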
Triggering Fusion Traffic Source Creation in AWS
The previous section covered how to create a single Fusion traffic source programmatically. For building your own automation, the next step is to trigger whichever method you choose when a new VPC flow log is created in AWS (and, for the complete lifecycle, when a flow log configuration is modified or deleted).
Option 1. Lambda-backed custom resource in CloudFormation Stack
If you are using CloudFormation to create VPCs and/or configure flow logs, you can use a Lambda-backed custom resource to call a Python function as part of the CloudFormation stack. This will then be applied to all newly created VPCs, and will work regardless of how the CloudFormation stack itself is deployed. If you are already using AWS Service Catalog to deploy a CloudFormation stack that creates new VPCs and configures flow logs, you can add creating the Fusion traffic source for the VPC as the final step with this approach.
More information on this stack, and detailed instructions for installation can be found here: https://docs.netography.com/docs/netography-aws-cloudformation-automation. Follow the instructions to install the Flow feature. If you have already installed the base StackSet from the previous step, simply modify it when following the associated prerequisite steps.
⚠️ The CloudFormation example is not meant to be used without modification
The example contains two CloudFormation templates that demonstrate how you can use CloudFormation to create a VPC, configure VPC flow logs, and onboard them to Netography Fusion. These templates are intended to provide an example for an engineer familiar with CloudFormation to integrate into their existing automation workflows.
If you are looking for a complete end-to-end solution that does not require CloudFormation expertise, consider using the Netography Cloud Onboarding Automation for AWS Organizations instead.
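As a rough illustration of the pattern (not Netography's Lambda, which is in the example linked above), a custom resource handler receives Create/Update/Delete events from CloudFormation and reports back with cfnresponse. The fusion_api module and its helpers are hypothetical stand-ins for code like the sketches earlier on this page.

```python
# Hedged sketch of a Lambda-backed custom resource handler.
import cfnresponse  # helper module AWS provides for inline Python Lambdas
from fusion_api import create_traffic_source, delete_traffic_source  # hypothetical module

def handler(event, context):
    props = event.get("ResourceProperties", {})
    try:
        if event["RequestType"] == "Create":
            create_traffic_source(props["Name"], props["Bucket"], props["Prefix"], props["Region"])
        elif event["RequestType"] == "Delete":
            delete_traffic_source(props["Name"])
        # "Update" would diff old vs. new properties; omitted in this sketch
        cfnresponse.send(event, context, cfnresponse.SUCCESS, {})
    except Exception as exc:
        cfnresponse.send(event, context, cfnresponse.FAILED, {"Error": str(exc)})
```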
Option 2. Custom Lambda linked to EventBridge VPC Creation event
The example in the previous option includes a Lambda function that can be executed to create a new Fusion traffic source for a VPC. Instead of triggering that Lambda via a CloudFormation Lambda-backed custom resource, it can be triggered by an EventBridge VPC Creation event.
The 2 events to configure these triggers are:
Netography's Cloud Onboarding Automation for AWS Organizations uses this method, deploying a CloudFormation StackSet via Terraform. It serves as a fully built-out, production-quality example of how to configure AWS to trigger a Lambda for these events, and of Lambda code that performs the full lifecycle, including deletions.
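For orientation only, here is a hedged boto3 sketch of the wiring: an EventBridge rule whose target is the traffic source Lambda. The event pattern (CloudTrail-based CreateFlowLogs calls) and all names/ARNs are assumptions for illustration; the exact events and rules used are defined in the automation referenced above.

```python
# Hedged sketch: route an (assumed) CloudTrail-based flow log creation event to a Lambda.
import json
import boto3

events = boto3.client("events")
lambda_client = boto3.client("lambda")

LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:neto-traffic-source"  # hypothetical

pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {"eventName": ["CreateFlowLogs"]},   # assumption; deletions would need another rule
}

rule = events.put_rule(Name="neto-flowlog-created", EventPattern=json.dumps(pattern))
events.put_targets(
    Rule="neto-flowlog-created",
    Targets=[{"Id": "neto-traffic-source-lambda", "Arn": LAMBDA_ARN}],
)

# Allow EventBridge to invoke the Lambda
lambda_client.add_permission(
    FunctionName=LAMBDA_ARN,
    StatementId="neto-flowlog-created-invoke",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule["RuleArn"],
)
```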
Option 3. Adapting a custom automation from Netography’s Cloud Onboarding Automation for AWS Organizations
Even if you are concerned about the overall complexity of Netography's full onboarding automation, or would configure it to execute only a subset of its capabilities, it can still serve as a good working example of how to use CloudFormation, EventBridge, Lambda, and the Fusion API to build your own automation.
Automating Fusion Context Creation
If you would like to automatically add context integrations for all accounts to see asset labels within Netography, you can follow the instructions for installing the Context feature here: https://docs.netography.com/docs/netography-aws-cloudformation-automation. Be sure to modify the base StackSet as described.