Netography AWS Onboarding Guide for Cloud Automation Engineers

Introduction

If you have not yet reviewed the options for how to onboard AWS VPC flow logs to Fusion, see: Automating AWS Cloud Onboarding. If you have done so and determined that you will be integrating the Netography Fusion AWS configuration requirements and VPC flow log onboarding process into your existing automation, this page provides the technical details and working examples needed for a cloud automation engineer to complete this integration.

This guide covers VPC flow logs configured to write to S3. Netography also supports using Kinesis to deliver flow logs to Netography. Kinesis provides a lower latency approach to delivering flow logs to Fusion, but it comes at a significantly higher cost from AWS. As a result, most Netography customers use the S3 method, and that is what is documented here.

To skip right to the CloudFormation deployment: https://docs.netography.com/docs/netography-aws-cloudformation-automation

AWS VPC Flow Log Configuration Steps (via S3)

1. Create an S3 Bucket to Write VPC Flow Logs to

S3 Bucket Region Recommendations

We recommend creating an S3 bucket for each region you have VPCs in, and directing the flow logs for each VPC to an S3 bucket in the same region. This minimizes write latency and cost.

If you instead want to use a single S3 bucket for simplicity, we recommend creating it in us-east-1. This minimizes read latency and cost, at the expense of write latency.

The configuration to avoid is having a single S3 bucket in a region that is NOT us-east-1. This will double your cross-region data transfer cost. This is a supported configuration, just not optimized for cost.

ℹ️

Why regional alignment of S3 bucket(s) matters

Although you can write VPC Flow Logs to a bucket in any region, co-locating the bucket with either the VPC or Netography’s ingest endpoint (us-east-1) minimizes both latency and data-transfer fees:

  • Latency
    • Same-region writes (co-located with the VPC) or reads (us-east-1) avoid extra network hops.
  • Cost
    • Intra-region data transfers are free.
    • Cross-region writes (e.g. eu-west-1 → us-east-1) incur standard AWS cross-region data transfer fees.
    • Cross-region reads (S3 → us-east-1) likewise incur cross-region data transfer fees unless the bucket lives in us-east-1, where Netography's ingest reads from.

Deployment patterns

Bucket Location | Cross-Region Write | Cross-Region Read | Notes
Same as VPC | No | Yes, unless VPC is in us-east-1 | Best for low-latency writes; one cross-region hop.
us-east-1 (Fusion ingest) | Yes, unless VPC is in us-east-1 | No | One cross-region hop on write; zero on read.
Neither same as VPC nor us-east-1 | Yes | Yes | Two cross-region hops (write and read) double the cross-region data transfer fees.
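
As a concrete sketch of this recommendation, the following boto3 snippet creates one flow log bucket per region; the region list and the bucket naming scheme are illustrative placeholders, not a required convention:

# Illustrative sketch: create one flow log bucket per region you run VPCs in.
# Region list and bucket names are placeholders - adjust for your environment.
import boto3

regions = ["us-east-1", "eu-west-1"]  # regions where you have VPCs

for region in regions:
    s3 = boto3.client("s3", region_name=region)
    bucket_name = f"acme-myflowlogs-{region}"  # hypothetical naming scheme
    if region == "us-east-1":
        # us-east-1 buckets must be created without a LocationConstraint
        s3.create_bucket(Bucket=bucket_name)
    else:
        s3.create_bucket(
            Bucket=bucket_name,
            CreateBucketConfiguration={"LocationConstraint": region},
        )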

2. IAM Policy and Custom Role for Netography to Read Flow Logs from Your S3 Bucket

In order for Netography to automatically ingest flow logs from AWS, it needs to have permissions to fetch objects from S3.

Option 1: CloudFormation Stack

To make this process easier, we've provided the netography-base.yaml CloudFormation template, which can be found along with detailed instructions here: https://docs.netography.com/docs/netography-aws-cloudformation-automation#1-iam-policy-and-custom-role-for-netography. To deploy only the IAM roles, follow step 1, deploying only the roles with no Lambdas.

Option 2: Manual Deployment

If you would rather navigate through this process manually, see the documentation below.

AWS VPC via S3 Setup (AWS Console method)

AWS VPC via S3 Setup (CloudFormation method)

Additionally, if you would like to use an SNS topic or SQS queue, these steps from the AWS Quickstart Guide provide additional step-by-step instructions:

Create IAM policy

Create custom role

Note: These are optional; you can omit the SQS/SNS-related permissions. Using SQS/SNS provides a trigger-based mechanism for Netography to read new flow log files as they are written to S3. However, the latency difference may not be critical to your use case, so if you are building your own automation you can simplify things by omitting the optional SQS/SNS triggers. The examples shown below also omit them for simplicity.
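
For orientation only, here is a minimal boto3 sketch of the kind of role and read-only S3 policy involved (omitting the optional SQS/SNS permissions). The authoritative trust policy and permissions come from the netography-base.yaml template and the documentation linked above; the Netography principal account ID and external ID below are placeholders that you must replace with the values provided by Netography.

# Illustrative sketch only: the authoritative role definition is in netography-base.yaml.
# NETOGRAPHY_ACCOUNT_ID and YOUR_EXTERNAL_ID are placeholders for values provided by Netography.
import json
import boto3

iam = boto3.client("iam")
bucket = "acme-myflowlogs-bucket"  # your flow log bucket

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::NETOGRAPHY_ACCOUNT_ID:root"},  # placeholder
        "Action": "sts:AssumeRole",
        "Condition": {"StringEquals": {"sts:ExternalId": "YOUR_EXTERNAL_ID"}},  # placeholder
    }],
}

read_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
    }],
}

iam.create_role(RoleName="NetoFlowLogReader",
                AssumeRolePolicyDocument=json.dumps(trust_policy))
iam.put_role_policy(RoleName="NetoFlowLogReader",
                    PolicyName="NetoFlowLogRead",
                    PolicyDocument=json.dumps(read_policy))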

3. Enabling VPC Flow Logs and Onboarding to Fusion

There are 2 steps for ingesting VPC flow logs to Fusion that must be completed for each VPC:

  1. Enable VPC flow logs in AWS for the VPC, specifying a log destination in S3 to write the flow logs to (see the boto3 sketch after this list). Refer to the next section for details on supported S3 log destinations.
  2. Ensure Fusion has a traffic source configured to read the flow logs:
    1. If each VPC flow log is configured with a unique S3 log destination (e.g. a top-level folder or prefix is added to the S3 bucket path to differentiate it from other VPC flow logs in the same S3 bucket), create an AWS VPC S3 traffic source in Fusion for each VPC.
    2. If the same S3 log destination is used for all the VPC flow log configurations, create an AWS VPC S3 traffic source in Fusion for each unique account and region that is writing flow logs to that destination.
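
For step 1 of this list, here is a minimal boto3 sketch of enabling flow logs for a single VPC with an S3 log destination; the VPC ID, bucket name, and folder name are placeholders:

# Illustrative sketch: enable VPC flow logs for one VPC, writing to an S3 log destination.
# The VPC ID, bucket name, and folder/prefix are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.create_flow_logs(
    ResourceIds=["vpc-0123456789abcdef0"],
    ResourceType="VPC",
    TrafficType="ALL",
    LogDestinationType="s3",
    # Appending a folder/prefix gives this VPC a unique log destination (Option 1 below);
    # omit it to share the same destination across VPCs (Option 2 below).
    LogDestination="arn:aws:s3:::acme-myflowlogs-bucket/flowlogs-for-vpc-1",
)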

AWS Flow Log Configuration Requirements

To see the exact configuration required for a single AWS VPC flow log configuration, refer to one of these documentation links:

  1. Quickstart: AWS
  2. AWS VPC via S3 Setup (AWS Console method)
  3. AWS VPC via S3 Setup (CloudFormation method)

S3 Log Destination

Using a single S3 bucket for multiple VPC flow log configurations across multiple VPCs and accounts IS SUPPORTED by Fusion. When using a single bucket, there is a design choice to make between specifying a unique top-level folder in the S3 bucket (also called a prefix, as it is added before the AWSLogs path that AWS writes), or using the exact same log destination for all VPCs.

ℹ️

Explaining the relationship between Fusion traffic sources and AWS VPC flow log configurations

The Fusion traffic source configuration contains the S3 bucket, folder/prefix name, account ID, and region. It then constructs the exact S3 path where flow logs are being written for that traffic source.

When using the same log destination across all flow log configurations, that path will contain all the flow logs for the account and region in the traffic source configuration. Therefore, you must create 1 traffic source per account and region that is writing flow logs to the S3 bucket.

When using a unique log destination for each flow log configuration (whether that's a unique S3 bucket or a single S3 bucket with a unique folder/prefix), that path will contain the flow logs for ONLY that one flow log configuration. Therefore, you must create 1 traffic source per flow log configuration.
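
As an illustration of how those fields combine, a traffic source's S3 path follows this pattern (the exact layout under AWSLogs also varies with the Hive-compatible prefix and hourly partition settings):

# Illustration only: the S3 path implied by a traffic source's bucket, prefix, account ID, and region.
bucket = "acme-myflowlogs-bucket"
prefix = "flowlogs-for-vpc-1"  # empty string when the same log destination is shared
account_id = "123456789012"
region = "us-east-1"

path = "/".join(p for p in [bucket, prefix, "AWSLogs", account_id, "vpcflowlogs", region] if p) + "/"
print(path)  # acme-myflowlogs-bucket/flowlogs-for-vpc-1/AWSLogs/123456789012/vpcflowlogs/us-east-1/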

S3 Log Destination Option 1: Differentiate flow logs by folder/prefix in the same S3 bucket

You can differentiate each flow log configuration by adding a unique folder name after the S3 bucket ARN in the flow log configuration. If you do so, you add a Fusion traffic source corresponding to each VPC (technically, each VPC flow log configuration), each with its own folder.

This approach has the benefit of allowing you to configure settings uniquely on a per-VPC basis within both AWS and Fusion, and of maintaining a 1:1 mapping between VPCs with flow logs and Fusion traffic sources.

Example: Assume you have vpc-1 and vpc-2 and are configuring flow logs for both VPCs to write to the S3 bucket arn:aws:s3:::acme-myflowlogs-bucket

In the AWS flow log configuration for vpc-1 set the S3 ARN to: arn:aws:s3:::acme-myflowlogs-bucket/flowlogs-for-vpc-1

In the AWS flow log configuration for vpc-2 set the S3 ARN to: arn:aws:s3:::acme-myflowlogs-bucket/flowlogs-for-vpc-2

AWS will then write flow logs for vpc-1 to acme-myflowlogs-bucket/flowlogs-for-vpc-1/AWSLogs/123456789012/vpcflowlogs/us-east-1/ and for vpc-2 to acme-myflowlogs-bucket/flowlogs-for-vpc-2/AWSLogs/123456789012/vpcflowlogs/us-east-1/

You will configure 2 Fusion traffic sources for vpc-1 and vpc-2.

This differentiates the flow logs by path BEFORE the AWSLogs directory that AWS writes automatically. The directory structure under AWSLogs changes depending on the hive prefix and hourly partitioning settings.

You do not need to include the VPC ID or name in that folder name as is done in this example; it can be any string that is unique across all the VPC flow logs pointing to that bucket.

S3 Log Destination Option 2: Use the same S3 log destination across accounts and VPCs

You can use the same S3 log destination across accounts and VPCs. If you do so, you add a Fusion traffic source corresponding to each unique account and region combination writing flow logs to this log destination.

This has the benefit of being able to maintain a consistent S3 log destination across an AWS Organization.

Example: Assume you have vpc-1 and vpc-2 and are configuring flow logs for both VPCs to write to the S3 bucket arn:aws:s3:::acme-myflowlogs-bucket

In the AWS flow log configuration for vpc-1 AND vpc-2, set the S3 ARN to: arn:aws:s3:::acme-myflowlogs-bucket

AWS will then write flow logs for vpc-1 to acme-myflowlogs-bucket/AWSLogs/123456789012/vpcflowlogs/us-east-1/ and for vpc-2 to acme-myflowlogs-bucket/AWSLogs/123456789012/vpcflowlogs/us-east-1/ (assuming the VPCs are in the same account and region; if not, the account number and region will differ).

You will configure a Fusion traffic source for account 123456789012 and region us-east-1.

This differentiates the flow logs in different accounts and regions by the path AFTER the AWSLogs directory that AWS writes automatically. The directory structure under AWSLogs changes depending on the hive prefix and hourly partitioning settings.

❗️

Limitations when using the same S3 log destination for multiple VPCs

Consistent VPC flow log configuration settings

If you use the same S3 log destination, then you must ensure that all VPC flow logs configured to write to that log destination use the same configuration settings for these fields (you can choose any settings you want; they just must be the same across all flow log configurations):

  1. Log file format
  2. Hive-compatible S3 prefix setting
  3. Per-hour partition setting

Fusion configuration is per traffic source (per unique account/region), not per VPC

Fusion configuration settings such as the sample rate and tags operate at the traffic source level. When you use this approach, you will have 1 traffic source per unique account and region combination, so configuration operates at that level. If you need more granular control, such as setting a unique sample rate for an individual VPC, use option 1 above (adding a prefix/folder and creating a traffic source per VPC) instead.

Scalability limitations for high volumes

If you will be delivering over 10,000 flow records per second to Netography across VPCs in a single account and region (pre-sampling), please reach out to Netography Support to discuss and agree on the right design for your environment, as there may be scalability reasons to distribute the ingest across traffic sources per VPC rather than per account/region.

4. AWS Context Integrations

In addition to ingesting VPC flow logs, Netography Fusion provides context enrichment for AWS through the use of Context Integrations.

Refer to AWS Context Integration for the additional permission requirements and options for context enrichment.

5. AWS Route 53 DNS Ingest

In addition to ingesting VPC flow logs, Netography Fusion also ingests Route 53 DNS resolver logs. Configuration and ingest for these logs follows a similar pattern to VPC flow logs. See AWS Route 53 DNS Logs via S3 Setup (Console) for more details on this support.

Automating Fusion Traffic Source Creation

The single VPC steps linked above show how to create a new traffic source for a VPC in the Fusion Portal. However, if you are ingesting numerous VPCs, or flow log configuration occurs as part of your automation, you can also automate this step.

📘

Traffic Source and Flow Source mean the same thing in Fusion

When Fusion only ingested flow logs, the term flow source was used, but since DNS was added to the product, not all sources are for flow, so the more generic term traffic source is now used. Some code and documentation may still refer to a flow source. They are the same thing and use the same API endpoints.

Automation for creating a single Fusion traffic source

Option 1: Use the Netography Fusion REST API

The API endpoints for creating, updating, and deleting a traffic source are documented here:

Create VPC

Update VPC

Delete VPC

To create a new traffic source, you would construct a POST request with type aws.

In the Fusion API, the S3 ARN you specified earlier in the AWS flow log configuration is broken out into 2 separate fields: bucket, the name of the S3 bucket itself, and prefix, the folder name you added to the end of the S3 bucket ARN to separate the directories for this flow log configuration from others.

In the previous example, we configured the S3 ARN for vpc-1 to be arn:aws:s3:::acme-myflowlogs-bucket/flowlogs-for-vpc-1. In the API call, set: "bucket": "acme-myflowlogs-bucket" and "prefix": "flowlogs-for-vpc-1"
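
If you are scripting this step, a small helper like the following (illustrative only) splits the flow log S3 ARN into the bucket and prefix values the API expects:

# Illustrative helper: split the flow log S3 ARN into the Fusion API's bucket/prefix fields.
def split_flowlog_arn(s3_arn: str) -> tuple[str, str]:
    path = s3_arn.removeprefix("arn:aws:s3:::")
    bucket, _, prefix = path.partition("/")
    return bucket, prefix

print(split_flowlog_arn("arn:aws:s3:::acme-myflowlogs-bucket/flowlogs-for-vpc-1"))
# ('acme-myflowlogs-bucket', 'flowlogs-for-vpc-1')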

🔑

Authenticating to the API

Before you call the vpc API endpoint to create a traffic source, you must authenticate to the API, which will return a bearer token that you include in the authorization header in subsequent calls. For details, see: Authentication via API Key

You can use the shell script provided in this recipe to perform the authentication and get the bearer token to use:

curl --request POST \
  --url https://api.netography.com/api/v1/vpc \
  --header 'accept: application/json' \
  --header 'content-type: application/json' \
  --header 'authorization: Bearer INSERT_JWT_BEARER_TOKEN' \
  --data '{
    "flowtype": "aws",
    "flowresource": "s3",
    "enabled": true,
    "awsauthtype": "RoleARN",
    "role": { "arn": "ROLE_ARN" },
    "name": "FLOW_SOURCE_NAME",
    "traffictype": "flow",
    "bucket": "S3_BUCKET_NAME",
    "bucketregion": "S3_BUCKET_REGION",
    "prefix": "FOLDER_NAME_IF_APPLICABLE",
    "region": "VPC_REGION",
    "accountid": "VPC_ACCOUNT_ID",
    "tags": [ "VPC_ID" ]
  }'

Here is the curl command with example values for the fields:

curl --location 'https://api.netography.com/api/v1/vpc' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCIsImtpZCI6IlE1V1Z6Uk1CM1JxR0sxOEJDU2k3Yk1Gb0pNcUlHZTZCQlU0In0.eyJleHAiOjE3NTMwMjUyMDAsImlhdCI6MTc1MjkxODgwMCwianRpIjoiZTRlZTVjMzMtOTUzNS00YTk0LWE1ZDYtMmMwYjczNmI1MzFkIiwiaXNzIjoiaHR0cHM6Ly9hdXRoLmZha2UtcHJvZy5jb20vYXV0aC9yZWFsbXMiLCJhdWQiOiJhcGktY2xpZW50Iiwic3ViIjoiYWRtaW4tNzk4N2U2NzAtNzM1Ni00ODFhLTk5NzUtMTYyOTg5OGVkODJhIiwidHlwIjoiQmVhcmVy' \
  --header 'content-type: application/json' \
  --data '{
    "flowtype": "aws",
    "flowresource": "s3",
    "enabled": true,
    "awsauthtype": "RoleARN",
    "role": { "arn": "arn:aws:iam::1234567890123:role/NetoFlowLogReader" },
    "name": "vpc-01",
    "samplerate": 1,
    "bucket": "neto-myflowlogs-bucket-example",
    "bucketregion": "us-east-1",
    "prefix": "myflowlogs-for-vpc-01",
    "region": "us-east-1",
    "accountid": "1234567890123",
    "tags": [ "vpc-01" ]
  }'

Option 2: Use the neto CLI tool

neto is a Python-based CLI tool that serves as a front-end to the REST API. It can also be useful as a reference if you are developing your own Python code, as it has working code for interacting with these API endpoints encapsulated in a Python class. This tool is in development and is currently available from TestPyPI as a pip package.

neto documentation and links to install the Python package are available at:

https://test.pypi.org/project/neto/

Here are examples of how you can use neto from a shell:

> neto aws traffic create -h
usage: neto aws traffic create [-h] [--prefix PREFIX] [--name-prefix NAME_PREFIX] --vpcid VPCID --region REGION
                               --accountid ACCOUNTID --rolearn ROLEARN --logbucket LOGBUCKET

options:
  -h, --help            show this help message and exit
  --prefix PREFIX       Folder prefix for the S3 logs. This is the path in the bucket where logs will be stored.
  --name-prefix NAME_PREFIX
                        Prefix for the traffic source name
  --vpcid VPCID         VPC ID
  --region REGION       AWS region
  --accountid ACCOUNTID
                        AWS account ID
  --rolearn ROLEARN     AWS role ARN
  --logbucket LOGBUCKET
                        Log bucket

> neto aws traffic create --prefix /vpc-1234/ --name-prefix neto --vpcid vpc-1234 --region us-east-1 --accountid 123456789 --rolearn arn:aws:iam::123456789:role/NetoFlowLogReader --logbucket neto-one-for-all
INFO: neto v1.1.10 - Netography Fusion CLI tool
INFO: Using profile DEFAULT
INFO: Authenticating to Netography Fusion API for qa-customer at https://api.netography.com/api/v1
INFO: Successfully authenticated to Netography Fusion API for qa-customer at https://api.netography.com/api/v1
INFO: ++++ Successfully added flow source neto-123456789-vpc-1234 to Netography Fusion

> neto aws traffic delete -h
usage: neto aws traffic delete [-h] [--name-prefix NAME_PREFIX] --vpcid VPCID --accountid ACCOUNTID --region REGION

options:
  -h, --help            show this help message and exit
  --name-prefix NAME_PREFIX
                        Traffic source name prefix
  --vpcid VPCID         VPC ID
  --accountid ACCOUNTID
                        AWS account ID
  --region REGION       AWS region

> neto aws traffic delete --name-prefix neto --vpcid vpc-1234 --accountid 123456789 --region us-east-1
INFO: neto v1.1.10 - Netography Fusion CLI tool
INFO: Using profile DEFAULT
INFO: Authenticating to Netography Fusion API for qa-customer at https://api.netography.com/api/v1
INFO: Successfully authenticated to Netography Fusion API for qa-customer at https://api.netography.com/api/v1
INFO: Retrieved 2 Netography Fusion traffic sources for account qa-customer
INFO: Deleted Netography Fusion flow source 924227113
INFO: Deleted flow source for VPC vpc-1234

Option 3: Use Python

3a. Minimal Python code to interact with API

This recipe provides basic Python code you can call to authenticate to the API and create a traffic source.

3b. Instantiate a Python class to interact with Fusion API

This recipe provides a subset of the Python class, NetoAPI, that you can use to interact with the API. This is effectively the same thing as 3a, but separates out the API code.
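
The linked recipes contain the authoritative code. As a rough sketch of the same idea (assuming you have already obtained a bearer token as described in the authentication recipe and exported it in a hypothetical NETO_BEARER_TOKEN environment variable), the create call with the requests library looks like this:

# Minimal sketch (not the recipe code): create a Fusion traffic source with the requests library.
# Assumes NETO_BEARER_TOKEN (a placeholder name) holds a token obtained per the authentication recipe.
import os
import requests

token = os.environ["NETO_BEARER_TOKEN"]

payload = {
    "flowtype": "aws",
    "flowresource": "s3",
    "enabled": True,
    "awsauthtype": "RoleARN",
    "role": {"arn": "arn:aws:iam::123456789012:role/NetoFlowLogReader"},
    "name": "vpc-01",
    "traffictype": "flow",
    "bucket": "acme-myflowlogs-bucket",
    "bucketregion": "us-east-1",
    "prefix": "flowlogs-for-vpc-1",
    "region": "us-east-1",
    "accountid": "123456789012",
    "tags": ["vpc-01"],
}

resp = requests.post(
    "https://api.netography.com/api/v1/vpc",
    json=payload,
    headers={"accept": "application/json", "authorization": f"Bearer {token}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())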

Triggering Fusion Traffic Source Creation in AWS

The previous section covered how to create a single Fusion traffic source programmatically. For building your own automation, the next step is to trigger whichever method you choose when a new VPC flow log is created in AWS (and, for the complete lifecycle, when a flow log configuration is modified or deleted).

ℹ️

This example differentiates flow logs by prefix (S3 log destination option 1)

If your preferred design is to use a consistent S3 log destination (S3 log destination option 2 above), you will need to adjust the examples to omit the prefix and create a traffic source per account/region instead of per VPC.

Option 1. Lambda-backed custom resource in CloudFormation Stack

If you are using CloudFormation to create VPCs and/or configure flow logs, you can use a Lambda-backed custom resource to call a Python function as part of the CloudFormation stack. This will then be applied to all newly created VPCs, and will work regardless of how the CloudFormation stack itself is deployed. If you are using AWS Service Catalog to deploy a CloudFormation stack to create new VPCs and configure flow logs already, you can add the final step of creating the Fusion traffic source for the VPC with this approach.

More information on this stack, and detailed instructions for installation can be found here: https://docs.netography.com/docs/netography-aws-cloudformation-automation. Follow the instructions to install the Flow feature. If you have already installed the base StackSet from the previous step, simply modify it when following the associated prerequisite steps.

⚠️

The CloudFormation example is not meant to be used without modification

The example contains two CloudFormation templates that demonstrate how you can use CloudFormation to create a VPC, configure VPC flow logs, and onboard them to Netography Fusion. These templates are intended to provide an example for an engineer familiar with CloudFormation to integrate into their existing automation workflows.

If you are looking for a complete end-to-end solution that does not require CloudFormation expertise, consider using the Netography Cloud Onboarding Automation for AWS Organizations instead.
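
For orientation, here is a rough sketch of the Lambda-backed custom resource pattern itself; this is not the code from Netography's templates, and create_fusion_traffic_source / delete_fusion_traffic_source are hypothetical stand-ins for the Fusion API call shown earlier:

# Sketch of a Lambda-backed custom resource handler (not Netography's template code).
# cfnresponse is available automatically when the handler is defined inline in a CloudFormation template.
import cfnresponse

def create_fusion_traffic_source(props):
    """Hypothetical helper: POST to the Fusion /vpc endpoint as in the earlier API examples."""

def delete_fusion_traffic_source(props):
    """Hypothetical helper: delete the matching Fusion traffic source."""

def handler(event, context):
    try:
        props = event["ResourceProperties"]  # e.g. VpcId, AccountId, Region, Bucket, Prefix
        if event["RequestType"] == "Create":
            create_fusion_traffic_source(props)
        elif event["RequestType"] == "Delete":
            delete_fusion_traffic_source(props)
        # Always signal a result back to CloudFormation so the stack does not hang
        cfnresponse.send(event, context, cfnresponse.SUCCESS, {}, physicalResourceId=props.get("VpcId"))
    except Exception as exc:
        cfnresponse.send(event, context, cfnresponse.FAILED, {"Error": str(exc)})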

Option 2. Custom Lambda linked to EventBridge VPC Creation event

The example in the previous option includes a Lambda function that can be executed to create a new Fusion traffic source for a VPC. Instead of triggering that Lambda via a CloudFormation Lambda-backed custom resource, it can be triggered by an EventBridge VPC Creation event.

The two EventBridge event patterns used to configure these triggers are:

vpc_create_event_pattern = {  
    "source": ["aws.ec2"],  
    "detail-type": ["AWS API Call via CloudTrail"],  
    "detail": {  
        "eventSource": ["ec2.amazonaws.com"],  
        "eventName": ["CreateVpc"],  
    },  
}

delete_vpc_event_pattern = {  
    "source": ["aws.ec2"],  
    "detail-type": ["AWS API Call via CloudTrail"],  
    "detail": {  
        "eventSource": ["ec2.amazonaws.com"],  
        "eventName": ["DeleteVpc"],  
    },  
}
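
As a sketch of how these patterns could be wired up outside of CloudFormation, the boto3 snippet below creates an EventBridge rule from the vpc_create_event_pattern above and targets a Lambda; the rule name and Lambda ARN are placeholders, and you would repeat the same steps for the DeleteVpc pattern. Note that "AWS API Call via CloudTrail" events require CloudTrail to be enabled in the account.

# Illustrative sketch: wire the CreateVpc pattern above to a Lambda via EventBridge.
# The rule name and Lambda ARN are placeholders; repeat for the DeleteVpc pattern.
import json
import boto3

events = boto3.client("events")
lambda_client = boto3.client("lambda")

lambda_arn = "arn:aws:lambda:us-east-1:123456789012:function:neto-onboard-vpc"  # placeholder

rule_arn = events.put_rule(
    Name="neto-vpc-create",
    EventPattern=json.dumps(vpc_create_event_pattern),  # pattern defined above
)["RuleArn"]

events.put_targets(Rule="neto-vpc-create",
                   Targets=[{"Id": "neto-onboard-vpc", "Arn": lambda_arn}])

# Allow EventBridge to invoke the Lambda
lambda_client.add_permission(
    FunctionName=lambda_arn,
    StatementId="neto-vpc-create-invoke",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule_arn,
)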

Netography's Cloud Onboarding Automation for AWS Organizations uses this method, deploying a CloudFormation StackSet via Terraform. It serves as a fully built-out, production-quality example of how to configure AWS to trigger a Lambda for these events, and of Lambda code for performing the full lifecycle, including deletions.

Option 3. Adapting a custom automation from Netography’s Cloud Onboarding Automation for AWS Organizations

Even if Netography's full onboarding automation is more complex than you want to deploy, or you would only configure it to execute a subset of its capabilities, it can still serve as a good working example of how to use CloudFormation, EventBridge, Lambda, and the Fusion API to build out your own automation.

Automating Fusion Context Creation

If you would like to automatically add context integrations for all accounts to see asset labels within Netography, you can follow the instructions for installing the Context feature here: https://docs.netography.com/docs/netography-aws-cloudformation-automation. Be sure to modify the base StackSet as described.

