Consider: What telemetry can we leverage to achieve the goals? Is there desirable telemetry we do not have? If so, how can we acquire it? Can we further instrument our application to emit the information we need? Can we configure out infrastructure to provide the information we need?
Our workload must emit the data that allows operations to understand workload health and the achievement of business outcomes. We have to collect that information, analyze it, and then act on it to maximize the benefits for our organizations. If we are collecting without analysis or action we are paying to store records that will at best only be used after an event in a forensic capacity. Our goal is to achieve insight.
Categories of insight:
We will iterate: engaging our stakeholders, determining requirements and priorities, implementing improvements to monitoring, evaluating their success, and repeating. Or put more simply: create a plan, implement the plan, check to see if the plan works, adjust the plan and repeat.
We are going to start by focusing on 3 goals
3.1 Install and configure the CloudWatch agent
We will collect metrics and logs from our Amazon EC2 instances using the CloudWatch agent. If we had on-prem servers we could also use the CloudWatch agent to capture metrics and logs from those.
The CloudWatch Agent requires and IAM role that was created with the execution of the CloudFormation script. We will install the agent, and then configure it using a predefined configuration placed in System Manager Parameter Store by our CloudFormation script. With our configuration in parameter store we can easily apply it as a standard across our fleet. You can learn more by following this Getting Started link.
3.2 Review the Parameter Store configuration
The configuration file we have in Parameter Store follows a required naming convention (it must start with AmazonCloudWatch-) that allows us to take advantage of the CloudWatchAgentServerPolicy IAM role. To simplify our management. You can learn more at this link..
3. Install and Configure the CloudWatch agent Systems Manager Run Command
Using System Manager’s Run Command we can install and configure the CloudWatch agent on an entire fleet as easily as we install it on a single instance. By using a command document we ensure consistent execution and limit the errors that could be introduced by a manual process.
3.3 Configure and (re)start the CloudWatch agent using Systems Manager Run Command
Want to learn more about collecting logs and metrics using the unified CloudWatch Agent? Follow this link.
(Optional Exercise) Review the Collected Application Logs [with the development team]
The more integrated the business, development, and operations teams are, the more successful they will be. The custom developed application emits a variety of logs. Engaging the developers to help gain insight to their contents, meaning, and their purpose will shorten the amount of time it takes to get value out of monitoring.
3.4 Generate logs through user activity
3.5 Publish VPC Flow Logs to CloudWatch Logs
When publishing VPC flow logs to CloudWatch Logs, flow log data is published to a log group, and each network interface has a unique log stream in the log group. Log streams contain flow log records. You can create multiple flow logs that publish data to the same log group. If the same network interface is present in one or more flow logs in the same log group, it has one combined log stream. If you’ve specified that one flow log should capture rejected traffic, and the other flow log should capture accepted traffic, then the combined log stream captures all traffic. For more information, see Flow Log Records.
Consider how you will use the collected VPC Flow Log data. On internal networks visibility of intended and unintended traffic can facilitate diagnosing issues. Capturing rejected traffic on your Internet Gateway by default may result in storing and processing a lot of data with limited value to you. Remember you can always enable the capability when needed and then disable it when no longer needed to help optimize the value your get from CloudWatch.
(Optional) Create a flow log for your VPC
Note: The following steps have been completed for you by the CloudFormation script.
What have we accomplished?
All of our instances are now logging to CloudWatch Logs and, our VPC is logging to CloudWatch Logs as well. Our instances and AWS services are publishing metrics to CloudWatch. Our application and workload are providing traces though integration with AWS X-Ray. We are collecting the available telemetry, and now it is time to analyze it and take advantage of it.