Create Alarms for Known Issues from Metrics
Monitoring for Application Outcomes
Ryn Brandish wants to understand when there are application issues and is concerned about security. He would like metrics for errors and warnings. ExampleCorp has limitations on the image size that it can successfully process. Although this issue does not happen frequently, it makes customers unhappy when it does. Being aware of the error rate for submitted images will allow the Business and Development teams to determine whether increasing the image size limit should be a priority. Currently, most warnings are related to security issues, and Ryn would like visibility into how often they happen.
5.1 Create a log filter metric based on the defined threshold for the error
- Click on Log Groups in the breadcrumb trail at the top of the screen
- Select the application.log log group
- Click on the Create Metric Filter button
- Enter "ActiveStorage::InvariableError" (include the straight double quotes) for Filter Pattern. This is a known error that we will trigger later in the lab.
- Click on the Assign Metric button
- Enter ImageError as the Metric Name
- Click on the Create Filter button
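The console steps in 5.1 can also be expressed programmatically. The following is a minimal sketch, assuming the AWS SDK for Python (boto3) is available and that the log group is named application.log as shown in the console. The filter name ImageErrorFilter and the metric namespace ExampleCorp are hypothetical; the lab does not specify them.

```python
# Sketch only: builds the request that mirrors the console steps in 5.1.
# "ImageErrorFilter" (filter name) and "ExampleCorp" (namespace) are
# assumed names; the lab does not specify them.

def build_metric_filter_request(log_group_name="application.log",
                                namespace="ExampleCorp"):
    """Return keyword arguments for the CloudWatch Logs put_metric_filter call."""
    return {
        "logGroupName": log_group_name,
        "filterName": "ImageErrorFilter",
        # The double quotes are part of the pattern: they make the text,
        # including the "::", match as one literal term.
        "filterPattern": '"ActiveStorage::InvariableError"',
        "metricTransformations": [
            {
                "metricName": "ImageError",
                "metricNamespace": namespace,
                "metricValue": "1",  # each matching log event counts as 1
            }
        ],
    }


def create_metric_filter(logs_client, request):
    """Send the request using a boto3 CloudWatch Logs client."""
    return logs_client.put_metric_filter(**request)
```

With credentials configured, this could be invoked as create_metric_filter(boto3.client("logs"), build_metric_filter_request()). Keeping the request builder separate from the call makes it possible to inspect the parameters without touching an AWS account.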
5.2 Create an alarm for the error when the metric threshold is crossed
- Click on Create Alarm
- Click Edit by Metric
- Change Period to 10 seconds
- Click on Select metric button
- Enter ImageErrorAlarm for Name
- For the threshold condition (is >=), change the 0 to 1
- Select good (not breaching threshold) for Treat missing data as
- Under Actions, select the ExampleCorpErrorNotify* list from the Send notifications to drop-down
- Enter [your email address] for Email list
- Click on the Create Alarm button
- Click on the I will do it later button when asked to verify the email address
NOTE: You should see that the alarm has insufficient data with a 1 next to INSUFFICIENT in the navigation menu. This will change to OK within a few seconds.
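The alarm settings chosen in 5.2 map onto the parameters of the CloudWatch PutMetricAlarm API. The sketch below assumes boto3; the namespace ExampleCorp, the Sum statistic, the single evaluation period, and the topic ARN in the usage note are assumptions for illustration, not values given by the lab.

```python
# Sketch only: mirrors the alarm configured in the console in 5.2.
# The namespace, statistic, and evaluation period count are assumptions.

def build_alarm_request(topic_arn, namespace="ExampleCorp"):
    """Return keyword arguments for the CloudWatch put_metric_alarm call."""
    return {
        "AlarmName": "ImageErrorAlarm",
        "Namespace": namespace,
        "MetricName": "ImageError",
        "Statistic": "Sum",            # assumed: total matches per period
        "Period": 10,                  # seconds, as set in the console
        "EvaluationPeriods": 1,        # assumed: alarm on a single period
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",  # "is >="
        "Threshold": 1.0,
        # "good (not breaching threshold)" in the console
        "TreatMissingData": "notBreaching",
        # SNS topic that receives the notification
        "AlarmActions": [topic_arn],
    }


def create_alarm(cloudwatch_client, request):
    """Send the request using a boto3 CloudWatch client."""
    return cloudwatch_client.put_metric_alarm(**request)
```

A call might look like create_alarm(boto3.client("cloudwatch"), build_alarm_request("arn:aws:sns:us-east-1:123456789012:ExampleCorpErrorNotify")), where the ARN is a placeholder for the notification topic in your own account.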
5.3 Generate error logs through user activity
- Navigate to the ExampleCorp application using the URL that you made a note of earlier (CloudFormation Output).
- Enter firstname.lastname@example.org for email
- Enter Password123 for Password
- Click Login
- Click Upload Image
- Click Select Image
- Find break_app.jpg in your filesystem
- Click Upload
NOTE: The application will start exhibiting issues, and the application will eventually fail to load.
5.4 Review logs to identify error
- Navigate to the CloudWatch Console
- Click Logs in the navigation menu
- Click on the application.log log group. This group collects /opt/ExampleCorp/log/application.log, as configured earlier in the CloudWatch agent configuration.
- Click on the Search Log Group button
- Enter ERROR (case sensitive) and press Enter.
- You will see errors ending in ActiveStorage::InvariableError. This error is generated when a user uploads a non-image file. This is a known issue, but the Dev team has not had cycles to address it.
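The same search can be run outside the console. The following is a minimal sketch, assuming boto3 and the application.log log group name shown above; it uses the CloudWatch Logs FilterLogEvents API and follows nextToken to collect every page of results.

```python
# Sketch only: searches a log group for the term ERROR, as in 5.4.

def build_search_request(log_group_name="application.log", term="ERROR"):
    """Return keyword arguments for the CloudWatch Logs filter_log_events call."""
    # Filter pattern terms are case sensitive, so "ERROR" does not match "error".
    return {"logGroupName": log_group_name, "filterPattern": term}


def find_error_messages(logs_client, request):
    """Collect matching log messages, following nextToken across pages."""
    messages = []
    kwargs = dict(request)  # copy so the caller's request is not modified
    while True:
        page = logs_client.filter_log_events(**kwargs)
        messages.extend(event["message"] for event in page.get("events", []))
        token = page.get("nextToken")
        if not token:
            return messages
        kwargs["nextToken"] = token
```

With credentials configured, find_error_messages(boto3.client("logs"), build_search_request()) would return the matching messages as a list of strings, which can then be checked for ActiveStorage::InvariableError.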
Build a Monitoring Plan: Alerting and Response
Monitoring for Operational Outcomes
Sansa Bailish is focused on outages, reliability, and getting better sleep. There is a known issue with image trends; when the instances are rebooted, the application doesn’t restart. Too frequently, Sansa has received a late-night call to get online and restart the application. The user experience lead is very frustrated with the downtime associated with these incidents. They need to be detected sooner and resolved faster.
Sansa is looking for a monitoring solution to detect the incident (the reboot event), and a way to trigger an automated recovery.