Amihan’s Brown Bag Sessions: Monitoring and Alerts through Project Curiosity
The Brown Bag Sessions were specifically created for Amihan team members to learn, share, and discover new knowledge and techniques in software development, cloud infrastructure, quality engineering and more. These are informal training workshops that happen regularly, and every team member, regardless of department or role, is invited to watch and learn.
For the most recent Brown Bag Session, Associate Cloud DevOps Engineer Nicklaus Roy shared his team’s efforts to create a new monitoring and alerting system, Project Curiosity.
In between everyday tasks there are stretches of downtime: you are either waiting for someone else's feedback, or you have no pending tasks because you finished earlier than expected. So what do you do? Why not learn, read, watch, or work on that idea you've had for so long that could help you in your tasks? My idea was to do research and design work on our monitoring and alerting system. From this, Project Curiosity was born.
It's not a new technology or anything groundbreaking. The systems and tools existed well before I was hired, but they weren't being used in the most efficient way. So I set out to learn the industry's best practices in order to improve the existing systems.
Monitoring is like having another person constantly watching your resources, except that this person has multiple eyes, viewing different environments and looking at different things at once. For someone involved in multiple projects, that extra automated help is much appreciated. But over-monitoring is not the solution. The tool can only retrieve information about what's happening in your machines or components; a person is still required to make sense of what is going on. When a problem occurs, an engineer needs to look at the data and figure out the root cause. If that engineer has to sift through dozens of dashboards to find the relevant information, the monitoring itself adds unnecessary cognitive load. An efficient monitoring system needs to reduce cognitive load.
A good rule of thumb is that every dashboard added should have a distinct and clear objective. These objectives are usually tied to a problem that has already happened. A dashboard to monitor certificate expiration exists because expired certificates have taken sites down before. Dashboards to monitor the health of a node exist because nodes have crashed before. A dashboard must tell a story of what is currently happening with your resources. If it's not being used to its full potential, or if it's rarely used at all, then you can either replace or remove that dashboard.
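To make the certificate example concrete, here is a minimal sketch of the kind of check that could feed such a dashboard. It is not the code behind Project Curiosity; the function names, the 30-day warning window, and the hostnames are all assumptions made for illustration.

```python
from datetime import datetime, timezone
from typing import Dict, List


def days_until_expiry(not_after: datetime, now: datetime) -> int:
    """Whole days left before a certificate expires (negative if already expired)."""
    return (not_after - now).days


def expiring_soon(certs: Dict[str, datetime], now: datetime, warn_days: int = 30) -> List[str]:
    """Return the hosts whose certificates expire in fewer than warn_days days.

    certs maps a hostname to its certificate's expiry time. How those expiry
    times are collected is out of scope here; this only covers the decision.
    """
    return sorted(
        host
        for host, not_after in certs.items()
        if days_until_expiry(not_after, now) < warn_days
    )


if __name__ == "__main__":
    # Hypothetical inventory, for illustration only.
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    inventory = {
        "shop.example": datetime(2024, 1, 10, tzinfo=timezone.utc),
        "blog.example": datetime(2024, 6, 1, tzinfo=timezone.utc),
    }
    for host in expiring_soon(inventory, now):
        print(f"{host}: certificate expires in {days_until_expiry(inventory[host], now)} days")
```

The point of keeping the check this small is that the dashboard answers exactly one question: which certificates need attention soon.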
Besides keeping an engineer from being overwhelmed by too many confusing dashboards, you can also take the burden of detecting problems off them. By setting up a robust alerting system using the information coming from your dashboards, you can automatically notify an engineer about a current problem or a possible future one. If a service inside the environment is returning too many error messages, then you can send an email or Slack notification to the engineer in charge. Just be careful not to overburden them with too many alerts.
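As a rough illustration of both halves of that advice (alert on too many errors, but don't flood the engineer), here is a small sketch of an error-rate alert with a cooldown. The class name, the 5% threshold, and the 10-minute cooldown are assumptions for the example, not values from our setup, and actually delivering the message by email or Slack is left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ErrorRateAlert:
    """Raise an alert when the error rate is too high, at most once per cooldown."""

    threshold: float = 0.05          # assumed: alert above 5% errors
    cooldown_seconds: float = 600.0  # assumed: at most one alert every 10 minutes
    _last_sent: Optional[float] = field(default=None, init=False)

    def check(self, errors: int, total: int, now: float) -> Optional[str]:
        """Return an alert message, or None if nothing should be sent.

        now is a timestamp in seconds (for example, time.time()).
        """
        if total == 0:
            return None  # no traffic, nothing to judge
        rate = errors / total
        if rate <= self.threshold:
            return None  # healthy enough
        if self._last_sent is not None and now - self._last_sent < self.cooldown_seconds:
            return None  # still in cooldown; stay quiet
        self._last_sent = now
        return f"error rate {rate:.1%} exceeds {self.threshold:.1%}"


if __name__ == "__main__":
    alert = ErrorRateAlert()
    # Same unhealthy reading three times: at 0s, 60s, and 700s.
    for timestamp in (0.0, 60.0, 700.0):
        message = alert.check(errors=20, total=100, now=timestamp)
        if message is not None:
            print(f"t={timestamp:.0f}s: {message}")
```

The cooldown is the simplest possible guard against alert fatigue: a problem that persists produces a reminder every so often instead of a message on every check.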
With all this information researched and tested, all that was left was to share it. The brown bag was conducted in the hope of shedding some light on what the future of monitoring and alerting looks like inside the company. And with some more experience and exploration of new tech, this might be the start of something much greater.
As a digital transformation partner, Amihan helps businesses seize opportunities, transform customer experiences, and build capabilities to stay ahead of the competition using Cloud Infrastructure, Data Engineering, Analytics, and Digital Process Outsourcing. Connect with us via our website.
Nicklaus Roy is a member of the Cloud Engineering Team and a former educator. Besides his dedication to continuous learning and experimentation, in keeping with the DevOps way, he also loves sharing his ideas and knowledge with his colleagues.