ICAM - Monitoring Bookinfo
IBM Cloud App Management (ICAM) can provide great visibiliity into Bookinfo application and how it is performing. This document will guide you through the steps to setup monitoring for the application. ICAM has many other capabilities, but we’re going to focus these activities around the Bookinfo application. Additional details can be found in the “ICAM - Post Install Setup” section or in the product documentation. The recommendation is to perform the following key activities in the order listed in this document:
- Setup kubernetes monitoring
- Setup the Unified Agent and the OpenShift Plugin + optionally setup the NGINX plugin
- Setup synthetic monitoring to proactively monitor the Bookinfo application and microservices
- Install the Runtime Data Collectors to instrument the Bookinfo Microservices
- Perform additional tasks such as Policies, Runbooks, Resource Groups, etc. to fine tune your experience
Our recommendation is to start by instrumenting your Kubernetes clusters with the Kubernetes Data Collector. The Kubernetest Data Collector will gather detailed information about the performance of the components of the kubernetes cluster and will generate alerts on abnormal behavior. You’ll see performance and topology information about the cluster, namespaces, nodes, pods, containers, and more. In addition, the Kubernetes Data Collector will generate Events and Incidents whenever kubernetes detects problems.
For MCM environments, use the klusterlet to install the Kubernetes Data Collector. Instructions for monitoring a cluster containing the ICP core services can be found here: https://www.ibm.com/support/knowledgecenter/en/SS8G7U_19.4.0/com.ibm.app.mgmt.doc/content/install_mcm_klusterlet.html. If the monitored cluster does not have the core services available, use these instructions: https://www.ibm.com/support/knowledgecenter/en/SS8G7U_19.4.0/com.ibm.app.mgmt.doc/content/install_mcm_klusterlet_no_helm.html.
After instrumenting the OpenShift cluster with the kubernetes data collector, you will see a large amount of monitor data related to the cluster. But, in an OpenShift environment there are some additional metrics that you will want to monitor. For example, constructs like “projects” are unique to OpenShift and are not monitored by the Kubernetes Data Collector. In the environment, install the Unified Agent and configure the OpenShift plugin. Instructions on setting up the Unified Agent can be found here: https://www.ibm.com/support/knowledgecenter/SS8G7U_19.4.0/com.ibm.app.mgmt.doc/content/unifiedagent_intro.html
Optionally, when you setup the OpenShift plugin, you may also want to configure the NGINX plugin. This plugin allows you to monitor the health NGINX and monitor the performance of the traffic through the NGINX http server.
One of the most effective and easiest ways to monitor an application is by using the Synthetic monitoring capability. With ICAM, you can deploy Synthetic PoP Agents that will automatically play back synthetic Web and REST API tests to validate that the application and microsevices are healthy.
One consideration when deploying the Synthetic PoP Agents is the location of web pages and APIs that are being monitored. In the case of the bookinfo application the application website is available on the external network. To monitor the application web page, you can setup a PoP on an external network segment and setup a PoP within the kubernetes network. This can be valuable because it helps you understand whether you have backend server/application problems or network problems. If the synthetic tests within the kubernetes network are responding normally, but the tests from the external network are slow or failing, then you know that you have a networking problem.
For the synthetic testing of the REST APIs, they are typically only accessible via the kubernetes network. So, you will only be able to run the synthetic tests from a PoP running within the kubernetes environment. If you want high availability of the PoP Agents, you can install multiples, but they all need to run within the kubernetes network.
Once you setup your synthetic PoP Agents are setup, you are ready to setup the synthetic tests.
Website testing is executed via Selenium. For most applications, you typically want to exercise the entire application functionality to make sure the application is working. This type of test can be accomplished by installing the Selenium IDE plugin into your browser and then recording the key clicks as you exercise your application. The bookinfo application is very simple. If you want, you can simply setup a website test of single URL with the following format
http://bookinfo.<Ingress IP address>.nip.io/productpage?u=normal
Or, you can ceate a simple Selenium scripted test where you go to the main web page
http://bookinfo.<Ingress IP address>.nip.io/
and then navigate from the main page to the “Normal user” or “Test user” link.
Replace <Ingress IP Address> with the IP address of your cluster ingress.
Once your synthetic test is configured, test it from 2 or more locations so that you can compare the performance.
In addition to the website synthetic test, you want to proactively test each of the REST APIs. The microservice REST APIs are not expose externally, so you will only be able to test them from a PoP running inside the kubernetes network. Setup synthetic GET requests against the following URLs:
You can determine the IP address and confirm the port for each of the microservices by typing “kubectl get services”. You will see a report similar to the one below:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
details ClusterIP 10.0.48.208 <none> 9080/TCP 45d
mysqldb ClusterIP 10.0.36.138 <none> 3306/TCP 45d
productpage ClusterIP 10.0.255.0 <none> 9080/TCP 45d
ratings ClusterIP 10.0.151.72 <none> 9080/TCP 45d
reviews ClusterIP 10.0.29.64 <none> 9080/TCP 45d
When you define the synthetic tests, you can also validate the json payload. For example, when you call the details microservice, you will see that the “author” is “William Shakespeare”. You can have the synthetic test confirm that “William Shakespeare” is being returned.
After you have setup your test, you can view the results of the sythetic test by going to the “Synthetic results” tab.
The next recommended step is to instrument the Runtimes that make up the Bookinfo application. This application consists of python, node.js, Go, and Liberty runtimes. Each can be instrumented with the Runtime Data Collectors. More information on the Runtime Data Collectors can be found here: https://www.ibm.com/support/knowledgecenter/en/SS8G7U_19.4.0/com.ibm.app.mgmt.doc/content/dc_runtime_intro.html
The Runtimes are normally instrumented in your DevOps pipeline or early in the development phases. The Data Collectors gather detailed information about the Application Server performance and tracks the transactions as they flow between the microservices.
- Instrument the productpage service with the python Data Collector.
- Instrument the details service with the Go Data Collector
- Instrument the reviews service with the Liberty Data Collector
- Instrument the ratings service with the node.js Data Collector
When instrumenting the Runtimes, the Data Collectors have a configuration parameter that allows you to determine the percentage of transactions that are sampled. Depending on the Runtime, the parameter will either be OPENTRACING_SAMPLER or JAEGER_SAMPLER_PARAM. If this value is set to “1”, 100% of the transactions are monitored. If the parameter is set to “0.1”, 10% of the transactions are sampled. By default, only 1% of the transactions are sampled in order to keep the overhead of the Data Collector small. In a production environment with high transaction rates, 1% is appropriate. In a test environment, you might want to enable 100% sampling of the transactions to make it easier to see data in the UI. For applications with low to moderate transaction rates, 5% or 10% sampling might be a good sampling rate.
Once you have instrumented the Runtimes, you will start to see a number of useful pieces of information by selecting “Resources” and then drilling down into any of the “kubernetes service” resources that make up the Bookinfo application. The kubernetes service names are: productpage, details, ratings, and reviews. There is also a mysqldb service, but that is not instrumented with Runtime monitoring.
In the page for each microservice, you will see the Golden Signals, service topology, and service deployment. You can use this information to diagnose performance bottlenecks within the Bookinfo application.
At this point you have a fully functional monitoring environment that is monitoring the Bookinfo application. There are default thresholds that are monitoring your application. There are some optional steps listed below that you may want to do in order to improve your experience with the product.
Users and Groups:
If you want, under the Admnistration tab, you can. click on the “Users and Groups” tab and add users and groups into the system. Note, in order to add users to ICAM, you must first add them into MCM. When adding Users and Groups, you assign roles to those users depending on what permissions you want them to have. The highlight level permissions is an “Operations Lead” which give you access to the Administratiive tools. An “Operator” has the fewest permissions in the tool and can primarly do read-only operations. For more information in configuring Users and Groups: https://www.ibm.com/support/knowledgecenter/en/SS8G7U_19.4.0/com.ibm.app.mgmt.doc/content/admin_manage_users.html
One of the options when setting up a user account is to setup how the user wants to be notified. For example, a user might want to be notified via E-mail during normal working hours from 8:00 to 5:00. Other notification options like SMS and Voicemail are possible, but require additional configuration and rely on 3rd party services.
If you want to organize your monitored resources into Resource Groups so that you can group related resources and perform administrative tasks against those groups, click on the “Resource Groups” tile within the Administrative tab. For the Bookinfo application, it makes sense to organize the various resources that make up the application. Once you have a Resource Group defined, you will be able to view the group and assign thresholds to the group. More information on Resource Groups can be found here: https://www.ibm.com/support/knowledgecenter/en/SSHLNR_8.1.4/com.ibm.pm.doc/install/admin_thresholds_intro.html
In the case of the Bookinfo application, consider adding the “Kubernetes Services” along with the various Runtime Data Collectors into the Resource Group.
The primary Integration that you would want to setup for the Bookinfo application would be outbound integration with an Event Management server or a Service Desk. For example, if you have Netcool Operations Insights in the environment, you will want to forward the Incidents to Netcool. If you have a Service Desk such as Service Now, you will want to forward the Incidents from ICAM or possible configure the integration within Netcool: https://www.ibm.com/support/knowledgecenter/en/SS8G7U_19.4.0/com.ibm.app.mgmt.doc/content/integ_intro.html
The Bookinfo application is not defined to auto-scale. Therefore, you might want to create a runbook that allows you to scale out the deployment. For example, you detect that the Ratings service is having performance problems, scale out the bookinfo-ratings deployment. If you want your Runbook integrated with the Incidents, then you must pass parameters in from the Events. That will be a bit tricky because the most likely source of the Events will be the pod. Let’s say the pod name is bookinfo-ratings-75459645d4-nwhv6, you will need to extract the “bookinfo-ratings” portion of the name in your script and then run a command similar to “kubectl scale deploy bookinfo-ratings —replicas=3”. The parameters can be passed into the Runbook from the Event and the replicas can be defined as a configurable parameter.
Find more information on runbooks here: https://www.ibm.com/support/knowledgecenter/en/SS8G7U_19.4.0/com.ibm.app.mgmt.doc/content/em_rba.html
Policies are way of customizing the way that Events and Incidents are processed. As new Events and Incidents are detected, you may want to customize the event details, send out a notification, suppress specific events and more. All of this can be achieved by selecting the “Policies” tile within the Administration tab. Find information on Event Policies here https://www.ibm.com/support/knowledgecenter/en/SS8G7U_19.4.0/com.ibm.app.mgmt.doc/content/em_setupeventpolicies.html and on Incident Policies here https://www.ibm.com/support/knowledgecenter/en/SS8G7U_19.4.0/com.ibm.app.mgmt.doc/content/em_overview.html.
In the case of the Bookinfo application, you may want to create an Event Policy that cause the Events from the Bookinfo application to be correlated into a single incident. By default, Incidents are correlated based on the hostname and IP address, but these microservices are running on different hosts with different IPs. Create a policy similar to the one shown below. You’ll notice that it will affect any Event where the resource name contains “bookinfo” AND the resource type contains “k8s” which means it is a kubernetes resource. Any Event that matches the criteria will have an “Application” Event slot populated with “Bookinfo”. This will cause the events to be correlated into a single Incident.
It is also possible to assign a Runbook via the Event Policy. In order to assign a Runbook, you must first create the Runbook following the guidance below.