COURSE

Datadog - Web Application Monitoring

DIFFICULTY
Normal
APPROXIMATE TIME
1h30



Datadog


VII - Web application monitoring

a - Introduction

Application monitoring is one of the most important elements of any system design. It helps us examine and analyze our servers for operations-related tasks such as availability, metrics, containers, running processes, performance, networks, logs, traces, security and so on. Since the health of the application depends to a large extent on the health of the underlying server, it's essential to use good server monitoring tools, which give us the capabilities mentioned.

The Datadog Application Performance Monitoring (APM) agent is a technology that provides end-to-end distributed monitoring, from browser and mobile applications to databases and individual lines of code. By seamlessly correlating distributed traces with frontend and backend data, it monitors service dependencies and health metrics and helps reduce latency and eliminate errors.

It's very common these days for organizations to want to build cloud-native solutions that are easy to deploy on any cloud platform. When evolving towards a microservices architecture, the biggest change is that the monolithic application is split into several independent processes (or services); as a result, a method call between software modules becomes a remote procedure call passing through several services.

Microservices are therefore the key building blocks of this approach.

Microservices are an architectural style that structures an application as a collection of small, autonomous services, modeled around a business domain. For each component, different services are created to separate responsibility.

b - What is distributed tracing?

Distributed tracing, also known as distributed application tracing, is a powerful tool for IT and DevOps teams. It is primarily used to monitor microservices-based applications.

Let's take a closer look at what it involves.

Understanding tracing

This method goes beyond simple monitoring. It helps locate faults and identify the causes of poor performance. It requires developers to add special instrumentation to the application code to provide the data needed for performance analysis and debugging.


Query tracking and metrics

Query tracking is a fundamental practice in software engineering. Using portions of instrumentation code, developers can track and obtain metrics on various essential behaviors within the application code. It's a lot like application performance management (APM), but with a focus on tracking the individual requests that pass through the system, creating a "picture" (mapping) of the software's behavior.

Request tracking faces serious challenges when used in distributed software architectures with multiple functional modules or services. Services evolve independently, resulting in numerous instances of the same function, potentially running on different host systems or even different environments.

Tracking in distributed environments

Distributed tracing makes sense in environments where services operate independently. In these contexts, tracking a single request can prove complex because of the many instances of the same function spread across different systems or environments.

However, distributed tracing overcomes these challenges by tracking every request across every module or service. This gives application architects a detailed view of the performance of each function, facilitating the rapid identification and resolution of problems, such as a high latency introduced by a specific instance of a function.

c - Traces and spans

Distributed tracing relies on two key concepts: traces and spans.

A trace symbolizes the integral path taken by a request through all the services or components of a distributed system. Each trace is identified by a unique trace ID, used by tracing tools to organize, filter and retrieve specific traces.

Each trace is broken down into several spans, also known as segments. These spans represent the different operations or activities carried out within individual system services or components. Each of them has a unique span ID and can carry specific metadata or annotations. Together, the spans provide a complete view of the request path, making it possible to locate the source of a potential problem.

To track a request through each span, developers and operations staff can correlate each span to a specific service instance and even identify the physical host system on which each span is executed.
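
To make these concepts concrete, here is a minimal Python sketch using the ddtrace library (installed later in this lesson); the operation, service and resource names are illustrative, not taken from the sample application:

# Illustrative sketch: nested spans forming a single trace with ddtrace
from ddtrace import tracer

def handle_post_note(desc):
    # root span: the entry point of the request; it carries the trace ID
    with tracer.trace("notes.request", service="notes", resource="POST /notes") as root:
        root.set_tag("note.desc", desc)

        # child span: a database operation executed within the root span's lifetime
        with tracer.trace("postgres.query", service="db") as db_span:
            db_span.set_tag("sql.query", "INSERT INTO notes ...")

        # child span: a call to another service, e.g. the calendar service
        with tracer.trace("calendar.request", service="calendar"):
            pass  # e.g. requests.get("http://calendar:9090/calendar")

Each with block produces a span with its own span ID; all of them share the root span's trace ID, which is what allows Datadog to reconstruct the end-to-end path of the request.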

Trace data is not processed in real time. It is generated and stored locally before being transferred to a central location for analysis, a process similar to modern event logging and other metrics-gathering activities.


It's worth noting that distributed tracing aims to represent the complete journey of a request, thus providing an end-to-end assessment of performance, from the point of entry at the front end to the delivery of the final results. Unlike some approaches, it generally does not apply localized tracking, focusing instead on global, end-to-end tracking.

d - Advantages and limitations of distributed tracing

This type of tracing is often compared with APM (Application Performance Management) technologies. APM tools monitor and manage the performance and availability of software applications, providing data on performance problems and issuing alerts if established minimum service levels are not met.


They offer metrics illustrating the end-user experience, for example average response time during a peak of activity. These tools also highlight the resources used by the application, helping to identify potential bottlenecks. However, they are not directly connected to the application's source code.

Distributed tracing is also compared to event logging. Although both processes require a certain level of instrumentation to monitor and report key software activities, they differ in several respects. Logging records high-level information about events, such as when a predefined threshold is exceeded for a given resource.

It is designed to warn of the occurrence of events that could have a significant impact on the software.

It aims to avoid redundancy and often follows a standardized format to facilitate data aggregation and analysis. In contrast, request tracking can harvest a large amount of low-level information in a variety of formats.

Distributed tracing is particularly suited to debugging and monitoring contemporary distributed software architectures, including microservices. It proves indispensable when requests need to transit across multiple services and systems, or even different sites.


It enables precise and detailed identification of problems, facilitating their resolution. Nevertheless, it requires additional instrumentation integrated into the production code base. In addition, the tools dedicated to searching and visualizing trace data can be complex to configure and use effectively.

e - Setting up distributed tracing with Python

To put distributed tracing into practice with Datadog, we need to retrieve some existing code that will serve as our application. We'll use Git, since the application code is available in Datascientest's Git repository.

git clone https://github.com/datascientest/apm-datadog.git
cd apm-datadog

The repository contains a multiservice Python application preconfigured to run in Docker containers. The sample application is a basic notes application with a REST API for adding and modifying data.

Here we'll use Docker to containerize our application and Docker Compose to deploy it. We will, however, connect our Docker Compose deployment to the agent installed directly on the Datascientest physical machine, so that our application can send its traces to our Datadog instance via this Datadog Linux agent.

Let's have a look at the files retrieved from the apm-datadog repository from the Datascientest Github account.

  • docker-compose.yml: This allows us to deploy our application.
cat docker-compose.yml

Output display:

# Fall Lewis YOMBA - fall-lewis.y@datascientest.com
#
# This product includes software developed at Datascientest (https://www.datascientest.com/)
# Copyright 2023 Fall lewis.
version: "3"
services:
  db: # the database service
    container_name: test_postgres
    image: "postgres"
    environment:
      POSTGRES_PASSWORD: password
      POSTGRES_DB: testdb
      POSTGRES_HOST_AUTH_METHOD: trust
  notes_app: # the note service
    container_name: notes
    restart: always
    build:
      context: . #
      dockerfile: Dockerfile.notes # the dockerfile to build the note image
    ports:
      - "8080:8080"
    depends_on:
      - db
    extra_hosts: # Linux only configuration
      - "host.docker.internal:host-gateway" # Linux only configuration
    environment:
      - DB_HOST=test_postgres # References the Postgres container
      - CALENDAR_HOST=calendar # References the calendar container
      - DD_AGENT_HOST=datadog
  calendar_app: # the calendar service
    container_name: calendar
    restart: always
    build:
      context: .
      dockerfile: Dockerfile.calendar # the dockerfile to build the calendar image
    ports:
      - "9090:9090"
  datadog: # the datadog service
    container_name: dd-agent
    image: "gcr.io/datadoghq/agent:latest"
    environment:
        - DD_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
        - DD_SITE=us5.datadoghq.com # Default. Change to eu.datadoghq.com, us3.datadoghq.com, us5.datadoghq.com as appropriate for your org
        - DD_APM_ENABLED=true # Enable APM
    volumes:
        - /var/run/docker.sock:/var/run/docker.sock:ro
        - /proc/:/host/proc/:ro
        - /sys/fs/cgroup/:/host/sys/fs/cgroup:ro

This file allows us to declare the 4 services that we will run:

  • The first service is db, which deploys a PostgreSQL database to store the data used by the notes and calendar services, two other services deployed by Docker Compose.

  • We then have the datadog service, which is the agent used to send traces to our Datadog instance. We pass it connection parameters through environment variables: the API key with the DD_API_KEY field, the Datadog site with the DD_SITE field, and whether the APM module is enabled on the agent with the DD_APM_ENABLED field, to which we pass the value true.

  • We allow the datadog service to mount read-only volumes (ro for Read Only) from our Datascientest machine: the line - /var/run/docker.sock:/var/run/docker.sock:ro lets it communicate with the Docker socket, the line - /proc/:/host/proc/:ro gives it access to the processes of our Datascientest machine, and the line - /sys/fs/cgroup/:/host/sys/fs/cgroup:ro mounts the cgroups of the Datascientest machine within our datadog service.

In the notes_app service section, we have the DD_AGENT_HOST environment variable, which specifies the host on which the application will find the Datadog agent. In the file above it points to the datadog container; to use the agent installed directly on the Datascientest machine instead, we point it at the host via Docker's internal gateway:

environment:
  - DD_AGENT_HOST=host.docker.internal
You must replace the value xxxxxxxxxxxxxxxxxxxxxxxxxxx with your API key in the docker-compose.yml file

This Linux-specific constraint also requires us to add an extra_hosts entry with the value host.docker.internal:host-gateway to our notes_app service, to enable communication over Docker's internal network.
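
Putting both pieces together, the relevant part of the notes_app service might look as follows when traces are sent to the host-installed agent (a sketch; the docker-compose.yml retrieved above targets the containerized agent with DD_AGENT_HOST=datadog instead):

  notes_app:
    # ... build, ports and depends_on as in the file above
    extra_hosts: # Linux only: make the host reachable from containers
      - "host.docker.internal:host-gateway"
    environment:
      - DD_AGENT_HOST=host.docker.internal # send traces to the agent installed on the host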

Let's make sure our agent is configured to receive trace data from containers. Let's open the configuration file and make sure that the apm_config: field is uncommented (line 1158 of the file) and that the field below it called enabled is set to true.

Let's also make sure that the apm_non_local_traffic field is uncommented and set to true (line 1184 of the file).

sudo nano /etc/datadog-agent/datadog.yaml
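
Once uncommented, the relevant section of /etc/datadog-agent/datadog.yaml should look roughly like this (the exact line numbers vary with the Agent version):

apm_config:
  enabled: true
  apm_non_local_traffic: true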

ddtrace is the Python library we'll need to trace our applications; Datadog provides equivalent tracing libraries for many other languages, such as PHP, Ruby, Go and Java. Our applications are coded in Python using the well-known Flask framework, which means we can install ddtrace with pip3, our application dependency manager.

For this, we have a file called requirements.txt in which we've declared all the application dependencies we'll need in order to trace our applications. We can see the contents of this file as follows:

cat requirements.txt

Output display:

flask==2.2.2
psycopg2-binary==2.9.6
requests==2.28.1
ddtrace
If an agent is already installed on the host, make sure it is at least version 7.28, the minimum Datadog Agent version required for `ddtrace`. We can find the developer documentation for the ddtrace library at https://ddtrace.readthedocs.io/en/stable/installation_quickstart.html.
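
To give an idea of how ddtrace plugs into the code, here is a minimal, illustrative sketch of a Flask service instrumented with it (this is not the actual code of the notes service; the route and span names are hypothetical):

# Illustrative sketch of a Flask service instrumented with ddtrace (not the notes_app code)
from ddtrace import patch_all, tracer

patch_all()  # auto-instruments supported libraries (Flask, psycopg2, requests, ...)

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/notes")
def list_notes():
    # an optional custom span nested inside the automatic Flask trace
    with tracer.trace("notes.list", resource="GET /notes"):
        return jsonify({})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)

Alternatively, an application can be launched unchanged through the ddtrace-run wrapper (for example ddtrace-run python app.py), which enables the same automatic instrumentation without modifying the code.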

We need to restart the Datadog Agent on our machine:

sudo systemctl restart datadog-agent
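
Optionally, we can check that the agent restarted correctly and that the APM module is running:

sudo datadog-agent status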

We can now execute the command to run our Docker-compose stack:

docker-compose up -d

Output display:

Creating dd-agent ... done
Creating calendar ... done
Creating test_postgres ... done
Creating notes ... done

We can also check that our containers are running:

docker ps

Output display:

CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
df29de22722d apm-datadog_notes_app "python -m notes_app..."   58 seconds Up 58 seconds 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp notes
7bcd40630fce gcr.io/datadoghq/agent:latest "/bin/entrypoint.sh" 59 seconds Up 59 seconds (healthy) 8125/udp, 8126/tcp dd-agent
15d9bd218d77 postgres "docker-entrypoint.s..." 59 seconds Up 59 seconds 5432/tcp test_postgres
ac0c459978a3 apm-datadog_calendar_app "python -m calendar_..."   59 seconds Up 58 seconds 0.0.0.0:9090->9090/tcp, :::9090->9090/tcp calendar
b92d0210fa8c datascientest/fastapi:1.0.0 "/bin/sh -c 'uvicorn..."   59 seconds Up 12 days 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp fastapi_sentiment

We can then execute several commands to send a few traces to our Datadog instance. Let's start by running the request that retrieves the list of notes on the /notes path:

curl -X GET 'localhost:8080/notes'

Output display:

{}

Let's then execute a POST request to add a Datascientest note:

curl -X POST 'localhost:8080/notes?desc=Datascientest'

Output display:

(1, Datascientest)

We can then check the list of our notes:

curl -X GET 'localhost:8080/notes'

Output display:

{
  "1": "Datascientest"
}

We can add a new note:

curl -X POST 'localhost:8080/notes?desc=DevUniversity'

Output display:

(2, DevUniversity)

We can once again check the list of our notes:

curl -X GET 'localhost:8080/notes'

Output display:

{
  "1": "Datascientest",
  "2": "DevUniversity"
}

We can add a new note:

curl -X POST 'localhost:8080/notes?desc=CyberUniversity'

Output display:

(3, CyberUniversity)

We can once again check the list of our notes:

curl -X GET 'localhost:8080/notes'

Output display:

{
  "1": "Datascientest",
  "2": "DevUniversity",
  "3": "CyberUniversity"
}

We can add a new note:

curl -X POST 'localhost:8080/notes?desc=Datadog'

Output display:

(4, Datadog)

We can once again check the list of our notes:

curl -X GET 'localhost:8080/notes'

Output display:

{
  "1": "Datascientest",
  "2": "DevUniversity",
  "3": "CyberUniversity",
  "4": "Datadog"
}

We can modify our last entry Datadog with the value Monitoring:

curl -X PUT 'localhost:8080/notes?id=4&desc=Monitoring'

We can once again check the list of our notes:

curl -X GET 'localhost:8080/notes'

Output display:

{
  "1": "Datascientest",
  "2": "DevUniversity",
  "3": "CyberUniversity",
  "4": "Monitoring"
}

We can now delete our last entry:

curl -X DELETE 'localhost:8080/notes?id=4'

We can once again check the list of our notes:

curl -X GET 'localhost:8080/notes'

Output display:

{
  "1": "Datascientest",
  "2": "DevUniversity",
  "3": "CyberUniversity"
}

Unified service tags identify services across different versions and deployment environments, so that they can be correlated in Datadog and used to search and filter. The three environment variables used for unified service tagging are DD_SERVICE, DD_ENV and DD_VERSION. They are therefore added as environment variables in the Dockerfile of the notes service:

ENV DD_SERVICE="notes"
ENV DD_ENV="test"
ENV DD_VERSION="1.10.0"

We also add Docker labels that correspond to the unified service tags. This also allows us to get Docker metrics once our application is running.

LABEL com.datadoghq.tags.service="notes"
LABEL com.datadoghq.tags.env="dev"
LABEL com.datadoghq.tags.version="0.1.0"

We can verify that our agent is running and sending data to Datadog by using the search field.

We'll enter the value Event in the search field.

Let's click on the first menu in the results to access the Datadog event explorer.

As we can see, events from our machine are indeed sent to the event explorer. All the actions we performed are therefore recorded in our Datadog instance.

We can then go to the APM menu and its Traces sub-menu to view the various traces sent to our Datadog instance.

We can see a list of traces corresponding to our API calls:

On the Traces page, let's click on a POST /notes trace to display a graph that shows the duration of each action and which other actions occurred before an action ended. The bar at the top of the graph corresponds to the trace we selected on the previous screen (in this case, the initial entry point into the notes application).

The width of a bar indicates the time it took to complete. A bar at a lower depth represents an action that completes within the lifetime of a bar at a higher depth.

The graph for a trace is shown at the bottom of the chart.

The graph for a trace of type POST is therefore as follows:

Similarly, by clicking on a GET trace, we have the following result:
