Datadog

Cloud monitoring and analytics platform for infrastructure, applications, and logs. Provides unified monitoring across servers, databases, tools, and services.

Datadog Agent

Agent Installation

| Platform | Installation Command |
| --- | --- |
| Debian/Ubuntu | `DD_API_KEY=<your-api-key> bash -c "$(curl -L https://s3.amazonaws.com/dd-agent/scripts/install_script_agent7.sh)"` |
| CentOS/RHEL | `DD_API_KEY=<your-api-key> bash -c "$(curl -L https://s3.amazonaws.com/dd-agent/scripts/install_script_agent7.sh)"` |
| macOS | `brew install datadog-agent` |
| Docker | `docker run -d --name datadog-agent -v /var/run/docker.sock:/var/run/docker.sock:ro -v /proc/:/host/proc/:ro -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro -e DD_API_KEY=<your-api-key> gcr.io/datadoghq/agent:latest` |
| Kubernetes | `kubectl apply -f https://raw.githubusercontent.com/DataDog/datadog-agent/master/Dockerfiles/manifests/dancer-agent.yaml` |

Agent Commands

| Command | Description |
| --- | --- |
| `sudo systemctl start datadog-agent` | Start Datadog agent |
| `sudo systemctl stop datadog-agent` | Stop Datadog agent |
| `sudo systemctl restart datadog-agent` | Restart Datadog agent |
| `sudo systemctl status datadog-agent` | Check agent status |
| `sudo datadog-agent status` | View detailed agent status |
| `sudo datadog-agent configcheck` | Validate agent configuration |
| `sudo datadog-agent diagnose` | Run agent diagnostics |
| `sudo datadog-agent health` | Check agent health |
| `sudo datadog-agent flare <case-id>` | Collect diagnostic info |

Agent Configuration

| File | Location | Description |
| --- | --- | --- |
| datadog.yaml | /etc/datadog-agent/datadog.yaml | Main configuration file |
| conf.d/ | /etc/datadog-agent/conf.d/ | Integration configurations |

Basic Configuration

yaml
# datadog.yaml
api_key: <your-api-key>
site: datadoghq.com  # or datadoghq.eu for EU

# Enable process monitoring
process_config:
  enabled: true

# Enable logs collection
logs_enabled: true

# Enable trace collection
apm_config:
  enabled: true

# Enable dogstatsd
use_dogstatsd: true
dogstatsd_port: 8125

Integrations Configuration

yaml
# conf.d/nginx.yaml
init_config:

instances:
  - nginx_status_url: http://localhost/nginx_status/
    tags:
      - env:production
      - service:web

yaml
# conf.d/postgres.yaml
init_config:

instances:
  - host: localhost
    port: 5432
    username: datadog
    password: <password>
    dbname: postgres
    tags:
      - env:production

Datadog CLI (dog)

Installation

bash
# Using pip
pip install datadog

# Using Homebrew
brew install datadog-cli

Monitor Commands

| Command | Description |
| --- | --- |
| `dog metric post <metric_name> <value>` | Submit custom metric |
| `dog metric query "avg:system.cpu.user"` | Query metric |
| `dog event post <title>` | Post event |
| `dog event post "Deployment" "Deployed version 1.0.0"` | Post event with details |
| `dog tag add <host> env:production service:web` | Assign tags to host |
| `dog monitor show_all` | List all monitors |
| `dog monitor show <monitor_id>` | Get monitor details |
| `dog monitor post --name <name> <type> <query>` | Create monitor |
| `dog monitor delete <monitor_id>` | Delete monitor |
| `dog downtime show_all` | List scheduled downtimes |
| `dog downtime post <scope> <start>` | Create downtime |

Monitor Examples

bash
# Create a metric alert monitor (positional arguments: type, then query)
dog monitor post \
  --name "High CPU Usage" \
  --message "CPU usage is above 80%" \
  --tags "env:production,team:ops" \
  "metric alert" \
  "avg(last_5m):avg:system.cpu.user{host:web-01} > 80"

# Create a log alert monitor
dog monitor post \
  --name "Error Logs" \
  --message "Too many error logs" \
  "log alert" \
  "logs(\"service:web status:error\").index(\"*\").rollup(\"count\").last(\"5m\") > 10"

# Create a synthetics alert monitor
dog monitor post \
  --name "Website Availability" \
  --message "Website is down" \
  "synthetics alert" \
  "average(last_1h):avg:synthetics.check{service:web-01} < 1"

Datadog Metrics

Metric Types

| Type | Description | Example |
| --- | --- | --- |
| Gauge | Value that can go up or down | system.cpu.user |
| Rate | Rate per second | nginx.requests.rate |
| Count | Counter | web.requests.total |
| Histogram | Distribution of values | sql.query.time |

Custom Metrics

bash
# Submit custom metric via the HTTP API
curl -X POST -H "Content-Type: application/json" \
  -d '{"series": [{"metric":"custom.metric", "points":[[1634567890, 42]], "tags":["env:production"]}]}' \
  'https://api.datadoghq.com/api/v1/series?api_key=<your-api-key>'

# Submit via dogstatsd (UDP to the local agent)
echo "custom.metric:42|g|#env:production" | nc -u -w 1 127.0.0.1 8125

python
# Submit in code (Python, using the DogStatsD client from the datadog package)
from datadog import initialize, statsd

# Point the shared statsd client at the local agent
initialize(statsd_host='localhost', statsd_port=8125)
statsd.gauge('custom.metric', 42, tags=['env:production'])
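
The DogStatsD line sent with `nc` above follows a simple text layout: `<name>:<value>|<type>`, optionally followed by `|@<sample_rate>` and `|#<tag>,<tag>`. The following is a minimal sketch of building and sending such a datagram with only the Python standard library. The function names `format_dogstatsd` and `send_metric` are hypothetical helpers for illustration, not part of any Datadog package.

```python
import socket


def format_dogstatsd(name, value, metric_type="g", tags=None, sample_rate=None):
    """Build a DogStatsD datagram: <name>:<value>|<type>[|@<rate>][|#<tags>]."""
    parts = [f"{name}:{value}", metric_type]
    if sample_rate is not None:
        parts.append(f"@{sample_rate}")
    if tags:
        parts.append("#" + ",".join(tags))
    return "|".join(parts)


def send_metric(datagram, host="127.0.0.1", port=8125):
    """Send one datagram over UDP. UDP is fire-and-forget, so no error
    is reported here if no agent is listening on the port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


# Same payload as the nc example: a gauge with one tag
print(format_dogstatsd("custom.metric", 42, "g", ["env:production"]))
# prints: custom.metric:42|g|#env:production
```

Calling `send_metric(format_dogstatsd(...))` would deliver the metric to an agent listening on port 8125, assuming DogStatsD is enabled as in the basic configuration.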

Datadog Monitors

Monitor Types

| Type | Description | Example |
| --- | --- | --- |
| Metric Alert | Alert on metric threshold | `avg:system.cpu.user > 80` |
| Log Alert | Alert on log patterns | `logs("error").count() > 10` |
| Process Alert | Alert on process status | `process.name{process_name:nginx}.exists() < 1` |
| Network Alert | Alert on network issues | `network.http.response_time > 1000` |
| Synthetic Alert | Alert on synthetic test | `synthetics.check{test:website} < 1` |
| CI Monitor | Alert on CI failures | `ci.pipeline.result{pipeline:deploy} = 0` |

Monitor Query Syntax

yaml
# Metric alert
avg(last_5m):avg:system.cpu.user{host:web-01} > 80

# Log alert
logs("service:web status:error").count() > 10

# Process alert
process.name{process_name:nginx}.exists() < 1

# Network alert
network.http.response_time{url:https://example.com} > 1000

# Synthetics alert
synthetics.check{test:website} < 1

Monitor Configuration

json
{
  "name": "High CPU Usage",
  "type": "metric alert",
  "query": "avg(last_5m):avg:system.cpu.user{host:web-01} > 80",
  "message": "CPU usage is above 80% on {{host.name}}",
  "tags": ["env:production", "team:ops"],
  "options": {
    "thresholds": {
      "critical": 80,
      "warning": 70
    },
    "notify_no_data": false,
    "notify_audit": false,
    "timeout_h": 60
  }
}
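
In a definition like the one above, the critical threshold in `options.thresholds` is expected to match the number at the end of the query, and the warning threshold should trigger before the critical one. The sketch below builds the same structure in Python and checks those two points before the payload is sent anywhere. `build_metric_monitor` is a hypothetical helper, and the validation is a simplified assumption about what the API enforces.

```python
import re


def build_metric_monitor(name, query, message, critical, warning=None, tags=None):
    """Return a monitor definition dict shaped like the JSON example."""
    # Extract the trailing comparison, e.g. "> 80", from the query
    match = re.search(r"(>=|<=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$", query)
    if not match:
        raise ValueError("query must end with a comparison such as '> 80'")
    operator, query_threshold = match.group(1), float(match.group(2))

    if query_threshold != float(critical):
        raise ValueError("critical threshold must match the value in the query")

    if warning is not None:
        alerts_above = operator in (">", ">=")
        if (alerts_above and warning >= critical) or (
            not alerts_above and warning <= critical
        ):
            raise ValueError("warning must trigger before critical")

    thresholds = {"critical": critical}
    if warning is not None:
        thresholds["warning"] = warning

    return {
        "name": name,
        "type": "metric alert",
        "query": query,
        "message": message,
        "tags": list(tags or []),
        "options": {"thresholds": thresholds},
    }


monitor = build_metric_monitor(
    "High CPU Usage",
    "avg(last_5m):avg:system.cpu.user{host:web-01} > 80",
    "CPU usage is above 80% on {{host.name}}",
    critical=80,
    warning=70,
    tags=["env:production", "team:ops"],
)
```

Passing `critical=90` with the same query raises `ValueError`, which catches a mismatch locally instead of at submission time.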

Datadog Logs

Log Collection

yaml
# Enable logs in datadog.yaml
logs_enabled: true

# Collect logs from file (this section goes in conf.d/<name>.d/conf.yaml, not datadog.yaml)
logs:
  - type: file
    path: "/var/log/nginx/access.log"
    service: nginx
    source: nginx

  - type: file
    path: "/var/log/app/application.log"
    service: app
    source: app
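
On the application side, a file-tailing configuration like this works best when each log line is self-describing. Below is a minimal sketch that writes one JSON object per line with Python's standard `logging` module, on the assumption that the Agent tails the file and that JSON-formatted lines are parsed into attributes. The class `JsonLineFormatter`, the helper `make_logger`, and the field names chosen here are illustrative assumptions, not a Datadog API.

```python
import json
import logging


class JsonLineFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "status": record.levelname.lower(),  # e.g. "error", "info"
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def make_logger(path, name="app"):
    """Create a logger that appends JSON lines to the given file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path)
    handler.setFormatter(JsonLineFormatter())
    logger.addHandler(handler)
    return logger
```

For example, `make_logger("/var/log/app/application.log").error("upstream timeout")` would append one JSON line to the file referenced in the second `logs` entry, provided the process can write to that path.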

Log Queries

| Query | Description |
| --- | --- |
| `service:nginx` | Logs from nginx service |
| `status:error` | Error logs |
| `service:nginx status:error` | Nginx error logs |
| `service:nginx (status:error OR status:warn)` | Nginx error or warning logs |
| `env:production` | Production environment logs |
| `service:nginx env:production` | Production nginx logs |
| `"timeout"` | Logs containing "timeout" |
| `service:nginx \| count()` | Count of nginx logs |

Log Pipelines

yaml
# Add custom fields
- grok:
    pattern: "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}"
    source: "application"
- add_tags:
    tags:
      - "env:production"
- date:
    match_rules:
      - "timestamp:dd/MMM/yyyy:HH:mm:ss Z"
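
To see what a grok rule of the shape `%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}` extracts, it can help to try a rough equivalent locally. The sketch below uses a Python regular expression as an approximation; it is not the grok engine, the timestamp and level patterns are simplified assumptions, and `parse_line` is a hypothetical helper.

```python
import re

# Approximation of: %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}
LINE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}"
    r"(?:[.,]\d+)?(?:Z|[+-]\d{2}:?\d{2})?)"
    r"\s+(?P<level>[A-Za-z]+)"
    r"\s+(?P<message>.*)"
)


def parse_line(line):
    """Return the extracted fields as a dict, or None if the line does not match."""
    match = LINE.match(line)
    return match.groupdict() if match else None


fields = parse_line("2024-01-15T10:30:00Z ERROR upstream timed out")
print(fields)
```

Lines that do not start with an ISO 8601 style timestamp return `None`, which mirrors how a non-matching rule leaves a log line unparsed.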

Datadog APM (Application Performance Monitoring)

APM Configuration

yaml
# datadog.yaml
apm_config:
  enabled: true

# Environment variables (shell syntax, set outside datadog.yaml)
DD_APM_ENABLED=true
DD_SERVICE=my-app
DD_ENV=production
DD_TRACE_ANALYTICS_ENABLED=true

APM Integration Examples

python
# Python
from ddtrace import tracer

@tracer.wrap(service='my-app', resource='endpoint')
def handle_request():
    pass

# Automatic instrumentation of supported libraries
# (call patch_all() before importing the libraries it should patch)
from ddtrace import patch_all
patch_all()
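
javascript
// Node.js: initialize the tracer before requiring any instrumented module
const tracer = require('dd-trace').init({
  service: 'my-app',
  env: 'production',
  logInjection: true
});

// Express is instrumented automatically when dd-trace is initialized first
const express = require('express');
const app = express();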
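
java
// Java (OpenTracing API; assumes the Datadog Java agent is attached and has registered the global tracer)
import datadog.trace.api.DDTags;
import io.opentracing.Span;
import io.opentracing.Tracer;
import io.opentracing.util.GlobalTracer;

Tracer tracer = GlobalTracer.get();
Span span = tracer.buildSpan("operation")
  .withTag(DDTags.SERVICE_NAME, "my-app")
  .start();
span.finish();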

Datadog Dashboards

Create Dashboard via CLI

bash
# Create dashboard (positional arguments: title, widgets JSON, layout type)
dog dashboard post \
  --description "Main system metrics" \
  "System Overview" \
  '[{
    "definition": {
      "type": "timeseries",
      "requests": [{
        "q": "avg:system.cpu.user{host:web-01}"
      }]
    },
    "layout": {"x": 0, "y": 0, "width": 6, "height": 4}
  }]' \
  free

Dashboard Widgets

| Type | Description |
| --- | --- |
| timeseries | Time series graph |
| query_value | Single value |
| toplist | Top values |
| geomap | Geographic map |
| hostmap | Host topology |
| log_stream | Live log stream |
| note | Text note |

Dashboard Example

json
{
  "title": "Web Application Dashboard",
  "description": "Application and infrastructure metrics",
  "layout_type": "free",
  "widgets": [
    {
      "definition": {
        "type": "timeseries",
        "requests": [
          {"q": "avg:system.cpu.user{service:web}", "display_type": "line"}
        ],
        "title": "CPU Usage"
      },
      "layout": {"x": 0, "y": 0, "width": 6, "height": 4}
    },
    {
      "definition": {
        "type": "timeseries",
        "requests": [
          {"q": "avg:nginx.net.request_per_s{service:web}", "display_type": "area"}
        ],
        "title": "Request Rate"
      },
      "layout": {"x": 6, "y": 0, "width": 6, "height": 4}
    }
  ]
}

Datadog Tags

Tag Best Practices

| Best Practice | Example |
| --- | --- |
| Use consistent naming | env:production, not env:prod |
| Use lowercase | service:web, not Service:web |
| Use hyphens for multi-word values | team:ops-team, not team:ops_team |
| Use consistent value types | env:production, not env:prod |
| Limit tags per metric | ~50 tags maximum |
| Use tag prefixes | team:, env:, service: |

Common Tags

| Tag | Values | Purpose |
| --- | --- | --- |
| env | production, staging, dev | Environment separation |
| service | web, api, database | Service identification |
| team | ops, dev, security | Team responsibility |
| version | 1.0.0, 2.1.3 | Version tracking |
| region | us-east-1, eu-west-1 | Geographic location |
| tier | frontend, backend, data | Application tier |
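
Naming conventions like these are easier to keep when tags are normalized in one place before they are attached to metrics. The sketch below assumes the commonly documented rules: tags are lowercased, must start with a letter, may contain alphanumerics, underscores, minuses, colons, periods, and slashes, have other characters converted to underscores, and are limited to 200 characters. `normalize_tag` is a hypothetical helper, and it handles ASCII only for simplicity.

```python
import re

MAX_TAG_LENGTH = 200  # assumed limit on the full "key:value" string


def normalize_tag(key, value):
    """Return a lowercased 'key:value' tag with unsupported characters replaced."""
    tag = f"{key}:{value}".strip().lower()
    # Replace anything outside the allowed set with an underscore
    tag = re.sub(r"[^a-z0-9_\-:./]", "_", tag)
    if not tag[:1].isalpha():
        raise ValueError(f"tag must start with a letter: {tag!r}")
    return tag[:MAX_TAG_LENGTH]


print(normalize_tag("Service", "Web Store"))  # prints: service:web_store
print(normalize_tag("env", "Production"))     # prints: env:production
```

Note that a space becomes an underscore here, so teams that prefer hyphens for multi-word values would replace spaces with hyphens before calling the helper.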

Datadog Alerts and Notifications

Alert Thresholds

yaml
options:
  thresholds:
    critical: 90
    warning: 70
    critical_recovery: 80
    warning_recovery: 65
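
Recovery thresholds add hysteresis: once a monitor is alerting, the value has to drop below the recovery level, not just the trigger level, before the state clears. The sketch below is a simplified model of that behaviour for an "above" monitor using the four values shown. It is an illustration of the idea under that assumption, not Datadog's evaluation logic, and `next_state` and `replay` are hypothetical names.

```python
def next_state(current, value, critical=90, warning=70,
               critical_recovery=80, warning_recovery=65):
    """Return 'OK', 'WARN', or 'ALERT' given the previous state and a new value."""
    if value > critical:
        return "ALERT"
    # Stay in ALERT until the value falls to the critical recovery level
    if current == "ALERT" and value > critical_recovery:
        return "ALERT"
    if value > warning:
        return "WARN"
    # Stay in WARN until the value falls to the warning recovery level
    if current in ("WARN", "ALERT") and value > warning_recovery:
        return "WARN"
    return "OK"


def replay(values, start="OK"):
    """Feed a series of values through next_state and collect the states."""
    states, current = [], start
    for value in values:
        current = next_state(current, value)
        states.append(current)
    return states


print(replay([60, 75, 95, 85, 78, 68, 64]))
# prints: ['OK', 'WARN', 'ALERT', 'ALERT', 'WARN', 'WARN', 'OK']
```

The value 85 keeps the monitor in ALERT even though it is below 90, and 68 keeps it in WARN even though it is below 70, which is what prevents flapping around the trigger levels.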

Notification Channels

| Channel | Configuration |
| --- | --- |
| Email | Specify email addresses |
| Slack | Configure Slack webhook |
| PagerDuty | Configure PagerDuty service key |
| Webhook | POST to custom endpoint |
| SMS | Phone number (paid feature) |

Alert Routing

yaml
# Route to specific team
tags:
  - team:ops
  - severity:critical

# Multiple notifications
notifications:
  - "@webhook-https://hooks.slack.com/services/..."
  - "@pagerduty-team-ops"

# Warning vs Critical
options:
  thresholds:
    critical: 90
    warning: 70
notification_channels:
  - slack
  - email

Datadog Synthetics

Synthetic Test Types

| Type | Description |
| --- | --- |
| API Test | HTTP endpoint monitoring |
| Browser Test | Multi-step browser tests |
| TCP Test | Port and protocol testing |
| DNS Test | DNS resolution monitoring |

Create API Test

bash
# Create API synthetic test
dog synthetics create \
  --type api \
  --subtype http \
  --name "API Health Check" \
  --url "https://api.example.com/health" \
  --request_method GET \
  --assertions '[{"operator":"is","type":"statusCode","target":200}]'

Create Browser Test

bash
# Create browser synthetic test
dog synthetics create \
  --type browser \
  --name "Login Test" \
  --config '{"locations":["us-east-1"]}'

Datadog RUM (Real User Monitoring)

RUM Configuration

html
<!-- JavaScript RUM integration -->
<script>
  (function(h,o,u,n,d) {
    h=h[d]=h[d]||{q:[],onReady:function(c){h.q.push(c)}}
    d=o.createElement(u);d.async=1;d.src=n
    n=o.getElementsByTagName(u)[0];n.parentNode.insertBefore(d,n)
  })(window,document,"script","https://www.datadoghq-browser-agent.com/datadog-rum-v4.js","DD_RUM")
  DD_RUM.onReady(function() {
    DD_RUM.init({
      clientToken: '<client-token>',
      applicationId: '<app-id>',
      site: 'datadoghq.com',
      sampleRate: 100,
      trackInteractions: true
    })
  })
</script>

Datadog Integrations

| Integration | Description |
| --- | --- |
| AWS | CloudWatch, EC2, S3, RDS |
| Kubernetes | Pod, node, and service metrics |
| Docker | Container metrics and logs |
| Nginx | Web server metrics |
| PostgreSQL | Database metrics |
| Redis | Cache metrics |
| Elasticsearch | Search engine metrics |
| Jenkins | CI/CD metrics |

Integration Setup

bash
# Install integration
# 1. Enable integration in datadog.yaml
# 2. Create config file in conf.d/
# 3. Restart agent

# Example: PostgreSQL integration
# conf.d/postgres.yaml
init_config:

instances:
  - host: localhost
    port: 5432
    username: datadog
    password: <password>
    dbname: postgres
    tags:
      - env:production

Datadog Best Practices

  1. Use Tags Effectively - Organize metrics with tags
  2. Set Appropriate Thresholds - Avoid alert fatigue
  3. Use Dashboards - Visualize metrics proactively
  4. Monitor Key Metrics - Focus on business-critical metrics
  5. Set Up Alerts - Get notified early on issues
  6. Use Synthetics - Monitor from user perspective
  7. Enable APM - Understand application performance
  8. Collect Logs - Correlate logs with metrics
  9. Use RUM - Monitor real user experience
  10. Review Alerts Regularly - Optimize alert rules

Datadog Pricing

| Product | Pricing Model |
| --- | --- |
| Host Metrics | Per host |
| Custom Metrics | Per custom metric |
| Logs | Per GB ingested and retained |
| APM | Per traced request |
| Synthetics | Per test run |
| RUM | Per session |

Useful Tips

Agent Troubleshooting

bash
# Check agent status
sudo datadog-agent status

# Validate configuration
sudo datadog-agent configcheck

# Run diagnostics
sudo datadog-agent diagnose

# View logs
sudo tail -f /var/log/datadog/agent.log

API Usage

bash
# Submit metric
curl -X POST "https://api.datadoghq.com/api/v1/series?api_key=<your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "series": [{
      "metric": "system.cpu.user",
      "points": [[1634567890, 42.0]],
      "tags": ["host:web-01", "env:production"]
    }]
  }'

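# Query metrics (read endpoints also require an application key)
curl -G "https://api.datadoghq.com/api/v1/query" \
  --data-urlencode "query=avg:system.cpu.user{*}" \
  --data-urlencode "from=1634567890" \
  --data-urlencode "to=1634654290" \
  -H "DD-API-KEY: <your-api-key>" \
  -H "DD-APPLICATION-KEY: <your-application-key>"

# List monitors
curl -X GET "https://api.datadoghq.com/api/v1/monitor" \
  -H "DD-API-KEY: <your-api-key>" \
  -H "DD-APPLICATION-KEY: <your-application-key>"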
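
The metric submission call can also be prepared from Python without extra dependencies. The sketch below builds the request for the v1 series endpoint with `urllib.request`, passing the API key in the `DD-API-KEY` header rather than the query string, and leaves the actual send to the caller. `build_series_request` is a hypothetical helper, and the `site` parameter is an assumption for switching between regions such as `datadoghq.com` and `datadoghq.eu`.

```python
import json
import time
import urllib.request


def build_series_request(api_key, metric, value, tags, site="datadoghq.com",
                         timestamp=None):
    """Build (but do not send) a POST request for the v1 series endpoint."""
    point_time = int(timestamp if timestamp is not None else time.time())
    body = {
        "series": [
            {"metric": metric, "points": [[point_time, value]], "tags": list(tags)}
        ]
    }
    return urllib.request.Request(
        url=f"https://api.{site}/api/v1/series",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "DD-API-KEY": api_key},
        method="POST",
    )


request = build_series_request(
    "<your-api-key>", "custom.metric", 42, ["env:production"], timestamp=1634567890
)
# To send it: urllib.request.urlopen(request, timeout=10)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit test before any network call is made.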

Common Issues

| Issue | Solution |
| --- | --- |
| Agent not reporting | Check `datadog-agent status` and API key |
| High log volume | Filter logs, reduce log level |
| False positives | Adjust thresholds, add more context |
| Slow APM traces | Increase sampling rate, optimize code |
| Dashboard lag | Reduce query time range, limit widgets |

Datadog vs Other Monitoring Tools

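| Feature | Datadog | Prometheus | New Relic |
| --- | --- | --- | --- |
| SaaS vs Self-Hosted | SaaS | Self-hosted | SaaS |
| Learning Curve | Easy | Moderate | Moderate |
| Integrations | 400+ | 100+ | 100+ |
| APM | Yes | Yes (Jaeger) | Yes |
| Logs | Yes | Needs Loki/ELK | Yes |
| Pricing | Per host | Free | Per user/host |
| Alerting | Native | Alertmanager | Native |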

Released under MIT License.