Supervisor
The supervisor is the orchestrator of your Ductwork infrastructure. As the parent process for each bin/ductwork instance, it launches and monitors all child processes, ensuring your pipelines continue running even when individual processes fail.
Overview
When you run bin/ductwork, you start a supervisor process that:
- Launches one pipeline advancer process
- Launches one job worker process for each configured pipeline
- Monitors all child processes through heartbeat checks
- Automatically restarts failed or hung processes
- Coordinates graceful shutdown across all processes
The supervisor acts as the resilience layer, making Ductwork pipelines fault-tolerant without manual intervention.
Process Hierarchy
Each Ductwork instance creates the following process tree:
```
supervisor (bin/ductwork)
├── pipeline advancer
│   ├── thread for Pipeline A
│   ├── thread for Pipeline B
│   └── thread for Pipeline C
├── job worker (Pipeline A)
│   ├── worker thread 1
│   ├── worker thread 2
│   └── ...
├── job worker (Pipeline B)
│   ├── worker thread 1
│   ├── worker thread 2
│   └── ...
└── job worker (Pipeline C)
    ├── worker thread 1
    ├── worker thread 2
    └── ...
```
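You can confirm this hierarchy on a running host with standard process tools, for example:

```bash
# Show the supervisor and its children (requires pstree)
pstree -p <supervisor_pid>

# Or with GNU ps on Linux
ps -ef --forest
```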
Responsibilities
Process Lifecycle Management
The supervisor manages the complete lifecycle of child processes (a simplified sketch follows the lists below):
Startup:
- Read configuration from YAML file
- Fork the pipeline advancer process
- Fork one job worker process per configured pipeline
- Register signal handlers for graceful shutdown
- Enter monitoring loop
Monitoring:
- Check heartbeats from each child process
- Track process health and uptime
- Detect crashes or hangs
- Log process status changes
Recovery:
- Automatically restart failed processes
- Maintain pipeline availability during failures
- Preserve pipeline state through process restarts
Shutdown:
- Forward shutdown signals to all children
- Wait for graceful shutdown with timeout
- Terminate unresponsive processes
- Clean up resources and exit
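Putting the four phases together, here is a minimal, self-contained sketch of that lifecycle. It is illustrative only: the children are stand-in processes that just sleep, and Ductwork's real internals differ.

```ruby
pipelines = %w[PipelineA PipelineB] # normally read from config/ductwork.yml
polling_timeout = 1
shutdown_timeout = 30
shutdown = false

# Startup: fork the advancer, then one job worker per pipeline
spawn_child = ->(_name) { fork { loop { sleep 1 } } } # stand-in child process
children = { "advancer" => spawn_child.call("advancer") }
pipelines.each { |p| children[p] = spawn_child.call(p) }

# Register signal handlers for graceful shutdown
%w[TERM INT].each { |sig| trap(sig) { shutdown = true } }

# Monitoring loop: restart any child that has exited
# (a real supervisor also checks heartbeats to catch hangs; see below)
until shutdown
  children.each do |name, pid|
    children[name] = spawn_child.call(name) if Process.waitpid(pid, Process::WNOHANG)
  end
  sleep polling_timeout
end

# Shutdown: forward TERM, wait up to the timeout, then SIGKILL stragglers
children.each_value { |pid| Process.kill("TERM", pid) rescue nil }
deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + shutdown_timeout
children.each_value do |pid|
  until Process.waitpid(pid, Process::WNOHANG)
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
      (Process.kill("KILL", pid) rescue nil)
      Process.waitpid(pid)
      break
    end
    sleep 0.1
  end
end
```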
Heartbeat Monitoring
The supervisor continuously monitors child process health through periodic heartbeats. Each child process reports its status at regular intervals, confirming it’s alive and processing work.
Detection: If a child process fails to report a heartbeat within 5 minutes—indicating a crash, hang, or deadlock—the supervisor detects the failure.
Recovery: The supervisor immediately spawns a replacement process to restore full pipeline capacity. The new process picks up where the previous one left off, resuming work on pending jobs.
Why 5 minutes? This timeout balances quick failure detection with tolerance for legitimately slow operations. Steps should typically complete in seconds, but this buffer accounts for temporarily degraded performance without false positives.
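The check itself amounts to comparing timestamps. A minimal sketch with hypothetical names (Ductwork tracks heartbeats internally):

```ruby
HEARTBEAT_TIMEOUT = 5 * 60 # seconds

# pid => time of the last report; in reality updated as children check in
last_heartbeats = {}

last_heartbeats.each do |pid, reported_at|
  next if Time.now - reported_at <= HEARTBEAT_TIMEOUT

  # Crash, hang, or deadlock: kill the stuck process and spawn a
  # replacement so pending jobs are picked back up.
  begin
    Process.kill("KILL", pid)
  rescue Errno::ESRCH
    # process already gone
  end
  # ...fork the replacement here...
end
```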
Configuration
The supervisor’s behavior is controlled through config/ductwork.yml:
pipelines
Specifies which pipelines to run. The supervisor creates child processes based on this configuration.
```yaml
default: &default
  pipelines:
    - EnrichUserDataPipeline
    - ProcessOrdersPipeline
```
Or use the wildcard to run all defined pipelines:
```yaml
default: &default
  pipelines: "*"
```
Note: The supervisor creates one job worker process per pipeline listed here, plus a thread per pipeline inside the single advancer process.
supervisor.polling_timeout
How long (in seconds) the supervisor sleeps between heartbeat checks.
Default: 1 second
```yaml
default: &default
  supervisor:
    polling_timeout: 5
```
Tuning: Shorter intervals provide faster failure detection but increase CPU usage. Longer intervals reduce overhead but delay failure detection. The default (1 second) works well for most applications.
supervisor.shutdown_timeout
Maximum time (in seconds) to wait for child processes to shut down gracefully. After this timeout, remaining processes receive SIGKILL and terminate immediately.
Default: 30 seconds
```yaml
default: &default
  supervisor:
    shutdown_timeout: 45
```
Important: This value should be larger than job_worker.shutdown_timeout to allow proper cascading. If the supervisor timeout is too short, workers won’t have time to finish their shutdown sequence.
Recommended values:
- job_worker.shutdown_timeout: 20 seconds
- supervisor.shutdown_timeout: 30 seconds (a 10-second buffer)
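Combined in one file, the recommended settings look like this:

```yaml
default: &default
  supervisor:
    shutdown_timeout: 30
  job_worker:
    shutdown_timeout: 20
```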
Signal Handling
The supervisor responds to Unix signals for control and debugging:
TERM and INT - Graceful Shutdown
Triggers the graceful shutdown sequence:
- Supervisor forwards signal to all child processes
- Child processes begin their shutdown sequences
- Supervisor waits up to supervisor.shutdown_timeout seconds
- Processes still alive after the timeout are killed with SIGKILL
- Supervisor exits
```bash
# Send TERM signal
kill -TERM <supervisor_pid>

# Or INT signal (both behave identically)
kill -INT <supervisor_pid>
```
See Signal Handling for detailed shutdown behavior.
TTIN - Thread Backtrace Dump
Requests thread backtraces from all child processes for debugging hung or slow processes.
```bash
kill -TTIN <supervisor_pid>
```
The supervisor forwards this signal to all children, which dump their thread backtraces to the configured logger. This is invaluable for diagnosing performance issues or deadlocks in production.
See TTIN Signal Handling for details.
Lifecycle Hooks
Register actions to run when the supervisor starts or stops:
```ruby
# config/initializers/ductwork.rb
Ductwork.on_supervisor_start do
  Rails.logger.info "Ductwork supervisor starting"
  # Initialize monitoring, notify deployment tracking, etc.
end

Ductwork.on_supervisor_stop do
  Rails.logger.info "Ductwork supervisor shutting down"
  # Flush metrics, notify monitoring systems, etc.
end
```
These hooks run once per supervisor lifecycle—at the very beginning of startup and the very end of shutdown. Use them for initialization, cleanup, or integration with external systems.
See Lifecycle Hooks for all available hooks.
Monitoring
Track supervisor health and behavior by monitoring the following (an instrumentation sketch follows these lists):
Process Metrics
- Supervisor uptime
- Number of child process restarts
- Child process spawn rate
- Failed startup attempts
Resource Usage
- Supervisor CPU and memory usage
- Total memory across all child processes
- Open file descriptors
- Database connection count
Heartbeat Status
- Time since last heartbeat from each child
- Heartbeat check frequency
- Missed heartbeat count
Shutdown Behavior
- Time to complete graceful shutdown
- Number of processes killed after timeout
- Shutdown success rate
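Several of these signals can be emitted through the lifecycle hooks shown earlier. For example, counting supervisor starts and stops with a metrics client (METRICS below is a stand-in for whatever client you use):

```ruby
# config/initializers/ductwork.rb
Ductwork.on_supervisor_start do
  METRICS.increment("ductwork.supervisor.start")
end

Ductwork.on_supervisor_stop do
  METRICS.increment("ductwork.supervisor.stop")
end
```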
Running Multiple Supervisors
You can run multiple bin/ductwork instances to isolate pipelines or scale horizontally:
Isolate Critical Pipelines
```bash
# Critical pipelines with dedicated resources
bin/ductwork -c config/ductwork.critical.yml

# Background pipelines on a separate instance
bin/ductwork -c config/ductwork.background.yml
```

```yaml
# config/ductwork.critical.yml
production:
  pipelines:
    - ProcessPaymentsPipeline
    - SendNotificationsPipeline
  job_worker:
    worker_count: 20
```

```yaml
# config/ductwork.background.yml
production:
  pipelines:
    - GenerateReportsPipeline
    - CleanupDataPipeline
  job_worker:
    worker_count: 5
```
Scale Across Machines
Run separate supervisors on different servers for horizontal scaling:
```bash
# Server 1 - Handle user-facing pipelines
bin/ductwork -c config/ductwork.user_facing.yml

# Server 2 - Handle batch processing pipelines
bin/ductwork -c config/ductwork.batch.yml
```
Benefits:
- Fault isolation (one failing pipeline doesn’t affect others)
- Resource allocation (dedicate CPU/memory to specific pipelines)
- Independent scaling (scale critical pipelines without scaling everything)
- Deployment flexibility (deploy changes to specific pipeline groups)
Considerations:
- More operational complexity
- Higher total resource usage (overhead per supervisor)
- Need coordination for monitoring across instances
Process Management
Integrate Ductwork with your process manager:
systemd
```ini
# /etc/systemd/system/ductwork.service
[Unit]
Description=Ductwork Pipeline Supervisor
After=network.target postgresql.service

[Service]
Type=simple
User=deploy
WorkingDirectory=/var/www/myapp
ExecStart=/var/www/myapp/bin/ductwork -c config/ductwork.yml
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```
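After installing the unit, reload systemd and start the service. Note that systemd's default stop timeout (TimeoutStopSec, 90 seconds) already exceeds the default supervisor.shutdown_timeout, so graceful shutdown completes before systemd escalates to SIGKILL.

```bash
sudo systemctl daemon-reload
sudo systemctl enable --now ductwork
sudo systemctl status ductwork
```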
Docker
```dockerfile
# Dockerfile
CMD ["bin/ductwork", "-c", "config/ductwork.production.yml"]
```
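Keep in mind that docker stop sends SIGTERM and escalates to SIGKILL after 10 seconds by default, which is shorter than a typical supervisor.shutdown_timeout. Pass a longer timeout so the graceful shutdown sequence can finish:

```bash
# Give the supervisor up to 45 seconds to shut down gracefully
docker stop -t 45 <container_id>
```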
Kubernetes
```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ductwork
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ductwork
  template:
    metadata:
      labels:
        app: ductwork
    spec:
      containers:
        - name: ductwork
          image: myapp:latest
          command: ["bin/ductwork"]
          args: ["-c", "config/ductwork.yml"]
```
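As with Docker, make sure Kubernetes gives the supervisor enough time to shut down: pods receive SIGTERM on termination and are force-killed after terminationGracePeriodSeconds (30 seconds by default). Set it above supervisor.shutdown_timeout in the pod spec:

```yaml
# In the pod template spec
spec:
  terminationGracePeriodSeconds: 45
```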
The supervisor’s resilient design makes it suitable for containerized environments and orchestration platforms.