CloudTwin  ROS2 Humble
Digital twin for path and trajectory optimisation
AI intelligence layer

The AI intelligence layer adds crowd-aware replanning, room-name goal commands, and voice control on top of Nav2. It consists of three ROS2 nodes and a Foxglove extension panel, all launched together via logic.launch.py.


Overview

┌───────────────────────────────────────────────────────────────────┐
│                       AI Intelligence Layer                       │
│                                                                   │
│  ┌────────────────┐  ┌──────────────────┐  ┌──────────────────┐   │
│  │ crowd_monitor  │  │ room_interpreter │  │ speech_node      │   │
│  │                │  │                  │  │                  │   │
│  │ /people_posit. │  │ /room_command    │  │ /speech_trigger  │   │
│  │       ↓        │  │        ↓         │  │        ↓         │   │
│  │ Gaussian       │  │ YAML lookup      │  │ Mic recording    │   │
│  │ density        │  │        ↓         │  │        ↓         │   │
│  │       ↓        │  │ /goal_pose       │  │ Whisper STT      │   │
│  │ /map_dynamic   │  │   → Nav2         │  │        ↓         │   │
│  │ /crowd_density │  │                  │  │ /room_command    │   │
│  │   → Nav2       │  │ /room_command_   │  │ /speech_status   │   │
│  │   → Foxglove   │  │   feedback       │  │   → Foxglove     │   │
│  └────────────────┘  └──────────────────┘  └──────────────────┘   │
└───────────────────────────────────────────────────────────────────┘

crowd_monitor.py : Dynamic obstacle avoidance

Purpose

Makes Nav2 replan around corridors blocked by moving humans. Without this node, Nav2 only sees the static hospital map and obstacles in the lidar's direct line of sight. The crowd monitor gives Nav2 a "bird's eye view" of all human positions across the entire map.

How it works

/map (static, from map_server)
+
/people_positions (PoseArray, from obstacle_spawner at ~10 Hz)
┌──────────────────────────────────────┐
│            crowd_monitor             │
│                                      │
│ 1. Cache the static map (501×1127)   │
│ 2. For each human position:          │
│    - Compute a 2D Gaussian blob      │
│      (sigma = 1.35m, 3σ radius)      │
│    - Peak value = density_scale      │
│ 3. Sum all Gaussians → density grid  │
│ 4. Scale to 0-100                    │
│ 5. Where density ≥ threshold → 100   │
│    (= occupied wall in OccGrid)      │
│ 6. Overlay onto static map copy      │
│                                      │
│ Publish at 2 Hz:                     │
│   /map_dynamic   → Nav2 replanning   │
│   /crowd_density → Foxglove display  │
└──────────────────────────────────────┘

Gaussian density field

For each human at map position (x, y), converted to the grid cell (row, col), the node computes a 2D Gaussian over the surrounding cells (r, c):

density(r, c) = exp( -((r - row)² + (c - col)²) / (2 × σ_px²) )

where σ_px = gaussian_sigma / map_resolution. With gaussian_sigma = 1.35 m and a resolution of 0.05 m/px, σ_px = 27 px, so the 3σ cut-off extends approximately 81 pixels (4.05 m) from each person's centre.

Cells where the scaled density exceeds lethal_threshold are marked as occupied (value 100) in the OccupancyGrid, identical to a wall. Nav2's costmap treats them as impassable, forcing a global replan through an alternative corridor.
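The density and thresholding steps can be sketched with NumPy as follows. This is a minimal illustration, not the node's actual code: the function name crowd_overlay, the (row, col) input format, and the use of clipping for the "scale to 0-100" step are assumptions made for the example.

```python
import numpy as np

def crowd_overlay(static_map, people_rc, sigma_m=1.35, resolution=0.05,
                  density_scale=100.0, lethal_threshold=25):
    """Return (dynamic_map, density) for people given as (row, col) grid cells.

    Hypothetical helper: static_map is a 2D int8 array in OccupancyGrid
    convention (0 = free, 100 = occupied).
    """
    sigma_px = sigma_m / resolution           # 1.35 m / 0.05 m per px = 27 px
    radius = int(round(3 * sigma_px))         # 3-sigma cut-off, 81 px
    h, w = static_map.shape
    density = np.zeros((h, w), dtype=np.float64)

    for row, col in people_rc:
        # Clip the 3-sigma window to the map bounds.
        r0, r1 = max(row - radius, 0), min(row + radius + 1, h)
        c0, c1 = max(col - radius, 0), min(col + radius + 1, w)
        if r0 >= r1 or c0 >= c1:
            continue                          # person lies outside the map
        rr, cc = np.mgrid[r0:r1, c0:c1]
        d2 = (rr - row) ** 2 + (cc - col) ** 2
        density[r0:r1, c0:c1] += density_scale * np.exp(-d2 / (2 * sigma_px ** 2))

    # Assumption: "scale to 0-100" is modelled here as a simple clip.
    density = np.clip(density, 0, 100)

    # Overlay onto a copy so the cached static map is never modified.
    dynamic_map = static_map.copy()
    dynamic_map[density >= lethal_threshold] = 100
    return dynamic_map, density.astype(np.int8)
```

Only the 3σ window around each person is evaluated, which keeps the per-person cost independent of the full map size.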

Tuning the wall size

The lethal_threshold parameter controls how far from each person the virtual wall extends:

lethal_threshold    Approximate wall radius
50                  ~1.2 m
25                  ~2.0 m
10                  ~2.8 m

A lower threshold creates larger exclusion zones, making Nav2 more conservative.
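For a single isolated person, the relationship between threshold and radius can be estimated analytically by solving density_scale × exp(-r² / (2σ²)) = lethal_threshold for r. The sketch below assumes exactly that idealised model; the helper name wall_radius is hypothetical. It ignores grid discretisation, overlapping people, and any inflation Nav2 applies, so treat its output as a rough guide alongside the approximate figures in the table, which it will not reproduce exactly.

```python
import math

def wall_radius(lethal_threshold, sigma_m=1.35, density_scale=100.0):
    """Distance in metres at which one person's scaled Gaussian drops to the threshold.

    Idealised single-person model: r = sigma * sqrt(2 * ln(scale / threshold)).
    """
    if not 0 < lethal_threshold <= density_scale:
        raise ValueError("lethal_threshold must be in (0, density_scale]")
    return sigma_m * math.sqrt(2.0 * math.log(density_scale / lethal_threshold))
```

The radius grows with σ and with the logarithm of density_scale / lethal_threshold, which is why halving the threshold enlarges the wall by less than a factor of two.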

Parameters (in logic_params.yaml)

crowd_monitor:
  ros__parameters:
    publish_rate: 2.0        # Hz
    gaussian_sigma: 1.35     # metres
    density_scale: 100.0     # peak value at person centre
    lethal_threshold: 25     # cells above this → wall

Topics

Topic              Type           Direction  Description
/map               OccupancyGrid  Subscribe  Static map from map_server
/people_positions  PoseArray      Subscribe  Human positions from obstacle_spawner
/map_dynamic       OccupancyGrid  Publish    Static map + crowd walls (→ Nav2)
/crowd_density     OccupancyGrid  Publish    Density-only overlay (→ Foxglove)

room_interpreter.py : Natural language destination commands

Purpose

Translates human-readable room names into Nav2 navigation goals. Instead of manually clicking a goal on the map, the user can type "go to urgences" or "salle 101" and the robot navigates there.

How it works

/room_command ("Go to urgences")
┌──────────────────────────────────────────┐
│             room_interpreter             │
│                                          │
│ 1. Strip command prefixes:               │
│    "Go to", "Navigate to", "Aller à"     │
│ 2. Remove articles: "the", "la", "le"    │
│ 3. Search room_registry.yaml:            │
│    - Exact alias match                   │
│    - Substring match                     │
│    - Partial word overlap                │
│ 4. Look up (x, y, orientation)           │
│ 5. Publish PoseStamped on /goal_pose     │
│                                          │
│ Feedback → /room_command_feedback        │
└──────────────────────────────────────────┘
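The parsing and matching stages can be sketched in plain Python as below. This is an illustration under assumptions, not the node's implementation: the names normalise and find_room, the whole-word substring rule, and the "more than half of the query words" overlap rule are all choices made for the example, and REGISTRY is a small in-memory stand-in for room_registry.yaml.

```python
import re

# Prefixes and articles taken from the steps above, plus "va à" from the
# supported command patterns.
PREFIXES = ("go to", "navigate to", "aller à", "va à")
ARTICLES = {"the", "la", "le"}

# Hypothetical in-memory registry mirroring the YAML structure.
REGISTRY = {
    "urgences": {"aliases": ["urgences", "emergency", "er", "salle urgences"]},
    "salle_101": {"aliases": ["salle 101", "room 101", "101"]},
}

def normalise(command):
    """Lower-case the command, strip one leading prefix, drop articles."""
    text = command.strip().lower()
    for prefix in PREFIXES:
        if text == prefix or text.startswith(prefix + " "):
            text = text[len(prefix):]
            break
    words = [w for w in re.split(r"\s+", text.strip()) if w and w not in ARTICLES]
    return " ".join(words)

def find_room(command, registry):
    """Return the registry key matching the command, or None."""
    query = normalise(command)
    if not query:
        return None
    aliases = [(alias.lower(), key)
               for key, room in registry.items()
               for alias in room["aliases"]]

    # Stage 1: exact alias match.
    for alias, key in aliases:
        if alias == query:
            return key

    # Stage 2: substring match on whole words, in either direction.
    for alias, key in aliases:
        if f" {alias} " in f" {query} " or f" {query} " in f" {alias} ":
            return key

    # Stage 3: partial word overlap; assumed rule: more than half of the
    # query words must appear in the alias.
    query_words = set(query.split())
    best_key, best_overlap = None, 0
    for alias, key in aliases:
        overlap = len(query_words & set(alias.split()))
        if overlap > best_overlap and overlap * 2 > len(query_words):
            best_key, best_overlap = key, overlap
    return best_key
```

With this sketch, find_room("Aller à la salle 101", REGISTRY) resolves to the salle_101 entry, while a command naming an unregistered room such as "Navigate to room 204" returns None instead of falling back to room 101 through the shared word "room".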

Room registry format (room_registry.yaml)

rooms:
  urgences:
    display_name: "Urgences"
    x: 5.2
    y: -12.8
    orientation_z: 0.0
    orientation_w: 1.0
    aliases:
      - "urgences"
      - "emergency"
      - "er"
      - "salle urgences"
  salle_101:
    display_name: "Salle 101"
    x: 3.0
    y: 0.0
    orientation_z: 0.0
    orientation_w: 1.0
    aliases:
      - "salle 101"
      - "room 101"
      - "101"
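The orientation_z and orientation_w fields appear to be the z and w components of a quaternion describing a rotation about the vertical axis; the registry does not state this explicitly, so treat it as an assumption. Under that assumption, the hypothetical helpers below convert between a heading angle and the two values, which can be handy when adding a room whose desired heading is known in degrees or radians.

```python
import math

def zw_from_yaw(yaw_rad):
    """Quaternion (z, w) for a pure rotation of yaw_rad about the vertical axis."""
    return math.sin(yaw_rad / 2.0), math.cos(yaw_rad / 2.0)

def yaw_from_zw(orientation_z, orientation_w):
    """Heading in radians for a planar quaternion with x = y = 0."""
    return 2.0 * math.atan2(orientation_z, orientation_w)
```

The values used in the registry above (orientation_z: 0.0, orientation_w: 1.0) correspond to a heading of zero, that is, facing along the map's +x axis.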

Supported command patterns

The parser handles both English and French commands:

  • "Go to urgences"
  • "Navigate to room 204"
  • "Aller à la salle 101"
  • "Va à urgences"
  • "urgences" (just the room name)

Topics

Topic                   Type         Direction  Description
/room_command           String       Subscribe  Text command from Foxglove panel or speech_node
/goal_pose              PoseStamped  Publish    Navigation goal for Nav2
/room_command_feedback  String       Publish    Status text for Foxglove panel

speech_node.py : Voice command processing

Purpose

Enables voice control of the robot. Since the Foxglove extension panel runs in a sandboxed environment that blocks microphone access, the speech processing happens on the Ubuntu machine where the microphone is physically connected.

How it works

Foxglove panel                          Digital twin computer
┌────────────┐                        ┌────────────────────────┐
│ Click 🎤   │── /speech_trigger ────►│ speech_node            │
│            │   ("fr" or "en")       │                        │
│            │                        │ 1. Start recording     │
│ "Recording"│◄─ /speech_status ──────│    (device_index: 5)   │
│            │   ("recording")        │                        │
│            │                        │ 2. Silence detection   │
│            │                        │    (1.5s threshold)    │
│"Transcrib."│◄─ /speech_status ──────│                        │
│            │   ("transcribing")     │ 3. Whisper transcribe  │
│            │                        │    (faster-whisper,    │
│ "Aller à"  │◄─ /speech_status ──────│     base model)        │
│            │   ("heard:aller à")    │                        │
│            │                        │ 4. Publish text        │
│            │                        │    → /room_command     │
└────────────┘                        └────────────────────────┘

Recording strategy

  • Records in 100 ms chunks from the specified audio input device
  • Monitors RMS amplitude of each chunk
  • Stops recording after silence_duration seconds of consecutive silence (RMS below silence_threshold), but only if some speech was detected first
  • Maximum recording length: max_duration seconds
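The stop condition described above can be sketched independently of the audio backend. In this minimal example, chunks is any iterable of 100 ms NumPy arrays (in the real node they would come from the microphone stream); the function name record_until_silence and the return format are assumptions made for illustration.

```python
import numpy as np

def record_until_silence(chunks, silence_threshold=0.005, silence_duration=1.5,
                         max_duration=8.0, chunk_seconds=0.1):
    """Consume audio chunks until trailing silence or the maximum length.

    Returns (audio, speech_detected). Hypothetical helper for illustration.
    """
    max_chunks = int(round(max_duration / chunk_seconds))
    silent_needed = int(round(silence_duration / chunk_seconds))
    kept, silent_run, speech_detected = [], 0, False

    for chunk in chunks:
        kept.append(chunk)
        rms = float(np.sqrt(np.mean(chunk.astype(np.float64) ** 2)))
        if rms >= silence_threshold:
            speech_detected = True
            silent_run = 0                # speech resets the silence counter
        else:
            silent_run += 1
        # Stop on trailing silence only once some speech has been heard.
        if speech_detected and silent_run >= silent_needed:
            break
        if len(kept) >= max_chunks:
            break

    audio = np.concatenate(kept) if kept else np.zeros(0, dtype=np.float32)
    return audio, speech_detected
```

Requiring speech before the silence timer can end the recording means a user who pauses before speaking is not cut off after 1.5 s; only max_duration bounds that case.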

Audio device selection

The Ubuntu machine may have multiple audio inputs. List the available devices to identify the correct one:

python3 -c "import sounddevice; print(sounddevice.query_devices())"

Then test each input device:

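python3 -c "
import sounddevice as sd
import numpy as np
print('Speak now...')
# Record 3 s of mono audio at 16 kHz; change device=5 to the index under test
audio = sd.rec(int(3 * 16000), samplerate=16000, channels=1, dtype='float32', device=5)
sd.wait()  # block until the recording has finished
print(f'RMS: {np.sqrt(np.mean(audio**2)):.6f}')
"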

Set device_index in logic_params.yaml to the device whose RMS rises clearly above silence_threshold (0.005 by default) when you speak.

Parameters

speech_node:
  ros__parameters:
    model_size: "base"          # Whisper model: tiny, base, small, medium
    default_language: "fr"      # Fallback if trigger has no language
    sample_rate: 16000
    max_duration: 8.0           # Max recording time (seconds)
    silence_threshold: 0.005    # RMS below this = silence
    silence_duration: 1.5       # Seconds of silence to stop recording
    device_index: 5             # Audio input device (-1 = system default)

Topics

Topic            Type    Direction  Description
/speech_trigger  String  Subscribe  Language code ("fr"/"en") from Foxglove panel
/room_command    String  Publish    Transcribed text → room_interpreter
/speech_status   String  Publish    UI state: "ready", "recording", "transcribing", "heard:...", "no_speech", "error:..."
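Because /speech_status packs a state and an optional detail into one string, any consumer (a test script, a logger, or the panel itself) has to split it. A minimal sketch of that split, assuming the detail follows the first colon; the function name parse_speech_status and the "unknown" fallback are hypothetical.

```python
KNOWN_STATES = {"ready", "recording", "transcribing", "heard", "no_speech", "error"}

def parse_speech_status(status):
    """Split a /speech_status string into (state, detail).

    Only the first colon separates state from detail, so transcripts or
    error messages that contain colons are preserved.
    """
    state, sep, detail = status.partition(":")
    state = state.strip()
    if state not in KNOWN_STATES:
        return ("unknown", status)
    return (state, detail.strip() if sep else "")
```

For example, "heard:aller à" splits into the state "heard" and the detail "aller à", while "recording" has an empty detail.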

Foxglove Room command panel

Purpose

A custom Foxglove extension panel that provides a user interface for sending room commands. It connects the Foxglove browser UI to the ROS2 nodes running on the digital twin's computer.

Features

  • Text input : type a room command and press Enter or click Send
  • Voice input : click the 🎤 button to trigger recording on the Ubuntu microphone via the speech_node
  • Language toggle : switch between French and English for speech recognition
  • Feedback display : shows navigation status (green) or errors (red) from room_interpreter

Building the extension

cd ~/Documents/CloudTwin/Foxglove/foxglove_panels/room-command-panel
npm install
npm install --save-dev @foxglove/extension
npm run package

This produces a .foxe file. Transfer it to the computer running Foxglove and install via Foxglove → Extensions → Install local extension.

Panel communication

┌─────────────────────────────────────────────────────┐
│ Foxglove panel (browser)                            │
│                                                     │
│ [Text Input] ──► publishes /room_command            │
│ [🎤 Button]  ──► publishes /speech_trigger          │
│                                                     │
│ subscribes /room_command_feedback ──► [Feedback]    │
│ subscribes /speech_status ──► [Recording status]    │
└─────────────────────────────────────────────────────┘