Self-Optimization¶
RushTI learns from each run. It tracks how long each TI process takes, detects resource contention patterns, recommends optimal worker counts, and reorders tasks for better parallelization — all from historical execution data.
How It Works¶
- Run your workflow — RushTI records how long each task took in a local SQLite database.
- Build a performance picture — After several runs, RushTI calculates a weighted average duration for each task (recent runs count more).
- Analyze and optimize — Use rushti stats optimize for contention-aware analysis, or opt in to runtime scheduling with --optimize on the CLI.
The result: better worker utilization and shorter total execution time — without changing your task files.
Experimental
Contention-aware optimization is based on statistical analysis of historical execution data. The recommendations follow theoretical reasoning (IQR outlier detection, DAG scheduling, concurrency analysis) but actual TM1 server behavior depends on factors beyond task ordering — server load, memory pressure, concurrent users, and data volumes. There is no one-size-fits-all solution. Test the optimized taskfile in a non-production environment before deploying.
Contention-Aware Optimization¶
The most advanced optimization mode. It analyzes why some tasks are slow, detects resource contention patterns, and restructures the task file to reduce contention.
This produces an optimized task file with:
- Predecessor chains between heavy task groups to prevent them from running simultaneously
- Worker count recommendations based on observed server capacity
- An HTML optimization report with detailed analysis and visualizations
When to Use It¶
Contention-aware optimization is most valuable when:
- Your workflow has tasks that share TM1 server resources (write locks, memory, threads)
- Some tasks are much slower than others due to data volume differences (e.g., one region has 10x more data)
- You suspect that too many concurrent heavy tasks are degrading performance
- You want data-driven guidance on the right max_workers setting
How the Algorithm Works¶
The optimizer follows a multi-step decision tree. At each stage, it tries the most specific analysis first and falls back to simpler approaches when the data does not support deeper optimization.
flowchart TD
A[Compute EWMA durations] --> B{Varying<br/>parameters?}
B -- No --> CEIL[Detect ceiling /<br/>scale-up]
B -- Yes --> C{Contention<br/>driver found?}
C -- No --> CEIL
C -- Yes --> D[Group by driver +<br/>IQR outlier detection]
D --> E{≥ 2 heavy<br/>groups?}
E -- Yes --> F[Build predecessor chains +<br/>Recommend workers]
E -- No --> CEIL
F --> G[Refine workers via<br/>ceiling / scale-up]
G --> OUT1([Optimized taskfile<br/>with chains])
CEIL --> H{Signal<br/>found?}
H -- Yes --> OUT2([Optimized taskfile<br/>with ceiling])
H -- No --> FB([Fallback: longest_first<br/>EWMA reordering])
style A fill:#bbdefb,color:#000
style OUT1 fill:#c8e6c9,color:#000
style OUT2 fill:#c8e6c9,color:#000
style FB fill:#ffe0b2,color:#000
The key insight: contention driver and concurrency ceiling are not mutually exclusive. When the algorithm finds a driver and builds chains (Steps 4–5), it still runs ceiling/scale-up detection (Step 6) and uses the result to refine the worker count — capping down if the server is overwhelmed, or raising up if scale-up evidence is strong.
Step 1: Compute EWMA Durations¶
Using historical execution data, the optimizer calculates an EWMA (Exponentially Weighted Moving Average) duration for each unique task signature. This produces a reliable estimate of how long each task currently takes.
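A minimal sketch of the weighting, assuming a simple recursive EWMA (the variable names are illustrative, not RushTI's internals):

```python
def ewma_duration(durations: list[float], alpha: float = 0.3) -> float:
    """Fold a task's historical durations (oldest to newest) into one estimate,
    giving each newer run more weight than the accumulated history."""
    estimate = durations[0]
    for duration in durations[1:]:
        estimate = alpha * duration + (1 - alpha) * estimate
    return estimate

print(ewma_duration([10.0, 10.0, 10.0, 20.0]))  # 13.0 — pulled toward the recent slow run (plain average: 12.5)
```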
Step 2: Identify the Contention Driver¶
The optimizer examines which task parameter most strongly correlates with duration variance. For example, if tasks parameterized by pRegion show wildly different durations (NA=10s, EU=15s, APAC=120s) while tasks parameterized by pYear are similar, then pRegion is the contention driver.
The algorithm groups tasks by each varying parameter and measures the range of average group durations. The parameter with the largest range wins — it explains the most variance.
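A sketch of that selection, assuming each task carries its parameter values and an EWMA duration (the data structure is hypothetical):

```python
from collections import defaultdict

def find_contention_driver(tasks: list[dict]) -> str | None:
    """Illustrative: pick the parameter whose per-value mean durations spread
    the widest, i.e. the one that explains the most duration variance."""
    param_names = {name for task in tasks for name in task["params"]}
    best_param, best_spread = None, 0.0
    for param in param_names:
        groups = defaultdict(list)
        for task in tasks:
            groups[task["params"].get(param)].append(task["ewma"])
        if len(groups) < 2:          # parameter does not vary across tasks
            continue
        means = [sum(durations) / len(durations) for durations in groups.values()]
        spread = max(means) - min(means)
        if spread > best_spread:
            best_param, best_spread = param, spread
    return best_param

tasks = [
    {"params": {"pRegion": "NA",   "pYear": "2026"}, "ewma": 10.0},
    {"params": {"pRegion": "EU",   "pYear": "2026"}, "ewma": 15.0},
    {"params": {"pRegion": "APAC", "pYear": "2026"}, "ewma": 120.0},
]
print(find_contention_driver(tasks))  # pRegion
```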
Step 3: Detect Heavy Outliers (IQR)¶
Once the contention driver is identified, the optimizer groups tasks by their driver value and uses IQR (Interquartile Range) statistics to detect heavy outliers — groups whose average duration is significantly above the rest.
Groups above the upper fence are classified as "heavy." The sensitivity parameter k (default: 10.0) controls how conservative the detection is. Higher values mean fewer groups are flagged as heavy.
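A sketch of the fence calculation, plugging in the figures from the sample output under Reading the Output below:

```python
def upper_fence(q1: float, q3: float, k: float = 10.0) -> float:
    """Heavy-group threshold: Q3 + k * IQR (k mirrors --sensitivity)."""
    return q3 + k * (q3 - q1)

fence = upper_fence(12.3, 45.6)                                    # 378.6s
group_means = {"APAC": 420.5, "NA": 385.2, "EU": 45.6, "LATAM": 38.1}
heavy = [name for name, mean in group_means.items() if mean > fence]
print(heavy)                                                       # ['APAC', 'NA']
```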
Step 4: Build Predecessor Chains¶
Heavy groups are chained together with predecessor dependencies so they run sequentially instead of simultaneously. This prevents multiple resource-intensive task groups from competing for TM1 server resources at the same time. At least 2 heavy groups are required to build chains.
If the workflow has fan-out parameters (e.g., tasks that vary by both pRegion and pYear), independent chains are created for each fan-out value, allowing parallelism across the non-contention dimensions.
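A simplified sketch of the chaining idea (heavy groups already ordered; task ids synthesized as group-fanout for illustration, not RushTI's actual id scheme):

```python
def chain_heavy_groups(heavy_groups: list[str], fan_out_values: list[str]) -> dict[str, str]:
    """For each fan-out value, make every heavy group's task wait on the previous
    heavy group's task, so heavy work runs one group at a time per chain."""
    predecessors = {}
    for fan_out in fan_out_values:
        for prev, curr in zip(heavy_groups, heavy_groups[1:]):
            predecessors[f"{curr}-{fan_out}"] = f"{prev}-{fan_out}"
    return predecessors

print(chain_heavy_groups(["APAC", "NA"], ["2025", "2026"]))
# {'NA-2025': 'APAC-2025', 'NA-2026': 'APAC-2026'}
```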
Step 5: Recommend Worker Count¶
The optimizer calculates an optimal max_workers based on the chain structure:
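The calculation has roughly the following shape (an illustrative sketch; the exact formula may weight the terms differently):

```python
chain_slots = 3    # number of independent chains (fan-out size)
light_work = 2     # allowance for light, non-heavy tasks running alongside the chains
recommended_max_workers = chain_slots + light_work   # 5, as in the sample output below
```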
Where chain_slots is the number of independent chains (fan-out size) and light_work accounts for non-heavy tasks that can run in parallel alongside the chains.
Step 6: Detect Concurrency Ceiling / Scale-Up¶
The optimizer also examines runs at different worker levels to detect:
- Concurrency ceiling: More workers made things slower (server overwhelmed). Recommends reducing workers.
- Scale-up opportunity: More workers made things faster and the current run uses fewer than optimal. Recommends increasing workers.
For scale-up recommendations, the optimizer uses a sweet spot algorithm: among all worker levels within 10% of the best wall clock time, it picks the one with the fewest workers. This avoids recommending far more workers when a smaller count achieves nearly the same speed.
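A sketch of that selection; the 5- and 10-worker timings come from the sample output below, while the 20-worker level is made up for illustration:

```python
def sweet_spot(wall_clock_by_workers: dict[int, float], tolerance: float = 0.10) -> int:
    """Among worker levels within `tolerance` of the best wall clock time,
    return the smallest worker count."""
    best = min(wall_clock_by_workers.values())
    near_best = [workers for workers, seconds in wall_clock_by_workers.items()
                 if seconds <= best * (1 + tolerance)]
    return min(near_best)

print(sweet_spot({5: 734, 10: 581, 20: 575}))  # 10 — 581s is within 10% of the best 575s
```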
This step runs in three contexts: (1) as a standalone fallback when no varying parameters or driver is found, (2) as a standalone fallback when < 2 heavy groups are detected, and (3) as a refinement after chains are built — where the ceiling/scale-up result either caps down or raises up the driver-based worker recommendation.
Automatic Fallback¶
If none of the above steps produce an actionable optimization (no driver, no ceiling, or a driver with insufficient heavy groups), the optimizer automatically falls back to standard longest_first EWMA reordering — the same analysis that rushti stats analyze produces. You always get an optimized output file.
CLI Usage¶
# Basic: analyze a workflow (auto-resolves taskfile from archive)
rushti stats optimize --workflow daily-refresh
# Specify input taskfile and output path
rushti stats optimize \
--workflow daily-refresh \
--tasks daily-refresh.json \
--output daily-refresh-optimized.json
# Adjust sensitivity (higher = more conservative outlier detection)
rushti stats optimize --workflow daily-refresh --sensitivity 15.0
# Analyze more historical runs with custom EWMA alpha
rushti stats optimize --workflow daily-refresh --runs 20 --ewma-alpha 0.5
# Skip HTML report generation
rushti stats optimize --workflow daily-refresh --no-report
# Custom report output path
rushti stats optimize --workflow daily-refresh --report-output analysis.html
| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| --workflow | -W | STR | required | Workflow name to analyze |
| --tasks | -t | PATH | (auto) | Input task file. If omitted, uses the archived taskfile from the most recent run. |
| --output | -o | PATH | (auto) | Output path for optimized task file. Default: <taskfile>_optimized.json |
| --sensitivity | | FLOAT | 10.0 | IQR multiplier for outlier detection. Higher = more conservative. |
| --runs | -n | INT | 10 | Number of recent runs for EWMA estimation |
| --ewma-alpha | | FLOAT | 0.3 | EWMA smoothing factor (0-1). Higher = more weight on recent runs. |
| --no-report | | FLAG | false | Skip generating the HTML optimization report |
| --report-output | | PATH | (auto) | Output path for HTML report. Default: alongside the optimized taskfile. |
| --settings | -s | PATH | auto | Path to settings.ini |
Reading the Output¶
The command prints a summary to the terminal:
Contention Analysis for: daily-refresh
==================================================
Contention driver: pRegion
Fan-out parameters: pYear
Fan-out size: 3
Total tasks: 24
IQR statistics (sensitivity k=10.0):
Q1: 12.3s
Q3: 45.6s
IQR: 33.3s
Upper fence: 378.6s
Heavy groups (2):
APAC: 420.5s (3 tasks)
NA: 385.2s (3 tasks)
Light groups (6):
EU: 45.6s (3 tasks)
LATAM: 38.1s (3 tasks)
... and 4 more
Chain structure:
Chain length: 2 heavy groups
Chains: 3 independent chains
Tasks with predecessors: 6
Critical path: 805.7s
Recommended max_workers: 5
If a concurrency ceiling or scale-up opportunity is detected, additional information is displayed:
Scale-up opportunity detected: recommend increasing to 10 workers
5 workers: 734s wall clock (slowest)
10 workers: 581s wall clock (fastest)
Potential improvement: 153s (20.8%)
HTML Optimization Report¶
By default, rushti stats optimize generates an interactive HTML report alongside the optimized task file. The report includes:
- Analysis summary — contention driver, heavy/light group counts, recommended workers
- Contention driver analysis — bar chart comparing average durations per group
- IQR statistics — visual breakdown of the outlier detection thresholds
- Chain structure — how heavy groups are sequenced
- Concurrency analysis — worker level comparison with wall clock times and efficiency metrics
- DAG visualization — the optimized dependency graph with predecessor chains highlighted
The report is a self-contained HTML file (all CSS, JavaScript, and data embedded inline) that opens in any browser.
Example: Before and After¶
Before — all tasks run in parallel, heavy regions compete for TM1 resources:
Worker 1: [APAC-2026 420s] ──────────────────────────
Worker 2: [NA-2026 385s] ──────────────────────
Worker 3: [APAC-2025 415s] ──────────────────────────
Worker 4: [NA-2025 380s] ──────────────────────
↑ Heavy tasks compete for write locks
After — heavy groups are chained, light tasks fill gaps:
Worker 1: [APAC-2026 420s] ────────── → [NA-2026 385s] ──────────
Worker 2: [APAC-2025 415s] ────────── → [NA-2025 380s] ──────────
Worker 3: [EU-2026 46s] [LATAM-2026 38s] [EU-2025 44s] [LATAM-2025 36s]
↑ Only one heavy group runs at a time per chain
Scheduling Algorithms¶
For simpler workloads where contention-aware analysis is not needed, RushTI supports runtime scheduling algorithms that reorder ready tasks by estimated duration during execution.
| Algorithm | Sort Order | Best For |
|---|---|---|
| longest_first | Longest tasks start first | Independent workloads with varied task durations. Starts expensive tasks early so short tasks fill gaps. |
| shortest_first | Shortest tasks start first | Shared-resource TM1 workloads where concurrent heavy tasks cause contention (memory, threads, write locks). |
When to Use Which Algorithm¶
- longest_first — Classic bin-packing heuristic. Works well when tasks are independent and the TM1 server has headroom. Ideal for read-heavy extract workflows targeting multiple instances.
- shortest_first — Reduces peak resource pressure by completing quick tasks first, lowering the number of concurrent heavy tasks. Ideal for write-heavy loads against a single TM1 instance.
Not sure? Start with shortest_first
Most TM1 workloads involve shared server resources. shortest_first is a safer default for typical Planning Analytics environments.
Enable Scheduling¶
Scheduling is off by default. You opt in explicitly per-run or per-taskfile.
Via CLI¶
# Shortest-first scheduling (good for shared-resource TM1 workloads)
rushti run --tasks daily-refresh.json --max-workers 20 --optimize shortest_first
# Longest-first scheduling (good for independent tasks with varied durations)
rushti run --tasks daily-refresh.json --max-workers 20 --optimize longest_first
Via JSON Task File¶
Set a default algorithm in the task file's settings section:
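A minimal sketch of the settings block (tasks omitted for brevity; the key name matches the Configuration table below):

```json
{
  "settings": {
    "optimization_algorithm": "shortest_first"
  }
}
```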
CLI --optimize overrides the JSON setting. Omitting both means no optimization.
EWMA Tuning (settings.ini)¶
The EWMA parameters that control how runtime estimates are calculated live in settings.ini. These are system-wide and rarely changed:
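An illustrative snippet using the defaults from the Configuration table below; check the Settings Reference for the exact section and key names in your installation:

```ini
; values shown are the documented defaults
[optimization]
lookback_runs = 10
min_samples = 3
cache_duration_hours = 24
time_of_day_weighting = false
```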
What's happening under the hood
When multiple tasks are ready to execute (i.e., all their predecessors have completed), RushTI sorts them by estimated runtime using the chosen algorithm before assigning them to workers. Dependencies are never changed; only the order among independent, ready tasks is affected.
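A sketch of that dispatch-time sort, assuming a hypothetical list of ready task ids and their EWMA estimates:

```python
def order_ready_tasks(ready_tasks: list[str],
                      estimates: dict[str, float],
                      algorithm: str = "shortest_first") -> list[str]:
    """Sort only tasks whose predecessors have all completed; tasks without
    an estimate (too few samples) go after tasks with known durations."""
    known = [task for task in ready_tasks if task in estimates]
    unknown = [task for task in ready_tasks if task not in estimates]
    known.sort(key=estimates.get, reverse=(algorithm == "longest_first"))
    return known + unknown

print(order_ready_tasks(["load_na", "load_apac", "brand_new_task"],
                        {"load_na": 385.2, "load_apac": 420.5}))
# ['load_na', 'load_apac', 'brand_new_task']
```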
Requirements¶
- [stats] enabled = true — Optimization needs historical data. Without stats, there's nothing to learn from.
- At least min_samples runs — Tasks with fewer runs than min_samples use a default estimate and are placed after tasks with known durations.
Before and After¶
Consider a simple workflow with six independent tasks and two workers:
Before optimization (random order):
Worker 1: [Short 2s] [Short 3s] [Long 30s] ──────── Total: 35s
Worker 2: [Short 1s] [Long 25s] ───── [Short 4s] Total: 30s
Overall: 35s
Worker 1 finishes early tasks quickly, then hits the long task last. Meanwhile Worker 2 sits idle waiting.
After optimization (longest first):
Worker 1: [Long 30s] ──────────── [Short 2s] [Short 1s] Total: 33s
Worker 2: [Long 25s] ───── [Short 4s] [Short 3s] Total: 32s
Overall: 33s
Both workers stay busy throughout. The total time drops because long tasks start immediately and short tasks fill in the remaining gaps.
Real-World Gains
The improvement depends on how unbalanced your tasks are. Workflows with a mix of 5-second and 5-minute tasks see the biggest gains. Workflows where all tasks take roughly the same time see less benefit.
Configuration¶
| Setting | Location | Default | Description |
|---|---|---|---|
| --optimize | CLI | (none) | Scheduling algorithm: longest_first or shortest_first. No flag = no optimization. |
| optimization_algorithm | JSON taskfile | (none) | Per-taskfile default algorithm. Overridden by CLI --optimize. |
| lookback_runs | settings.ini | 10 | Number of recent runs to analyze for EWMA estimates |
| min_samples | settings.ini | 3 | Minimum successful runs before optimization kicks in |
| cache_duration_hours | settings.ini | 24 | How long to cache duration estimates between runs |
| time_of_day_weighting | settings.ini | false | Weight runs at similar times of day higher (disables caching) |
How Duration Estimates Work (EWMA)¶
RushTI uses EWMA (Exponentially Weighted Moving Average) to estimate task durations:
- Recent runs matter more. If a process used to take 10 seconds but now takes 20 seconds (because data volumes grew), the estimate adjusts toward 20 seconds.
- Old outliers fade away. A one-time spike from a server hiccup does not permanently distort the estimate.
- The smoothing factor (alpha) is 0.3. Each new run contributes 30% to the estimate, and the accumulated history contributes 70%.
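For example, if the accumulated estimate is 10 seconds and the latest run took 20 seconds:

```python
alpha = 0.3
new_estimate = alpha * 20 + (1 - alpha) * 10   # 13.0 seconds
```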
For most TM1 environments, the default alpha works well. If your process durations change rapidly (for example, during a data migration), see Performance Tuning for EWMA tuning details.
Time-of-Day Weighting¶
Some TI processes run faster at night when the server is idle and slower during business hours. Enable time_of_day_weighting to give more weight to runs that happened at a similar time of day:
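For example, assuming the same [optimization] section shown under EWMA Tuning above:

```ini
[optimization]
; section name assumed — see the Settings Reference for your installation
time_of_day_weighting = true
```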
Note
When time_of_day_weighting is enabled, caching is disabled — RushTI recalculates estimates fresh each run to account for the current time of day.
Choosing min_samples¶
| Workflow Type | Recommended min_samples | Reason |
|---|---|---|
| Stable daily ETL | 3 | Durations are consistent |
| Variable workloads | 5-10 | Need more data points to smooth out variation |
| Seasonal processes | 10-20 | Data volumes change significantly |
Manual Analysis (Optional)¶
In addition to automatic optimization, RushTI includes a manual analysis tool for generating reports and optimized task files:
rushti stats analyze \
--workflow daily-refresh \
--tasks daily-refresh.json \
--output daily-refresh-optimized.json
This reads your execution history, calculates EWMA estimates, and writes a new task file with tasks reordered for optimal execution. This is useful for:
- Reviewing performance — see which tasks are getting slower or faster
- Sharing an optimized file — distribute a pre-optimized task file to a team that hasn't enabled automatic optimization
- Auditing — compare the original and optimized order to understand what changed
What the Report Shows¶
| Field | Description |
|---|---|
| task_id | The task identifier from your task file |
| avg_duration | Simple average across all analyzed runs |
| ewma_duration | Weighted average (recent runs matter more) |
| min_duration | Fastest observed execution |
| max_duration | Slowest observed execution |
| run_count | Number of runs analyzed |
Reading the results:
- EWMA < Average — The task is getting faster over time
- EWMA > Average — The task is getting slower (investigate data volumes or TI process performance)
- EWMA ≈ Average — Stable performance
TM1 Integration¶
If you have TM1 Integration enabled (push_results = true), optimization data is also visible in TM1. The rushti cube stores task durations for every run, so you can build Planning Analytics dashboards that show:
- Which tasks are getting slower over time
- Which workflows benefit most from optimization
- Historical duration trends per TI process
- Success rates and failure patterns
Frequently Asked Questions¶
Does optimization change my task dependencies?¶
No. Optimization only reorders tasks that are independent of each other (tasks at the same depth in the DAG with no shared predecessors). Your dependency chains are never modified.
How many runs do I need before optimization helps?¶
At least 3 runs (the min_samples default). With fewer runs, the duration estimates are unreliable and RushTI skips optimization for those tasks. For best results, aim for 5-10 runs.
What is the difference between stats analyze and stats optimize?¶
- stats analyze — Reorders tasks by EWMA-estimated duration (longest first). Useful for basic bin-packing optimization.
- stats optimize — Performs contention-aware analysis: detects which parameter drives duration variance, identifies heavy outlier groups, builds predecessor chains to prevent heavy tasks from running simultaneously, and recommends an optimal worker count. Produces an HTML report.
Use stats analyze for simple duration-based reordering. Use stats optimize when you suspect resource contention is the bottleneck.
Which scheduling algorithm should I use?¶
- shortest_first — Best for most TM1 workloads, especially write-heavy processes against a single TM1 instance. Reduces peak resource contention.
- longest_first — Best for independent, read-heavy workloads spread across multiple TM1 instances. Classic bin-packing heuristic.
When in doubt, start with shortest_first.
Can I set a default algorithm per task file?¶
Yes. Add "optimization_algorithm" to the JSON task file's settings section, as shown under Via JSON Task File above. The CLI --optimize flag overrides this value at runtime.
Can I optimize a TXT task file?¶
Yes. Optimization works with both JSON and TXT task files. Use --optimize <algorithm> on the CLI. The stats analyze command also works with both formats.
What if I add new tasks to the workflow?¶
New tasks will not have historical duration data, so RushTI treats them as unknown and places them after tasks with known durations. After a few runs, the new tasks accumulate enough data to be included in optimization.
Can I use optimization with exclusive mode?¶
Yes. Optimization and exclusive mode are independent features and can be combined in the same run.
How does the sweet spot algorithm work for worker recommendations?¶
When the optimizer detects that more workers led to faster execution (scale-up), it does not simply recommend the highest worker count observed. Instead, it finds all worker levels within 10% of the best wall clock time and picks the one with the fewest workers. This avoids recommending 50 workers when 10 workers achieves 95% of the same speed.
What does "sensitivity" control?¶
The --sensitivity flag (default: 10.0) is the IQR multiplier k used for outlier detection. The upper fence is Q3 + k * IQR. Higher values make detection more conservative (fewer groups flagged as heavy). Lower values flag more groups as heavy. For most workloads, the default works well.
Are contention driver and concurrency ceiling mutually exclusive?¶
No. When the algorithm finds a contention driver and builds predecessor chains, it also runs concurrency ceiling / scale-up detection to refine the worker count. The two signals work together: the driver produces chains and a base worker recommendation, the ceiling/scale-up adjusts that recommendation based on observed server capacity across runs.
What happens when contention analysis finds nothing?¶
The optimizer always produces an output. If no contention driver or concurrency ceiling is found, it automatically falls back to standard longest_first EWMA reordering — the same analysis that rushti stats analyze produces. Similarly, if a driver is found but fewer than 2 heavy groups are detected (so no chains can be built) and no ceiling signal exists, the fallback kicks in.
Best Practices¶
- Enable stats from the start. Even if you do not plan to optimize immediately, having historical data ready is valuable. It costs almost nothing in overhead.
- Let it learn. After enabling optimization, the first few runs build the performance picture. Gains become visible after min_samples runs.
- Combine with manual tuning. Optimization reorders tasks but does not make individual TI processes faster. If the stats show a process taking 5 minutes, consider optimizing the TI process itself (better MDX, fewer loops, targeted data clears).
- Use time_of_day_weighting for mixed schedules. If the same workflow runs both during business hours and overnight, enable time-of-day weighting to account for load differences.
- Run contention analysis after worker tuning. After you have experimented with different max_workers values (see Performance Tuning), run rushti stats optimize to get a data-driven recommendation and detect resource contention patterns.
- Vary worker counts across runs. The concurrency ceiling and scale-up detection require runs at different max_workers values. Run your workflow at 2-3 different worker levels to give the optimizer enough data for multi-run comparison.
- Test before deploying. Contention-aware optimization follows statistical reasoning, but TM1 server behavior can be unpredictable. Always validate the optimized taskfile in a test or staging environment before using it in production.
Customize Further¶
- Performance Tuning — EWMA alpha tuning, worker sizing, and advanced optimization strategies
- Statistics & Dashboards — The stats database that powers optimization
- Settings Reference — Complete [optimization] and [stats] settings