Locking Settings

Configure migration locking to prevent concurrent execution

Table of contents

  1. Overview
  2. config.locking
  3. locking.enabled
  4. locking.timeout
  5. locking.retryAttempts
    1. When to Use Retries
  6. locking.retryDelay
  7. locking.tableName
  8. Configuration Patterns
    1. Production (Recommended)
    2. Scheduled Jobs
    3. Development
    4. CI/CD
  9. CLI Override
  10. Lock Flow Diagram
  11. Implementation Requirements
    1. Initialization Methods (v0.8.1+)
      1. initLockStorage()
      2. ensureLockStorageAccessible()
  12. ILockingHooks Interface (v0.8.1)
    1. Overview
    2. Interface Definition
    3. Usage Example: Metrics Collection
    4. Usage Example: Slack Alerts
    5. Usage Example: Audit Logging
    6. Integration with Handler
  13. LockingOrchestrator Service (v0.8.1)
    1. Overview
    2. Architecture
    3. What It Does
      1. 1. Retry Logic
      2. 2. Two-Phase Locking
      3. 3. Hook Invocation
    4. Automatic Integration
    5. Configuration
    6. Benefits for Adapters
    7. Error Handling
  14. Troubleshooting
    1. Lock Stuck After Crash
    2. Lock Acquired But Ownership Verification Failed
    3. Multiple MSR Instances Conflicting

Overview

Locking settings prevent multiple processes from running migrations simultaneously, which can cause:

  • Race Conditions: Two processes applying the same migration
  • Corrupted State: Migration tracking table with inconsistent data
  • Data Loss: Conflicting schema changes applied out of order
  • Production Incidents: Hard-to-debug issues from concurrent execution

MSR uses database-level locking similar to Knex.js and Liquibase, making it one of the few Node.js migration tools with built-in concurrency protection.

Opt-In Feature (v0.8.0): Locking is opt-in for backwards compatibility: MSR performs no locking until your handler provides a lockingService. Once a lockingService is present, locking is active by default (enabled: true), which is the recommended setting for production.


config.locking

Type: LockingConfig Default: new LockingConfig() (enabled with fail-fast defaults)

Configuration object for migration locking behavior.

import { Config, LockingConfig } from '@migration-script-runner/core';

const config = new Config();

// Default: Enabled with fail-fast (recommended for production)
config.locking = new LockingConfig({
  enabled: true,
  timeout: 600_000,      // 10 minutes
  retryAttempts: 0,      // fail immediately
  retryDelay: 1000,      // 1 second (if retries enabled)
  tableName: 'migration_locks'
});

locking.enabled

Type: boolean Default: true

Whether migration locking is enabled.

// Production: Keep enabled
config.locking.enabled = true;

// Development: Can disable for faster iteration
config.locking.enabled = false;

// CI/CD: Keep enabled to prevent parallel build issues
config.locking.enabled = true;

Never disable in production. Concurrent migrations can corrupt your database and cause production incidents.


locking.timeout

Type: number (milliseconds) Default: 600_000 (10 minutes)

Maximum time a lock can be held before automatic expiration.

// Short migrations (< 1 min)
config.locking.timeout = 300_000;  // 5 minutes

// Medium migrations (1-5 min) - default
config.locking.timeout = 600_000;  // 10 minutes

// Long migrations (5-30 min)
config.locking.timeout = 1_800_000;  // 30 minutes

// Very long migrations
config.locking.timeout = 3_600_000;  // 1 hour (max recommended)

Set timeout longer than your longest migration. If a lock expires during a valid migration, you risk concurrent execution.

Too long: stale locks take longer to be cleaned up automatically. Too short: the lock may expire while a valid migration is still running.


locking.retryAttempts

Type: number Default: 0 (fail immediately)

Number of times to retry lock acquisition before failing.

// Fail fast (default) - recommended for most cases
config.locking.retryAttempts = 0;

// Retry with patience
config.locking.retryAttempts = 5;  // Try 5 times

// Very patient
config.locking.retryAttempts = 10;  // Try 10 times

When to Use Retries

✅ Enable retries for:

  • Scheduled jobs that can wait
  • Batch processing systems
  • Non-critical automated deployments

❌ Don’t use retries for:

  • CI/CD pipelines (want fast feedback)
  • Manual deployments (user is waiting)
  • Critical production deployments

Fail fast by default. Immediate feedback is better than waiting. Use retries only for automated systems that can afford to wait.


locking.retryDelay

Type: number (milliseconds) Default: 1000 (1 second)

Delay between retry attempts (only used when retryAttempts > 0).

config.locking.retryAttempts = 5;
config.locking.retryDelay = 2000;  // Wait 2 seconds between retries
// Total max wait: 5 × 2 seconds = 10 seconds

Guidelines:

  • Quick check: 1000ms (1 second)
  • Standard: 2000ms (2 seconds)
  • Polite: 5000ms (5 seconds)
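The worst-case wait follows directly from retryAttempts and retryDelay. The sketch below reproduces that arithmetic, assuming a fixed delay before each retry (no backoff) and ignoring the time each acquisition attempt itself takes. maxLockWaitMs and RetrySettings are hypothetical names used for illustration, not part of MSR.

```typescript
// Hypothetical helper (not part of MSR): worst-case time spent waiting
// between lock attempts, assuming a fixed delay before each retry.
interface RetrySettings {
  retryAttempts: number; // retries after the first attempt
  retryDelay: number;    // milliseconds to wait before each retry
}

function maxLockWaitMs({ retryAttempts, retryDelay }: RetrySettings): number {
  if (retryAttempts < 0 || retryDelay < 0) {
    throw new RangeError('retryAttempts and retryDelay must be non-negative');
  }
  return retryAttempts * retryDelay;
}

console.log(maxLockWaitMs({ retryAttempts: 5, retryDelay: 2000 }));  // 10000
console.log(maxLockWaitMs({ retryAttempts: 10, retryDelay: 5000 })); // 50000
console.log(maxLockWaitMs({ retryAttempts: 0, retryDelay: 1000 }));  // 0
```

A check like this can help confirm that a scheduled job's total wait stays within its own execution window.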

locking.tableName

Type: string Default: 'migration_locks'

Database table name for storing locks.

// Default
config.locking.tableName = 'migration_locks';

// Custom name
config.locking.tableName = 'app_migration_locks';

// Multiple MSR instances in same database
config.locking.tableName = 'app1_migration_locks';

Multiple MSR instances: Use different table names to allow multiple MSR instances to use the same database without lock conflicts.


Configuration Patterns

Production (Recommended)

Fail-fast with automatic cleanup:

config.locking = new LockingConfig({
  enabled: true,
  timeout: 600_000,      // 10 minutes
  retryAttempts: 0,      // fail immediately
  retryDelay: 1000
});

Why:

  • ✅ Immediate feedback if another migration is running
  • ✅ Auto-cleanup after 10 minutes if process crashes
  • ✅ No waiting in deployment pipelines

Scheduled Jobs

Patient retry with longer timeout:

config.locking = new LockingConfig({
  enabled: true,
  timeout: 1_800_000,    // 30 minutes
  retryAttempts: 10,     // retry 10 times
  retryDelay: 5000       // wait 5 seconds between retries
});

Why:

  • ✅ Can wait up to 50 seconds for lock (10 × 5s)
  • ✅ Longer timeout for scheduled migrations
  • ✅ Better for automated systems

Development

Disable for faster iteration:

config.locking = new LockingConfig({
  enabled: false
});

Why:

  • ✅ Faster local development workflow
  • ✅ No lock conflicts during testing
  • ⚠️ NEVER use in production

CI/CD

Fail-fast with shorter timeout:

config.locking = new LockingConfig({
  enabled: true,
  timeout: 300_000,      // 5 minutes
  retryAttempts: 0,      // fail immediately
  retryDelay: 1000
});

Why:

  • ✅ Fast feedback if parallel builds conflict
  • ✅ Prevents multiple CI jobs from running migrations
  • ✅ Shorter timeout appropriate for CI migrations
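One way to keep these patterns in a single place is a preset table keyed by environment. The following is a minimal sketch: the environment names, LOCKING_PRESETS, and lockingOptionsFor are illustrative assumptions rather than MSR features, and the values are copied from the patterns above.

```typescript
// Plain option objects mirroring the four patterns above.
// The names below are illustrative and not part of MSR.
type Environment = 'production' | 'scheduled' | 'development' | 'ci';

interface LockingOptions {
  enabled: boolean;
  timeout?: number;       // milliseconds
  retryAttempts?: number;
  retryDelay?: number;    // milliseconds
}

const LOCKING_PRESETS: Record<Environment, LockingOptions> = {
  production:  { enabled: true, timeout: 600_000,   retryAttempts: 0,  retryDelay: 1000 },
  scheduled:   { enabled: true, timeout: 1_800_000, retryAttempts: 10, retryDelay: 5000 },
  development: { enabled: false },
  ci:          { enabled: true, timeout: 300_000,   retryAttempts: 0,  retryDelay: 1000 },
};

function isEnvironment(value: string): value is Environment {
  // Own-property check so inherited keys such as "constructor" do not match.
  return Object.prototype.hasOwnProperty.call(LOCKING_PRESETS, value);
}

function lockingOptionsFor(env: string | undefined): LockingOptions {
  // Fall back to the production preset so an unset or misspelled
  // environment name never silently disables locking.
  return env !== undefined && isEnvironment(env)
    ? LOCKING_PRESETS[env]
    : LOCKING_PRESETS.production;
}
```

The result can then be passed to the LockingConfig constructor in the same way as the object literals above, for example new LockingConfig(lockingOptionsFor(envName)), where envName comes from whatever mechanism your deployment uses to identify its environment.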

CLI Override

Disable locking for a single run, inspect the current lock, or release a stuck lock from the command line:

# Temporarily disable locking
msr migrate --no-lock

# Check lock status
msr lock:status

# Force release stuck lock
msr lock:release --force

See Lock Commands for full CLI documentation.


Lock Flow Diagram

┌─────────────────────────────────────────────────────────┐
│ Migration Start                                         │
└────────────────┬────────────────────────────────────────┘
                 │
                 ▼
         ┌───────────────┐
         │ Generate ID   │
         │ host-pid-uuid │
         └───────┬───────┘
                 │
                 ▼
    ┌────────────────────────┐
    │ Clean Expired Locks    │
    └────────┬───────────────┘
             │
             ▼
    ┌────────────────────┐          ┌──────────────┐
    │ Try Acquire Lock   │─────No──▶│ Retry?       │
    └────────┬───────────┘          └──────┬───────┘
             │Yes                          │Yes
             ▼                             │
    ┌────────────────────┐                │
    │ Verify Ownership   │◀───────────────┘
    └────────┬───────────┘
             │Valid
             ▼
    ┌────────────────────┐
    │ Run Migrations     │
    └────────┬───────────┘
             │
             ▼
    ┌────────────────────┐
    │ Release Lock       │
    │ (always, finally)  │
    └────────────────────┘
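The flow in the diagram can be sketched in code as follows. This is a simplified illustration, not MSR's internal implementation: LockOps is a reduced, local stand-in for ILockingService, withMigrationLock and MemoryLock are hypothetical names, and the retry branch is omitted for brevity.

```typescript
// Minimal subset of the locking operations used by the flow above.
interface LockOps {
  checkAndReleaseExpiredLock(): Promise<void>;
  acquireLock(executorId: string): Promise<boolean>;
  verifyLockOwnership(executorId: string): Promise<boolean>;
  releaseLock(executorId: string): Promise<void>;
}

// Hypothetical helper: runs `work` under the lock, following the diagram.
async function withMigrationLock<T>(
  lock: LockOps,
  executorId: string,
  work: () => Promise<T>,
): Promise<T> {
  await lock.checkAndReleaseExpiredLock();               // Clean Expired Locks

  if (!(await lock.acquireLock(executorId))) {           // Try Acquire Lock
    throw new Error('Migration lock is held by another executor');
  }
  try {
    if (!(await lock.verifyLockOwnership(executorId))) { // Verify Ownership
      throw new Error('Lock ownership verification failed');
    }
    return await work();                                 // Run Migrations
  } finally {
    await lock.releaseLock(executorId);                  // Release Lock (always)
  }
}

// In-memory stand-in for a real locking service, for demonstration only.
class MemoryLock implements LockOps {
  owner: string | null = null;

  async checkAndReleaseExpiredLock(): Promise<void> {
    // Nothing expires in this in-memory stand-in.
  }
  async acquireLock(executorId: string): Promise<boolean> {
    if (this.owner !== null) return false;
    this.owner = executorId;
    return true;
  }
  async verifyLockOwnership(executorId: string): Promise<boolean> {
    return this.owner === executorId;
  }
  async releaseLock(executorId: string): Promise<void> {
    if (this.owner === executorId) this.owner = null;
  }
}
```

The release sits in a finally block so that a failed migration still frees the lock, matching the "(always, finally)" step in the diagram. It is only reached once the lock has been acquired, so a rejected acquisition never releases another executor's lock.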

Implementation Requirements

To use locking, your database handler must implement ILockingService:

import { ILockingService, ILockStatus } from '@migration-script-runner/core';

class MyLockingService implements ILockingService<MyDB> {
  async acquireLock(executorId: string): Promise<boolean> {
    // Attempt to acquire lock using SELECT FOR UPDATE NOWAIT or equivalent
    // Return true if acquired, false if already locked
  }

  async releaseLock(executorId: string): Promise<void> {
    // Release lock held by this executor
  }

  async verifyLockOwnership(executorId: string): Promise<boolean> {
    // Verify this executor still owns the lock
    // Prevents race conditions
  }

  async getLockStatus(): Promise<ILockStatus | null> {
    // Return current lock information
  }

  async forceReleaseLock(): Promise<void> {
    // Unconditionally release any lock
    // Used by CLI lock:release command
  }

  async checkAndReleaseExpiredLock(): Promise<void> {
    // Clean up locks past their timeout
  }

  // NEW in v0.8.1: Required initialization methods
  async initLockStorage(): Promise<void> {
    // Create lock storage (tables, indexes, paths)
    // Example: CREATE TABLE IF NOT EXISTS migration_locks (...)
    // Throws on setup failures for fail-fast behavior
  }

  async ensureLockStorageAccessible(): Promise<boolean> {
    // Pre-flight check for storage accessibility
    // Returns true if accessible, false otherwise
    // Example: SELECT 1 FROM migration_locks LIMIT 1
  }
}

// Add to handler
handler.lockingService = new MyLockingService(handler.db);

NEW in v0.8.1: The required initLockStorage() and ensureLockStorageAccessible() methods enable explicit lock storage setup and pre-flight validation. See examples below.
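For unit tests it can be useful to have an implementation with no database behind it. The sketch below keeps the lock in memory and mirrors the eight methods listed above. It is an illustration only: it offers no protection across processes, LockStatus is a local stand-in whose field names (lockedBy, lockedAt, expiresAt) are assumed from their usage elsewhere on this page, and the class does not import the real ILockingService so that it stays self-contained.

```typescript
// Local stand-in for ILockStatus. Field names are an assumption;
// check the real type before relying on them.
interface LockStatus {
  lockedBy: string;
  lockedAt: Date;
  expiresAt: Date;
}

// Single-process, in-memory lock for tests. Not safe across processes.
class InMemoryLockingService {
  private current: LockStatus | null = null;
  private initialized = false;
  private readonly timeoutMs: number;
  private readonly now: () => number;

  // `now` is injectable so tests can control the clock.
  constructor(timeoutMs: number, now: () => number = Date.now) {
    this.timeoutMs = timeoutMs;
    this.now = now;
  }

  async initLockStorage(): Promise<void> {
    this.initialized = true;
  }

  async ensureLockStorageAccessible(): Promise<boolean> {
    return this.initialized;
  }

  async checkAndReleaseExpiredLock(): Promise<void> {
    if (this.current && this.current.expiresAt.getTime() <= this.now()) {
      this.current = null;
    }
  }

  async acquireLock(executorId: string): Promise<boolean> {
    if (this.current) return false; // already locked
    const t = this.now();
    this.current = {
      lockedBy: executorId,
      lockedAt: new Date(t),
      expiresAt: new Date(t + this.timeoutMs),
    };
    return true;
  }

  async verifyLockOwnership(executorId: string): Promise<boolean> {
    return this.current?.lockedBy === executorId;
  }

  async releaseLock(executorId: string): Promise<void> {
    // Only the owner may release; other callers are ignored.
    if (this.current?.lockedBy === executorId) this.current = null;
  }

  async getLockStatus(): Promise<LockStatus | null> {
    return this.current;
  }

  async forceReleaseLock(): Promise<void> {
    this.current = null;
  }
}
```

Injecting the clock lets a test advance time past the timeout and confirm that checkAndReleaseExpiredLock() frees the lock without waiting in real time.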

Initialization Methods (v0.8.1+)

Required lifecycle methods for explicit storage setup and validation.

initLockStorage()

Creates lock storage structures (tables, collections, paths) before first use.

When to Implement:

  • Database adapters requiring table/collection creation
  • Adapters needing indexes or constraints
  • File-based or cloud storage requiring path setup

Benefits:

  • ✅ Fail-fast: Errors during setup, not during first lock
  • ✅ Explicit: Clear when storage is initialized
  • ✅ Testable: Can test setup separately from lock operations

Example (PostgreSQL):

class PostgresLockingService implements ILockingService<IPostgresDB> {
  async initLockStorage(): Promise<void> {
    await this.db.query(`
      CREATE TABLE IF NOT EXISTS migration_locks (
        id SERIAL PRIMARY KEY,
        executor_id VARCHAR(255) UNIQUE NOT NULL,
        locked_at TIMESTAMP NOT NULL DEFAULT NOW(),
        expires_at TIMESTAMP NOT NULL
      )
    `);

    // Create index for expired lock cleanup
    await this.db.query(`
      CREATE INDEX IF NOT EXISTS idx_migration_locks_expires_at
      ON migration_locks(expires_at)
    `);
  }
}

Example (MongoDB):

class MongoLockingService implements ILockingService<IMongoDBInterface> {
  async initLockStorage(): Promise<void> {
    const db = this.db.client.db();

    // Create collection if needed
    const collections = await db.listCollections({ name: 'migration_locks' }).toArray();
    if (collections.length === 0) {
      await db.createCollection('migration_locks');
    }

    // Create unique index on executor_id
    await db.collection('migration_locks').createIndex(
      { executor_id: 1 },
      { unique: true }
    );
  }
}

ensureLockStorageAccessible()

Verifies lock storage is accessible before attempting lock operations.

When to Implement:

  • Pre-deployment validation in CI/CD
  • Remote storage with network connectivity checks
  • Permission validation

Benefits:

  • ✅ Early detection of configuration issues
  • ✅ Clear error messages before migrations start
  • ✅ CI/CD validation without running migrations

Example:

class PostgresLockingService implements ILockingService<IPostgresDB> {
  async ensureLockStorageAccessible(): Promise<boolean> {
    try {
      await this.db.query('SELECT 1 FROM migration_locks LIMIT 1');
      return true;
    } catch (error) {
      // Table doesn't exist or no permissions
      return false;
    }
  }
}

Usage in Handler:

class MyHandler implements IDatabaseMigrationHandler<IDB> {
  async initialize(): Promise<void> {
    // Initialize lock storage
    if (this.lockingService) {
      await this.lockingService.initLockStorage();

      // Verify storage is accessible
      const accessible = await this.lockingService.ensureLockStorageAccessible();
      if (!accessible) {
        throw new Error('Lock storage is not accessible. Check database permissions and connectivity.');
      }
    }
  }
}

See adapter-specific documentation for implementation examples.


ILockingHooks Interface (v0.8.1)

NEW in v0.8.1: Lifecycle hooks for lock operations enable observability, metrics collection, alerting, and audit logging.

Overview

ILockingHooks provides 9 optional hook methods that are called during lock lifecycle events. Use these hooks to:

  • Collect Metrics: Track lock acquisition times, conflicts, and retry counts
  • Send Alerts: Notify on-call teams via Slack/PagerDuty when lock conflicts occur
  • Audit Logging: Record who acquired/released locks for compliance
  • Debugging: Log detailed lock events during troubleshooting

Interface Definition

import { ILockStatus } from '@migration-script-runner/core';

interface ILockingHooks {
  // Before acquiring lock
  onBeforeAcquireLock?(executorId: string, timeout: number): Promise<void>;

  // After successfully acquiring and verifying lock
  onLockAcquired?(executorId: string, status: ILockStatus): Promise<void>;

  // When lock acquisition fails after all retries
  onLockAcquisitionFailed?(executorId: string, currentOwner: string): Promise<void>;

  // Before each retry attempt
  onAcquireRetry?(executorId: string, attempt: number, currentOwner: string): Promise<void>;

  // When ownership verification fails after acquisition
  onOwnershipVerificationFailed?(executorId: string): Promise<void>;

  // Before releasing lock
  onBeforeReleaseLock?(executorId: string): Promise<void>;

  // After successfully releasing lock
  onLockReleased?(executorId: string): Promise<void>;

  // When lock is force-released
  onForceReleaseLock?(status: ILockStatus | null): Promise<void>;

  // On any lock operation error
  onLockError?(operation: string, error: Error, executorId?: string): Promise<void>;
}

Usage Example: Metrics Collection

import { ILockingHooks, ILockStatus } from '@migration-script-runner/core';

class MetricsLockingHooks implements ILockingHooks {
  private metricsClient: MetricsClient;
  private startTime?: number;

  constructor(metricsClient: MetricsClient) {
    this.metricsClient = metricsClient;
  }

  async onBeforeAcquireLock(executorId: string, timeout: number): Promise<void> {
    this.startTime = Date.now();
    console.log(`[Metrics] Attempting to acquire lock: ${executorId}`);
  }

  async onLockAcquired(executorId: string, status: ILockStatus): Promise<void> {
    const duration = this.startTime ? Date.now() - this.startTime : 0;

    // Send metrics to DataDog/CloudWatch
    this.metricsClient.timing('migration.lock.acquired', duration);
    this.metricsClient.increment('migration.lock.success');

    console.log(`[Metrics] Lock acquired in ${duration}ms by ${executorId}`);
  }

  async onLockAcquisitionFailed(executorId: string, currentOwner: string): Promise<void> {
    // Track lock conflicts
    this.metricsClient.increment('migration.lock.conflict');
    this.metricsClient.tag('current_owner', currentOwner);

    console.error(`[Metrics] Lock conflict - held by ${currentOwner}`);
  }

  async onAcquireRetry(executorId: string, attempt: number, currentOwner: string): Promise<void> {
    this.metricsClient.increment('migration.lock.retry');
    console.log(`[Metrics] Lock retry #${attempt}, waiting for ${currentOwner}`);
  }

  async onLockError(operation: string, error: Error, executorId?: string): Promise<void> {
    this.metricsClient.increment('migration.lock.error');
    this.metricsClient.tag('operation', operation);
    console.error(`[Metrics] Lock error during ${operation}:`, error.message);
  }
}

// Usage
const hooks = new MetricsLockingHooks(myMetricsClient);
const handler = new MyHandler({
  lockingService: myLockingService,
  lockingHooks: hooks  // Pass hooks to handler
});

Usage Example: Slack Alerts

class SlackAlertHooks implements ILockingHooks {
  private slackWebhook: string;

  constructor(slackWebhook: string) {
    this.slackWebhook = slackWebhook;
  }

  async onLockAcquisitionFailed(executorId: string, currentOwner: string): Promise<void> {
    await this.sendSlackAlert({
      text: '⚠️ Migration Lock Conflict',
      attachments: [{
        color: 'warning',
        fields: [
          { title: 'Attempted By', value: executorId },
          { title: 'Currently Held By', value: currentOwner },
          { title: 'Action', value: 'Wait for lock release or force-release if stale' }
        ]
      }]
    });
  }

  async onLockError(operation: string, error: Error): Promise<void> {
    await this.sendSlackAlert({
      text: '🚨 Migration Lock Error',
      attachments: [{
        color: 'danger',
        fields: [
          { title: 'Operation', value: operation },
          { title: 'Error', value: error.message }
        ]
      }]
    });
  }

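  private async sendSlackAlert(payload: Record<string, unknown>): Promise<void> {
    const response = await fetch(this.slackWebhook, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload)
    });
    if (!response.ok) {
      throw new Error(`Slack webhook returned HTTP ${response.status}`);
    }
  }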
}

Usage Example: Audit Logging

class AuditLogHooks implements ILockingHooks {
  private auditLogger: Logger;

  constructor(auditLogger: Logger) {
    this.auditLogger = auditLogger;
  }

  async onLockAcquired(executorId: string, status: ILockStatus): Promise<void> {
    this.auditLogger.info({
      event: 'lock_acquired',
      executorId,
      timestamp: status.lockedAt,
      expiresAt: status.expiresAt
    });
  }

  async onLockReleased(executorId: string): Promise<void> {
    this.auditLogger.info({
      event: 'lock_released',
      executorId,
      timestamp: new Date()
    });
  }

  async onForceReleaseLock(status: ILockStatus | null): Promise<void> {
    this.auditLogger.warn({
      event: 'lock_force_released',
      previousOwner: status?.lockedBy,
      timestamp: new Date(),
      reason: 'manual_intervention'
    });
  }
}

Integration with Handler

Hooks are automatically passed to LockingOrchestrator when provided to the handler:

import { IDatabaseMigrationHandler, ILockingHooks, ILockingService } from '@migration-script-runner/core';

class MyHandler implements IDatabaseMigrationHandler<IDB> {
  lockingService?: ILockingService<IDB>;
  lockingHooks?: ILockingHooks;  // Add hooks property

  constructor(options: {
    lockingService?: ILockingService<IDB>;
    lockingHooks?: ILockingHooks;  // Accept hooks in constructor
  }) {
    this.lockingService = options.lockingService;
    this.lockingHooks = options.lockingHooks;
  }
}

// Create handler with hooks
const handler = new MyHandler({
  lockingService: new MyLockingService(db),
  lockingHooks: new MetricsLockingHooks(metricsClient)
});

Hook Failures: Hook errors are logged but do not fail the migration. If a hook throws, the error is logged and execution continues.
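The handler accepts a single lockingHooks object, so using metrics, alerting, and audit hooks together requires combining them. The sketch below shows one possible approach. composeHooks is a hypothetical helper, not part of MSR, and LockHooks is a reduced local stand-in for ILockingHooks covering only three of the nine methods to keep the example short.

```typescript
// Reduced local stand-in for ILockingHooks (three of the nine methods).
interface LockHooks {
  onLockAcquired?(executorId: string): Promise<void>;
  onLockReleased?(executorId: string): Promise<void>;
  onLockError?(operation: string, error: Error, executorId?: string): Promise<void>;
}

type Report = (hookName: string, error: unknown) => void;

// Hypothetical helper: fans each event out to every hook object that
// implements it. A failing hook is reported and skipped, so one broken
// integration cannot prevent the remaining hooks from running.
function composeHooks(
  hooks: LockHooks[],
  report: Report = (name, error) => console.error(`Locking hook ${name} failed:`, error),
): LockHooks {
  const each = async (
    name: string,
    invoke: (hook: LockHooks) => Promise<void> | undefined,
  ): Promise<void> => {
    for (const hook of hooks) {
      try {
        await invoke(hook); // undefined when the hook is not implemented
      } catch (error) {
        report(name, error);
      }
    }
  };

  return {
    onLockAcquired: (id) => each('onLockAcquired', (h) => h.onLockAcquired?.(id)),
    onLockReleased: (id) => each('onLockReleased', (h) => h.onLockReleased?.(id)),
    onLockError: (op, err, id) => each('onLockError', (h) => h.onLockError?.(op, err, id)),
  };
}
```

Hooks run sequentially in array order here. Running them concurrently would also work but makes log output harder to follow.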


LockingOrchestrator Service (v0.8.1)

NEW in v0.8.1: Internal service that orchestrates lock operations with retry logic, hooks, and two-phase locking.

Overview

LockingOrchestrator is an internal decorator service that wraps your ILockingService implementation. It handles:

  • Retry Logic: Automatically retries lock acquisition with configurable attempts and delays
  • Two-Phase Locking: Acquires lock then verifies ownership to prevent race conditions
  • Hook Invocation: Calls lifecycle hooks at appropriate points
  • Error Handling: Consistent error handling with context preservation
  • Logging: Structured logging of all lock operations

Architecture

MSR follows a decorator pattern for lock orchestration:

Your Handler
├── lockingService: ILockingService (adapter implementation)
│   └── Database-specific lock operations
└── lockingHooks?: ILockingHooks (optional hooks)

MigrationWorkflowOrchestrator (internal)
└── lockingOrchestrator: LockingOrchestrator
    ├── Wraps your lockingService
    ├── Applies retry logic
    ├── Invokes hooks
    └── Two-phase locking verification

Key Design Principles:

  1. Adapters Stay Simple: Your ILockingService only implements database operations
  2. Core Handles Orchestration: MSR core manages retry, hooks, and verification
  3. No Duplication: Retry logic is centralized, not repeated in every adapter
  4. Consistent Behavior: All databases get the same orchestration

What It Does

1. Retry Logic

// Your code
await executor.up();

// Simplified sketch of what LockingOrchestrator does internally.
// One initial attempt, then up to retryAttempts retries:
for (let attempt = 0; attempt <= config.locking.retryAttempts; attempt++) {
  if (attempt > 0) {
    // Retry: notify hooks, then wait before trying again
    await hooks?.onAcquireRetry?.(executorId, attempt, currentOwner);
    await sleep(config.locking.retryDelay);
  }

  await hooks?.onBeforeAcquireLock?.(executorId, timeout);

  const acquired = await lockingService.acquireLock(executorId);
  if (acquired) {
    // Success - verify ownership
    const verified = await lockingService.verifyLockOwnership(executorId);
    if (verified) {
      await hooks?.onLockAcquired?.(executorId, status);
      return true;
    }
  }
}

// Failed after all attempts
await hooks?.onLockAcquisitionFailed?.(executorId, currentOwner);
return false;
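To check the attempt count, the retry behaviour can be isolated into a small function and run against a stub. The sketch below assumes that retryAttempts counts retries after an initial attempt, so retryAttempts = 0 still tries once, as "fail immediately" implies. acquireWithRetry, Acquirer, and the injectable wait parameter are hypothetical and not MSR APIs.

```typescript
interface Acquirer {
  acquireLock(executorId: string): Promise<boolean>;
}

interface RetryOptions {
  retryAttempts: number; // retries after the first attempt
  retryDelay: number;    // milliseconds between attempts
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-alone retry loop: one initial attempt, then up to
// retryAttempts retries, waiting retryDelay before each retry.
async function acquireWithRetry(
  service: Acquirer,
  executorId: string,
  { retryAttempts, retryDelay }: RetryOptions,
  wait: (ms: number) => Promise<void> = sleep, // injectable for tests
): Promise<{ acquired: boolean; attempts: number }> {
  for (let attempt = 0; attempt <= retryAttempts; attempt++) {
    if (attempt > 0) {
      await wait(retryDelay); // delay only before a retry, never after the last one
    }
    if (await service.acquireLock(executorId)) {
      return { acquired: true, attempts: attempt + 1 };
    }
  }
  return { acquired: false, attempts: retryAttempts + 1 };
}
```

Passing a no-op wait function in tests avoids real delays while still recording how many times, and for how long, the loop would have slept.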

2. Two-Phase Locking

Prevents race conditions by verifying ownership after acquisition:

// Phase 1: Acquire lock
const acquired = await lockingService.acquireLock(executorId);

// Phase 2: Verify ownership (catches race conditions)
const verified = await lockingService.verifyLockOwnership(executorId);

if (!verified) {
  await hooks?.onOwnershipVerificationFailed?.(executorId);
  throw new Error('Lock ownership verification failed');
}

This detects cases where:

  • Two processes acquire lock simultaneously due to database timing
  • Clock skew causes lock to expire immediately
  • Another process force-released the lock

3. Hook Invocation

Hooks are called at specific lifecycle points:

// Before any operation
await hooks?.onBeforeAcquireLock?.(executorId, timeout);

// On success
await hooks?.onLockAcquired?.(executorId, status);

// On failure
await hooks?.onLockAcquisitionFailed?.(executorId, currentOwner);

// On retry
await hooks?.onAcquireRetry?.(executorId, attempt, currentOwner);

// On error
await hooks?.onLockError?.('acquire', error, executorId);

Automatic Integration

You don’t instantiate LockingOrchestrator directly. It’s created automatically by MigrationWorkflowOrchestrator when your handler provides a lockingService:

// In your handler
class MyHandler implements IDatabaseMigrationHandler<IDB> {
  lockingService = new MyLockingService(db);  // You provide this
  lockingHooks = new MetricsHooks();          // Optional
}

// MSR creates orchestrator internally
// MigrationWorkflowOrchestrator.constructor():
if (handler.lockingService) {
  this.lockingOrchestrator = new LockingOrchestrator(
    handler.lockingService,  // Your adapter
    config.locking,          // Retry config
    logger,                  // Logger
    handler.lockingHooks     // Optional hooks
  );
}

Configuration

Configure retry behavior via config.locking:

import { Config } from '@migration-script-runner/core';

const config = new Config();
config.locking.retryAttempts = 5;    // How many times to retry
config.locking.retryDelay = 1000;    // Wait 1000ms between retries
config.locking.timeout = 60000;      // Lock expires after 60 seconds

Benefits for Adapters

Before v0.8.1 (without LockingOrchestrator):

// Every adapter had to implement retry logic
class MyLockingService implements ILockingService<IDB> {
  async acquireLock(executorId: string): Promise<boolean> {
    // Adapter must implement retry logic ❌
    // Adapter must implement hook calls ❌
    // Adapter must implement two-phase locking ❌
    // Code duplication across adapters ❌
  }
}

After v0.8.1 (with LockingOrchestrator):

// Adapters only implement database operations
class MyLockingService implements ILockingService<IDB> {
  async acquireLock(executorId: string): Promise<boolean> {
    // Pure database operation ✅
    const result = await this.db.transaction(/* ... */);
    return result.committed;
  }

  // No retry logic needed ✅
  // No hook calls needed ✅
  // Core handles orchestration ✅
}

Error Handling

All errors are caught and logged, and the onLockError hook is invoked before the error is re-thrown:

try {
  await lockingService.acquireLock(executorId);
} catch (error) {
  logger.error(`Lock acquisition error: ${error}`);
  await hooks?.onLockError?.('acquire', error, executorId);
  throw error;  // Re-throw to fail migration
}

When Hooks Run: Hooks run during lock operations but are optional. If a hook throws an error, it’s logged but doesn’t fail the migration.


Troubleshooting

Lock Stuck After Crash

If a process crashes, its lock expires automatically once the configured timeout (in milliseconds) has elapsed. To release it sooner, do so manually:

# Check who holds the lock
msr lock:status

# Force release if you're sure it's stale
msr lock:release --force
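When deciding whether a lock is safe to force-release, it helps to compare its expiry with the current time. The sketch below is a hypothetical helper for doing that in code, for example with the result of getLockStatus(). The LockStatusLike shape is assumed from the fields used in the audit logging example (lockedBy, lockedAt, expiresAt) and may differ from the real ILockStatus.

```typescript
// Assumed shape, based on the fields used in the audit logging example.
interface LockStatusLike {
  lockedBy: string;
  lockedAt: Date;
  expiresAt: Date;
}

// Hypothetical helper: describes a lock and whether it has passed its expiry.
function describeLock(status: LockStatusLike | null, now: Date = new Date()): string {
  if (status === null) {
    return 'no lock held';
  }
  const remainingMs = status.expiresAt.getTime() - now.getTime();
  if (remainingMs <= 0) {
    const expiredFor = Math.round(-remainingMs / 1000);
    return `stale: held by ${status.lockedBy}, expired ${expiredFor}s ago`;
  }
  const expiresIn = Math.round(remainingMs / 1000);
  return `active: held by ${status.lockedBy}, expires in ${expiresIn}s`;
}
```

The comparison uses the local clock, so significant clock skew between this machine and the one that wrote the lock can make an active lock look stale. When in doubt, confirm that the owning process is no longer running before forcing a release.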

Lock Acquired But Ownership Verification Failed

This indicates a race condition or clock skew between servers. The lock was acquired but immediately lost. Possible causes:

  • System clock differences between servers
  • Database connection issues
  • Extremely high concurrency

Multiple MSR Instances Conflicting

Use a different tableName for each instance:

// App 1
config.locking.tableName = 'app1_migration_locks';

// App 2
config.locking.tableName = 'app2_migration_locks';