Matthew Hodge
Senior Developer

Laravel Queues in Production: Failed Jobs, Retries, and Monitoring with Horizon

Queues are one of those Laravel features that feel like magic in development and then quietly fall over the first time you put them in front of real traffic. Locally you're probably running the sync driver, so your "queued" jobs run instantly and you never think about them. Then you deploy, switch to a real driver, and discover the harder questions: who's running the worker? What happens when a job throws an exception? How do you even know a job failed?

In this post I'll walk through running Laravel queues properly in production — choosing a driver, keeping workers alive under Supervisor, configuring retries and timeouts so failures are handled gracefully, working the failed_jobs table, and finally putting Horizon on top for real visibility.

If you've used queued event listeners before (implements ShouldQueue), you've already been writing jobs — listeners are just jobs in disguise. This post is about everything that happens after you dispatch one.


A Quick Recap: What a Job Looks Like

A queued job is a class that implements ShouldQueue. Generate one with:

php artisan make:job ProcessOrderShipping

In modern Laravel (11+), the stub is lean: a single Queueable trait bundles the dispatching, queue-interaction, and model-serialization traits that older stubs listed one by one:

// app/Jobs/ProcessOrderShipping.php

namespace App\Jobs;

use App\Models\Order;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Queue\Queueable;

class ProcessOrderShipping implements ShouldQueue
{
    use Queueable;

    public function __construct(public Order $order) {}

    public function handle(): void
    {
        // Talk to the shipping API, write a label, notify the customer...
    }
}

And you dispatch it from a controller, action, or listener:

ProcessOrderShipping::dispatch($order);

// Or onto a specific queue, with a delay:
ProcessOrderShipping::dispatch($order)
    ->onQueue('shipping')
    ->delay(now()->addMinutes(2));

That's the easy part. The rest of this post is about making sure that job actually runs, and runs reliably.


Choosing a Queue Driver

The driver decides where dispatched jobs are stored until a worker picks them up. You set it in .env via QUEUE_CONNECTION, with the details in config/queue.php.

| Driver | Good for | Notes |
|---|---|---|
| sync | Local dev, tests | Runs immediately, no queue at all. Never use in production. |
| database | Small apps, no extra infra | Uses your existing DB. Fine at low volume; adds load as it grows. |
| redis | Most production apps | Fast, battle-tested, and the only driver Horizon supports. |
| sqs | Managed / serverless setups | AWS runs the queue for you. You still run workers unless a serverless platform such as Laravel Vapor does it for you. No Horizon either. |
| beanstalkd | Legacy / specific setups | Works, but Redis is the more common choice these days. |

My default recommendation: Redis. It's fast, you almost certainly already have it around for caching and sessions, and it unlocks Horizon — which is the whole reason this post exists.

If you want to start on database to avoid standing up Redis, that's a perfectly reasonable on-ramp. Laravel 11+ already ships the jobs, failed_jobs, and job_batches migrations in the default skeleton, so you just run:

php artisan migrate

On older versions, generate them first with php artisan queue:table and php artisan queue:failed-table, then migrate.


Running Workers in Production

Here's the thing that trips everyone up the first time: dispatching a job does nothing on its own. Something has to pull jobs off the queue and run them. That something is a worker process:

php artisan queue:work

You'll see queue:listen mentioned in older tutorials. The difference matters:

  • queue:work boots the framework once and stays in memory — fast, and what you want in production.
  • queue:listen reboots the framework on every job — slower, but it picks up code changes without a restart. Handy in development, wasteful in production.

That speed comes with a catch, and it's the single most common queue bug I see: because queue:work holds your code in memory, deploying new code does nothing until the worker restarts. Your shiny bug fix sits in the repo while the old code keeps running. After every deploy, tell workers to finish their current job and exit gracefully, so your process manager can start them again on the new code:

php artisan queue:restart

Keeping Workers Alive with Supervisor

A worker is just a long-running process, and processes die — out of memory, an unhandled error, a server reboot. In production you never run queue:work by hand; you let a process manager keep it alive. On Linux that's almost always Supervisor (the same tool you'd reach for to keep any long-running process running — if you've containerised a PHP app with Supervisord before, this will feel familiar).

; /etc/supervisor/conf.d/laravel-worker.conf

[program:laravel-worker]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/app/artisan queue:work redis --queue=high,default --sleep=3 --tries=3 --max-time=3600
autostart=true
autorestart=true
stopwaitsecs=3600
user=www-data
numprocs=4
redirect_stderr=true
stdout_logfile=/var/www/app/storage/logs/worker.log

A few of those flags are doing real work:

  • numprocs=4 runs four workers in parallel. Scale this to your workload and CPU.
  • --queue=high,default processes the high queue before default, so urgent jobs jump the line. Dispatch to it with ->onQueue('high').
  • --max-time=3600 recycles each worker after an hour. Long-lived PHP processes leak memory eventually; restarting on a schedule keeps them honest. --max-jobs=1000 does the same thing by job count.
  • stopwaitsecs=3600 is easy to overlook. When Supervisor restarts a worker it sends a stop signal, then waits this long before force-killing. Set it at least as high as your longest job's timeout, or Supervisor will kill jobs mid-flight during a deploy.
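To make the --queue=high,default ordering concrete, here's a plain-PHP sketch of how a worker picks its next job: check each queue in the listed order and take the oldest job from the first one that isn't empty. The nextJob() function and the arrays standing in for Redis lists are illustrative assumptions, not Laravel's internals; only the check-in-order behaviour is what the flag gives you.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch of strict-priority polling, as with --queue=high,default:
 * queues are checked in the order given and the worker takes the first job it
 * finds. Plain arrays stand in for the real Redis lists.
 *
 * @param array<string, list<string>> $queues   queue name => pending jobs (oldest first)
 * @param list<string>                $priority queue names, highest priority first
 */
function nextJob(array &$queues, array $priority): ?string
{
    foreach ($priority as $name) {
        if (!empty($queues[$name])) {
            return array_shift($queues[$name]); // Oldest job on this queue.
        }
    }

    return null; // Every queue is empty: a real worker would sleep (--sleep=3).
}

$queues = [
    'default' => ['GenerateReport', 'SyncInventory'],
    'high'    => ['SendPasswordReset'],
];

$processed = [];
while (($job = nextJob($queues, ['high', 'default'])) !== null) {
    $processed[] = $job;
}

echo implode(' -> ', $processed) . PHP_EOL;
// SendPasswordReset -> GenerateReport -> SyncInventory
```

The flip side of strict priority: as long as high keeps receiving work, default waits. Keep the high queue for jobs that are genuinely short and urgent.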

Then reload Supervisor to pick up the config:

sudo supervisorctl reread
sudo supervisorctl update
sudo supervisorctl start laravel-worker:*

Handling Failures: Tries, Backoff, and Timeouts

Jobs fail. An API times out, a third party returns a 500, a record gets deleted out from under you. The goal isn't to prevent every failure — it's to fail gracefully and retry sensibly.

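If neither the job nor the worker says otherwise, a job is attempted once and then moved to the failed jobs table. (The Supervisor config above passes --tries=3, but a value set on the job itself takes precedence.) A single attempt is rarely what you want for anything touching the network, so configure retries right on the job: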

// app/Jobs/ProcessOrderShipping.php

class ProcessOrderShipping implements ShouldQueue
{
    use Queueable;

    // Attempt up to 3 times before giving up.
    public int $tries = 3;

    // Escalating backoff: wait 10s before the second attempt, 30s before the third.
    public array $backoff = [10, 30];

    // Kill the job if a single attempt runs longer than 120 seconds.
    public int $timeout = 120;

    // Give up after 2 unhandled exceptions, even if $tries would allow another attempt.
    public int $maxExceptions = 2;

    public function handle(): void
    {
        // ...
    }
}
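One production gotcha with $timeout: it has to be shorter than the connection's retry_after value in config/queue.php, which is how long the queue waits before assuming a reserved job was abandoned and handing it to another worker. The Redis connection in the default Laravel config sets retry_after to 90 seconds, so a 120-second job timeout needs it raised, or a slow but healthy job can run twice. An excerpt, with 180 as an illustrative value rather than a recommendation:

```php
// config/queue.php (excerpt)
'connections' => [
    'redis' => [
        'driver' => 'redis',
        'connection' => env('REDIS_QUEUE_CONNECTION', 'default'),
        'queue' => env('REDIS_QUEUE', 'default'),
        // Seconds before a reserved job is assumed lost and released again.
        // Keep this comfortably above the longest job $timeout (120s above).
        'retry_after' => (int) env('REDIS_QUEUE_RETRY_AFTER', 180),
        'block_for' => null,
        'after_commit' => false,
    ],
],
```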

Backoff matters more than people think. If a downstream API is having a bad moment, hammering it with instant retries makes things worse for everyone. Spacing retries out gives it room to recover.
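To see which delays a job actually gets, here's a plain-PHP sketch of how a $backoff array resolves, following Laravel's documented rule: the first retry waits the first value, the second retry the second, and once the array runs out the last value repeats. retryDelay() and retrySchedule() are hypothetical helpers for illustration, not framework code.

```php
<?php

declare(strict_types=1);

/**
 * Delay in seconds before the retry that follows failed attempt N: the Nth
 * $backoff entry, or the last entry once the array runs out.
 * Hypothetical helper mirroring Laravel's documented behaviour.
 *
 * @param list<int> $backoff
 */
function retryDelay(array $backoff, int $failedAttempt): int
{
    if ($backoff === []) {
        return 0; // No backoff configured: retry immediately.
    }

    return $backoff[$failedAttempt - 1] ?? $backoff[array_key_last($backoff)];
}

/**
 * Every delay a job with $tries attempts can use. N attempts means at most
 * N - 1 retries, so only that many delays are ever applied.
 *
 * @param list<int> $backoff
 * @return list<int>
 */
function retrySchedule(array $backoff, int $tries): array
{
    $delays = [];
    for ($attempt = 1; $attempt < $tries; $attempt++) {
        $delays[] = retryDelay($backoff, $attempt);
    }

    return $delays;
}

echo implode(', ', retrySchedule([10, 30, 60], 3)) . PHP_EOL; // 10, 30
echo implode(', ', retrySchedule([10, 30], 5)) . PHP_EOL;     // 10, 30, 30, 30
```

Notice the first call: with $tries = 3 there are only two retries, so a third backoff value never comes into play.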

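For a time limit instead of an attempt count, define retryUntil(). The job keeps retrying until that moment, and if both are set, retryUntil() takes precedence over $tries: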

public function retryUntil(): \DateTime
{
    return now()->addMinutes(10);
}

The failed() Hook

When a job exhausts its retries, Laravel calls its failed() method if you've defined one. This is where you do cleanup or raise an alert — mark the order as stuck, notify the team, whatever the situation needs:

use Illuminate\Support\Facades\Log;
use Throwable;

public function failed(Throwable $exception): void
{
    $this->order->update(['shipping_status' => 'failed']);

    Log::error('Shipping job failed for good', [
        'order_id' => $this->order->id,
        'error'    => $exception->getMessage(),
    ]);
}

Make Your Jobs Idempotent

This is the one I'd underline. Retries only help if running a job twice is safe. If your job charges a card or sends an email, a retry after a partial failure can double-charge or double-send. Design jobs so that running them again is harmless: check whether the work is already done before doing it, key external calls on a unique reference, and reach for Laravel's ShouldBeUnique (keeps duplicate copies of a job off the queue) or the WithoutOverlapping middleware (stops two copies running at once) where they fit, bearing in mind that neither makes a retry safe on its own. Idempotency is what makes the whole retry story trustworthy.
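Here's a plain-PHP sketch of the first two habits working together: the job body skips work already recorded as done, and the external call carries an idempotency key so that even a retry which lost the local record can't charge twice. FakePaymentGateway and chargeOrder() are hypothetical stand-ins; whether a real payment provider accepts idempotency keys, and how, depends on its API.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical payment gateway that honours idempotency keys: repeating a
 * request with the same key returns the original charge instead of
 * creating a new one.
 */
final class FakePaymentGateway
{
    /** @var array<string, string> idempotency key => charge ID */
    private array $charges = [];
    private int $totalChargedCents = 0;

    public function charge(int $amountCents, string $idempotencyKey): string
    {
        if (isset($this->charges[$idempotencyKey])) {
            return $this->charges[$idempotencyKey]; // Replayed request: no new charge.
        }

        $chargeId = 'ch_' . (count($this->charges) + 1);
        $this->charges[$idempotencyKey] = $chargeId;
        $this->totalChargedCents += $amountCents;

        return $chargeId;
    }

    public function totalChargedCents(): int
    {
        return $this->totalChargedCents;
    }
}

/**
 * Hypothetical idempotent job body: skip if already charged, and key the
 * external call on the order so a repeated call is harmless.
 *
 * @param array{id: int, total_cents: int, charge_id: ?string} $order
 */
function chargeOrder(array &$order, FakePaymentGateway $gateway): void
{
    if ($order['charge_id'] !== null) {
        return; // Already charged on an earlier attempt.
    }

    $order['charge_id'] = $gateway->charge($order['total_cents'], 'order-' . $order['id']);
}

$gateway = new FakePaymentGateway();
$order = ['id' => 42, 'total_cents' => 1999, 'charge_id' => null];

chargeOrder($order, $gateway);
chargeOrder($order, $gateway); // A plain retry: skipped by the local check.

// A crash after charging but before saving loses the local record...
$order['charge_id'] = null;
chargeOrder($order, $gateway); // ...but the idempotency key still prevents a second charge.

echo $gateway->totalChargedCents() . PHP_EOL; // 1999
```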


Working the Failed Jobs Table

When a job gives up, it lands in failed_jobs with its payload and the exception. A handful of Artisan commands let you work that table:

# See what's failed
php artisan queue:failed

# Retry a specific job by its UUID
php artisan queue:retry 5a3c...e91

# Retry everything that failed
php artisan queue:retry all

# Delete one failed job
php artisan queue:forget 5a3c...e91

# Clear the whole table
php artisan queue:flush

In practice, the useful habit is to check queue:failed as part of your regular health checks rather than waiting for a customer to tell you their order never shipped. Better still — get something to tell you automatically, which is where monitoring comes in.


Monitoring with Horizon

Everything above works, but you're flying blind. You can't see how many jobs are waiting, how long they're taking, or whether your throughput is keeping up with what's being dispatched. Laravel Horizon is the answer — a beautiful dashboard and a smarter worker manager for Redis queues.

Install it:

composer require laravel/horizon
php artisan horizon:install

Then, instead of running queue:work yourself, you run Horizon — it manages the worker processes for you based on config/horizon.php:

php artisan horizon

Visit /horizon and you get real-time throughput, runtime and wait-time metrics per queue, a list of recent and failed jobs (with full stack traces), and the ability to retry failed jobs from the browser. The first time you see it, you'll wonder how you managed without it.

Configuring Workers in Horizon

Because Horizon manages workers, your Supervisor config gets simpler — you supervise one process (horizon) instead of a pool of queue:work commands. The worker pool itself is defined in config:

// config/horizon.php

'environments' => [
    'production' => [
        'supervisor-1' => [
            'connection'   => 'redis',
            'queue'        => ['high', 'default'],
            'balance'      => 'auto',
            'minProcesses' => 1,
            'maxProcesses' => 10,
            'tries'        => 3,
            'timeout'      => 120,
            'memory'       => 128,
        ],
    ],
],

The balance strategy is Horizon's party trick. With auto, it shifts worker processes between queues based on load: if the high queue suddenly backs up, it moves workers onto it and pulls them back when things calm down. minProcesses sets the floor for each queue, maxProcesses caps the total, and Horizon allocates within those bounds.
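To build some intuition for balance => 'auto', here's a deliberately simplified plain-PHP sketch: every queue keeps its per-queue floor, and the spare capacity up to the total ceiling is shared out in proportion to each queue's backlog. This is not Horizon's actual algorithm (its auto-scaler weighs more than raw queue length and shifts processes gradually); allocateWorkers() is a hypothetical helper for illustration only.

```php
<?php

declare(strict_types=1);

/**
 * Simplified illustration of auto-balancing (NOT Horizon's real algorithm):
 * every queue keeps a floor of $minPerQueue processes, and the spare
 * capacity up to $maxTotal is shared in proportion to each queue's backlog.
 *
 * @param array<string, int> $backlog queue name => pending jobs
 * @return array<string, int> queue name => worker processes
 */
function allocateWorkers(array $backlog, int $minPerQueue, int $maxTotal): array
{
    $allocation = array_fill_keys(array_keys($backlog), $minPerQueue);
    $spare = $maxTotal - $minPerQueue * count($backlog);
    $totalJobs = array_sum($backlog);

    if ($spare <= 0 || $totalJobs === 0) {
        return $allocation; // Nothing to share, or nothing waiting.
    }

    // Give each queue the whole-number part of its proportional share...
    $remainders = [];
    foreach ($backlog as $queue => $jobs) {
        $share = $spare * $jobs / $totalJobs;
        $allocation[$queue] += (int) floor($share);
        $remainders[$queue] = $share - floor($share);
    }

    // ...then hand out the processes lost to rounding, largest remainder first.
    $leftover = $maxTotal - array_sum($allocation);
    arsort($remainders);
    foreach (array_keys($remainders) as $queue) {
        if ($leftover === 0) {
            break;
        }
        $allocation[$queue]++;
        $leftover--;
    }

    return $allocation;
}

// The high queue suddenly backs up: most of the pool moves onto it.
echo json_encode(allocateWorkers(['high' => 900, 'default' => 100], 1, 10)) . PHP_EOL;
// {"high":8,"default":2}

// Things calm down: each queue drops back to its floor.
echo json_encode(allocateWorkers(['high' => 0, 'default' => 0], 1, 10)) . PHP_EOL;
// {"high":1,"default":1}
```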

Your Supervisor config then just keeps Horizon alive:

; /etc/supervisor/conf.d/horizon.conf

[program:horizon]
process_name=%(program_name)s
command=php /var/www/app/artisan horizon
autostart=true
autorestart=true
stopwaitsecs=3600
user=www-data
redirect_stderr=true
stdout_logfile=/var/www/app/storage/logs/horizon.log

And the deploy step changes from queue:restart to:

php artisan horizon:terminate

Same idea — finish current jobs, then exit so Supervisor restarts Horizon with the new code.

Securing the Dashboard

By default the Horizon dashboard is only viewable in local. In production you have to explicitly say who's allowed in, via the viewHorizon gate:

// app/Providers/HorizonServiceProvider.php

protected function gate(): void
{
    Gate::define('viewHorizon', function ($user) {
        return in_array($user->email, [
            'you@example.com',
        ]);
    });
}

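Be deliberate about this gate. If it lets everyone through, or production is accidentally running with APP_ENV=local (which bypasses the gate entirely), you've published your job internals, payloads and all, to the public internet.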


Monitoring Without Horizon

Horizon is Redis-only. If you're on SQS or database, you've still got options.

queue:monitor ships with Laravel and fires a QueueBusy event when a queue exceeds a size threshold — hook a notification onto it:

php artisan queue:monitor redis:default,redis:high --max=100

Run it on a schedule and you'll get told when work is piling up.
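Wiring that up takes two pieces: a scheduled entry for the monitor, and a listener for the QueueBusy event it dispatches. A sketch following the pattern in the Laravel docs, using the Laravel 11+ layout (older apps define the schedule in app/Console/Kernel.php). QueueHasLongWaitTime is a hypothetical notification class you'd write yourself, and the address is a placeholder:

```php
// routes/console.php (excerpt)
use Illuminate\Support\Facades\Schedule;

// Check queue sizes every minute; QueueBusy fires when one exceeds 100 jobs.
Schedule::command('queue:monitor redis:default,redis:high --max=100')->everyMinute();
```

```php
// app/Providers/AppServiceProvider.php (excerpt)
use App\Notifications\QueueHasLongWaitTime; // Hypothetical notification class.
use Illuminate\Queue\Events\QueueBusy;
use Illuminate\Support\Facades\Event;
use Illuminate\Support\Facades\Notification;

public function boot(): void
{
    // QueueBusy carries the connection, queue name, and current size.
    Event::listen(function (QueueBusy $event) {
        Notification::route('mail', 'dev@example.com')
            ->notify(new QueueHasLongWaitTime(
                $event->connection,
                $event->queue,
                $event->size,
            ));
    });
}
```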

Laravel Telescope gives you a local/staging dashboard for jobs (among everything else) — great for debugging, though heavier than you'd want recording everything in production.

At minimum, alert on the failed-jobs count. A scheduled command that checks failed_jobs and pings Slack when it's non-empty is ten minutes of work and will save you a bad day.
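A minimal sketch of that check, assuming failed jobs are stored in the failed_jobs database table (Laravel's default) and that a Slack incoming-webhook URL lives under a hypothetical services.slack.failed_jobs_webhook config key:

```php
// routes/console.php (excerpt)
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Http;
use Illuminate\Support\Facades\Schedule;

Schedule::call(function () {
    $failed = DB::table('failed_jobs')->count();

    if ($failed > 0) {
        // Slack incoming webhooks accept a JSON body with a "text" field.
        Http::post(config('services.slack.failed_jobs_webhook'), [
            'text' => "{$failed} failed queue job(s) on " . config('app.name'),
        ]);
    }
})->everyFifteenMinutes();
```

As written it keeps pinging every fifteen minutes until someone retries or clears the failures, which is arguably the point.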


A Few Production Habits

To wrap up, the things worth doing from day one rather than retrofitting after an incident:

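  • Keep payloads small. A queued Eloquent model is serialized as just its class and key (that's SerializesModels at work) and re-fetched when the job runs, but any relations you had loaded come along and get reloaded too. Pass IDs or call ->withoutRelations() rather than shipping a whole model graph; small payloads are faster and less brittle.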
  • Separate queues by priority. A slow report shouldn't block a password-reset email. Split them and process the urgent one first.
  • Always set $tries, $timeout, and $backoff explicitly. The defaults are rarely what you want for anything real.
  • Make jobs idempotent. Retries are only safe if a second run can't do damage.
  • Restart workers on deploy: queue:restart or horizon:terminate. The number of "my fix isn't live" mysteries this solves is remarkable.
  • Watch the failed count. Whether it's Horizon, queue:monitor, or a cron job, something should tell you when jobs fail — not your users.

Final Thoughts

Queues go from "magic" to "infrastructure" the moment you ship them. The good news is that Laravel gives you everything you need to run them properly: sensible retry handling on the job, Supervisor to keep workers alive, the failed_jobs table as a safety net, and Horizon to turn the whole thing from a black box into a dashboard you actually trust.

If you take one thing away, make it this: assume jobs will fail, design them so retries are safe, and put something in place that tells you when they do. Get those three right and queues become one of the most dependable parts of your stack rather than the scariest. Start with retries and a Supervisor config, add Horizon once you're on Redis, and you'll wonder why you ever ran jobs without visibility.