Scaling Medical Coding AI: Lessons from Building CasePilot

February 10, 2025

by Niels, Co-Founder / CEO

Building CasePilot with Laravel taught us that healthcare AI isn't just about algorithms. It's about trust, compliance, and the kind of reliability that medical professionals stake their careers on. Here's how we built a Laravel-based system that processes 10,000+ medical codes daily with 99% accuracy.

The Accuracy Challenge

In medical coding, 99% accuracy isn't a nice-to-have; it's table stakes. A single misclassified code can trigger claim denials, audit flags, or compliance violations. Our first prototype hit 94% accuracy, which sounds impressive until you realize it means 600 errors per 10,000 codes.
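The arithmetic behind those numbers: expected errors are (1 − accuracy) × volume, so moving from 94% to 99% cuts errors from 600 to 100 per 10,000 codes, a sixfold reduction. A tiny PHP helper (the function name is ours, for illustration only) makes the relationship explicit:

```php
<?php
// Illustrative helper: expected errors for a given accuracy and volume,
// computed as (1 - accuracy) * volume and rounded to a whole count.
function expectedErrors(float $accuracy, int $volume): int
{
    return (int) round((1 - $accuracy) * $volume);
}

echo expectedErrors(0.94, 10000), PHP_EOL; // 600
echo expectedErrors(0.99, 10000), PHP_EOL; // 100
```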

Getting from 94% to 99% required rethinking our entire approach:

Multi-Layer Validation

Instead of relying on a single model, we built a Laravel-based validation pipeline using jobs and queues:

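<?php

namespace App\Jobs;

use App\Models\MedicalCode;
use App\Models\MedicalRecord;
use App\Rules\ValidMedicalCodeRule;
use App\Rules\ValidSecondaryCodesRule;
use App\Services\MedicalCodingService;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Validator;

// Laravel job for processing medical coding
class ProcessMedicalCoding implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    // Predictions below this confidence always go to human review
    private const CONFIDENCE_THRESHOLD = 0.85;

    public function __construct(public MedicalRecord $medicalRecord)
    {
    }

    // Laravel resolves handle() dependencies from the service container
    public function handle(MedicalCodingService $codingService): void
    {
        // Primary classification using our ML service
        $classification = $codingService->classify($this->medicalRecord);

        // Confidence scoring
        $confidence = $classification->confidenceScore;

        // Rule-based validation using Laravel's validator
        $validator = Validator::make($classification->toArray(), [
            'primary_code' => ['required', new ValidMedicalCodeRule()],
            'secondary_codes' => ['array', new ValidSecondaryCodesRule()],
        ]);

        if ($confidence < self::CONFIDENCE_THRESHOLD || $validator->fails()) {
            // Queue for human review as a separate Laravel job
            ProcessHumanReview::dispatch($this->medicalRecord, $classification);
            return;
        }

        // Auto-approve high-confidence results
        // (MedicalCode should cast 'codes' to 'array' so the list is stored as JSON)
        MedicalCode::create([
            'record_id' => $this->medicalRecord->id,
            'codes' => $classification->codes,
            'confidence' => $confidence,
            'status' => 'approved',
        ]);
    }
}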

Primary Classification: Our main neural network handles initial code assignment
Confidence Scoring: Each prediction carries a confidence score, stored in MySQL
Rule-Based Validation: Laravel's validation system catches common edge cases
Human Review Queue: Predictions that are low-confidence or fail validation are dispatched as Laravel jobs for human review
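To exercise the routing logic outside a Laravel app, here is a plain-PHP sketch. isWellFormedIcd10Code() and routeClassification() are hypothetical helpers, not CasePilot code: the first stands in for ValidMedicalCodeRule with a syntax-only ICD-10-CM check (it cannot tell whether a code actually exists in the current code set), and the second reproduces the job's 0.85 threshold.

```php
<?php
// Hypothetical stand-in for ValidMedicalCodeRule: checks ICD-10-CM syntax only
// (letter, digit, alphanumeric, then an optional dot plus 1-4 alphanumerics).
// It does NOT check that the code exists in the current code set.
function isWellFormedIcd10Code(string $code): bool
{
    return preg_match('/^[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?$/', $code) === 1;
}

// Same routing as the job: rule failures and low confidence go to human review
function routeClassification(float $confidence, array $codes, float $threshold = 0.85): string
{
    if ($codes === []) {
        return 'human_review'; // a primary code is required
    }
    foreach ($codes as $code) {
        if (!isWellFormedIcd10Code($code)) {
            return 'human_review';
        }
    }

    return $confidence < $threshold ? 'human_review' : 'auto_approve';
}

echo routeClassification(0.93, ['E11.9', 'I10']), PHP_EOL; // auto_approve
echo routeClassification(0.93, ['250.00']), PHP_EOL;       // human_review (ICD-9 format)
echo routeClassification(0.80, ['I10']), PHP_EOL;          // human_review (low confidence)
```

Keeping the threshold comparison strict (below 0.85 goes to review) matches the job above, so a score of exactly 0.85 is auto-approved.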

HIPAA Compliance at Scale

Healthcare data compliance isn't optional. Every architectural decision had to consider privacy, security, and auditability:

Data Minimization with Laravel

We process only the minimum data needed for coding, using Laravel's built-in features:

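<?php

use Closure;
use Illuminate\Http\Request;
use Illuminate\Support\Str;
use Symfony\Component\HttpFoundation\Response;

// Laravel middleware to sanitize medical data
class SanitizeMedicalDataMiddleware
{
    public function handle(Request $request, Closure $next): Response
    {
        $medicalText = $request->input('medical_text');

        // Nothing to sanitize if the field is missing or not a string
        if (! is_string($medicalText)) {
            return $next($request);
        }

        // Redact identifier patterns; SSNs first so the phone patterns can't claim their digits.
        // The (string) cast turns the Stringable back into a plain string for merge().
        $sanitized = (string) Str::of($medicalText)
            ->replaceMatches('/\b\d{3}-\d{2}-\d{4}\b/', '[SSN_REDACTED]') // SSN
            ->replaceMatches('/\(?\b\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}\b/', '[PHONE_REDACTED]') // Formatted phone
            ->replaceMatches('/\b\d{10,11}\b/', '[PHONE_REDACTED]') // Unformatted phone
            ->replaceMatches('/\b[\w\.-]+@[\w\.-]+\.\w+\b/', '[EMAIL_REDACTED]'); // Email

        $request->merge(['medical_text' => $sanitized]);

        return $next($request);
    }
}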

Text extraction strips common identifier patterns (SSNs, phone numbers, email addresses) using Laravel middleware before analysis
Temporary processing using Laravel's cache with TTL means no data persists longer than necessary
Encryption at rest via Laravel's encrypted casts ensures we never store sensitive data in plain text
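In a Laravel app, the TTL piece is typically a single Cache::put($key, $value, $seconds) call. The plain-PHP sketch below shows the underlying idea without the framework: a hypothetical TtlStore class with an injected clock (our assumption, so expiry can be tested deterministically) that deletes an entry the first time it is read after expiry.

```php
<?php
// Hypothetical TTL store: each entry expires $ttlSeconds after it is written and is
// deleted the first time it is read after expiry. The clock is injected for testing.
final class TtlStore
{
    /** @var array<string, array{value: mixed, expiresAt: int}> */
    private array $entries = [];

    public function __construct(private Closure $clock)
    {
    }

    public function put(string $key, mixed $value, int $ttlSeconds): void
    {
        $this->entries[$key] = ['value' => $value, 'expiresAt' => ($this->clock)() + $ttlSeconds];
    }

    public function get(string $key): mixed
    {
        if (!array_key_exists($key, $this->entries)) {
            return null;
        }
        if (($this->clock)() >= $this->entries[$key]['expiresAt']) {
            unset($this->entries[$key]); // expired: drop the data, don't just hide it
            return null;
        }

        return $this->entries[$key]['value'];
    }

    public function size(): int
    {
        return count($this->entries);
    }
}

$now = 1_000;
$store = new TtlStore(function () use (&$now): int { return $now; });
$store->put('record:42', 'sanitized encounter text', 300);
$now += 301;
var_dump($store->get('record:42')); // the entry expired and was removed
```

This sketch only purges entries when they are read; a production store also has to expire keys that are never read again, which Redis does natively with per-key TTLs.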

Top tip

Design your AI pipeline to be stateless wherever possible. The less data you hold, the smaller your compliance surface area.

Audit Trails with Laravel

Every coding decision creates an immutable audit log entry using Laravel's database features:

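<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;
use LogicException;

// Laravel model with automatic, append-only audit logging
class MedicalCodeAudit extends Model
{
    protected $fillable = [
        'record_fingerprint',
        'model_version',
        'confidence_score',
        'human_override',
        'reasoning',
        'processed_by',
        'processed_at',
    ];

    protected $casts = [
        'processed_at' => 'datetime',
        'confidence_score' => 'decimal:4',
        'human_override' => 'boolean',
    ];

    // Reject updates and deletes made through the model. Query-builder writes
    // bypass model events, so back this with database-level permissions.
    protected static function booted(): void
    {
        static::updating(fn () => throw new LogicException('Audit entries are immutable.'));
        static::deleting(fn () => throw new LogicException('Audit entries are immutable.'));
    }

    // Log a coding decision; $user is null for fully automated decisions
    public static function logCodingDecision($record, $result, $user = null): static
    {
        return static::create([
            // Keyed hash, so low-entropy record IDs can't be brute-forced from the
            // fingerprint ('medical.fingerprint_key' is a hypothetical config entry)
            'record_fingerprint' => hash_hmac('sha256', (string) $record->getKey(), config('medical.fingerprint_key')),
            'model_version' => config('medical.model_version'),
            'confidence_score' => $result->confidence,
            'human_override' => $user !== null,
            'reasoning' => $result->reasoning,
            'processed_by' => $user?->id ?? 'system',
            'processed_at' => now(),
        ]);
    }
}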

Input data fingerprints (SHA-256 hashes, not the data itself) stored instead of raw records
Model versions and confidence scores tracked in the database
Human overrides and reasoning logged through Laravel's Eloquent ORM
Processing timestamps cast to Carbon instances by Eloquent
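Application-level immutability can be bypassed by anyone with direct database access. One common way to make tampering detectable is hash chaining, sketched below in plain PHP. This is a swapped-in technique for illustration, not necessarily how CasePilot stores its audit log; the HashChainedAuditLog class and its tamper() test helper are hypothetical.

```php
<?php
// Hypothetical append-only audit log with hash chaining: each entry's hash covers the
// previous entry's hash plus its own payload, so altering any past entry breaks verify().
final class HashChainedAuditLog
{
    /** @var list<array{payload: string, prevHash: string, hash: string}> */
    private array $entries = [];

    public function append(array $decision): string
    {
        $payload = json_encode($decision, JSON_THROW_ON_ERROR);
        $prevHash = $this->entries === []
            ? str_repeat('0', 64) // genesis value for the first entry
            : $this->entries[array_key_last($this->entries)]['hash'];
        $hash = hash('sha256', $prevHash . $payload);
        $this->entries[] = ['payload' => $payload, 'prevHash' => $prevHash, 'hash' => $hash];

        return $hash;
    }

    public function verify(): bool
    {
        $expectedPrev = str_repeat('0', 64);
        foreach ($this->entries as $entry) {
            $recomputed = hash('sha256', $entry['prevHash'] . $entry['payload']);
            if ($entry['prevHash'] !== $expectedPrev || $recomputed !== $entry['hash']) {
                return false;
            }
            $expectedPrev = $entry['hash'];
        }

        return true;
    }

    // Test helper that simulates someone editing a stored payload in place
    public function tamper(int $index, array $decision): void
    {
        $this->entries[$index]['payload'] = json_encode($decision, JSON_THROW_ON_ERROR);
    }
}

$log = new HashChainedAuditLog();
$log->append(['model_version' => 'v1', 'confidence_score' => 0.93, 'processed_by' => 'system']);
$log->append(['model_version' => 'v1', 'confidence_score' => 0.71, 'processed_by' => 42]);
var_dump($log->verify()); // bool(true)
$log->tamper(0, ['model_version' => 'v1', 'confidence_score' => 0.99, 'processed_by' => 'system']);
var_dump($log->verify()); // bool(false)
```

The chain only proves tampering if the latest hash is also kept somewhere an attacker can't rewrite (for example, published periodically to a separate system); otherwise someone could recompute every hash after the edited entry.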

Real-Time Performance Requirements

Medical coding happens in real time during patient encounters. Our SLA is 2 seconds from input to coded result, which created interesting technical challenges:

Model Optimization

Quantization: Reduced model size by 60% with minimal accuracy loss
Caching: Common diagnosis patterns cached at multiple levels
Batch Processing: Group similar requests for GPU efficiency
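Request batching can be as simple as grouping pending requests by a similarity key and chunking each group. The sketch below assumes specialty is that key (our assumption, borrowed from the specialty-specific models described later) and a hypothetical request shape; buildInferenceBatches() is illustrative, not CasePilot's scheduler.

```php
<?php
// Hypothetical request shape: ['id' => string, 'specialty' => string, 'text' => string].
// Groups requests by specialty (our assumed similarity key), then splits each group
// into batches of at most $maxBatchSize so one GPU call serves several requests.
function buildInferenceBatches(array $requests, int $maxBatchSize): array
{
    if ($maxBatchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    $bySpecialty = [];
    foreach ($requests as $request) {
        $bySpecialty[$request['specialty']][] = $request;
    }

    $batches = [];
    foreach ($bySpecialty as $specialty => $group) {
        foreach (array_chunk($group, $maxBatchSize) as $chunk) {
            $batches[] = ['specialty' => $specialty, 'requests' => $chunk];
        }
    }

    return $batches;
}
```

In practice the batch size trades GPU utilization against the time early requests spend waiting for a batch to fill, and the 2-second SLA caps that wait.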

Infrastructure Scaling

Load Balancer
├── API Gateway (Rate limiting, auth)
├── Processing Cluster (Auto-scaling GPUs)
├── Cache Layer (Redis for common codes)
└── Database (Audit logs, user sessions)
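The Redis layer acts as the shared tier of a multi-level cache. A minimal sketch of the read path, assuming a per-process array as the first level and an ArrayObject standing in for Redis as the second; TwoLevelCodeCache and its key normalization are hypothetical.

```php
<?php
// Hypothetical two-level cache for diagnosis-to-code lookups. Level 1 is a
// per-process array; level 2 is shared between workers (an ArrayObject stands in for Redis).
final class TwoLevelCodeCache
{
    /** @var array<string, string> */
    private array $local = [];

    public function __construct(private ArrayObject $shared)
    {
    }

    /** @param callable(string): string $compute runs the model on a cache miss */
    public function get(string $diagnosisText, callable $compute): string
    {
        // Normalize so trivially different phrasings share one entry
        $key = strtolower(trim(preg_replace('/\s+/', ' ', $diagnosisText)));

        if (isset($this->local[$key])) {
            return $this->local[$key]; // L1 hit: no network round trip
        }
        if (isset($this->shared[$key])) {
            return $this->local[$key] = $this->shared[$key]; // L2 hit: promote to L1
        }

        $value = $compute($key); // miss at both levels: run the model
        $this->shared[$key] = $value;

        return $this->local[$key] = $value;
    }
}

$redisStandIn = new ArrayObject();
$workerA = new TwoLevelCodeCache($redisStandIn);
$workerB = new TwoLevelCodeCache($redisStandIn);
$classify = fn (string $text): string => 'I10'; // stub model call for the demo

$workerA->get('Essential  hypertension', $classify); // miss: computes, fills L1 and L2
$workerB->get('essential hypertension', $classify);  // L2 hit in the other worker
```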

We use spot instances for non-critical processing and reserved instances for the real-time pipeline. This hybrid approach cuts costs by 40% while maintaining performance guarantees.

The Learning Loop

Medical coding rules evolve constantly. New ICD-10 codes, changing regulations, and emerging medical procedures mean our system must continuously learn without losing reliability:

Continuous Training Pipeline

Weekly model updates with new coding examples
A/B testing for model versions before deployment
Rollback mechanisms if accuracy degrades
Shadow mode testing for experimental features
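Rollback and promotion decisions come down to comparing accuracy on the same held-out, coder-verified sample. A minimal sketch, assuming exact-match accuracy with one code per record and a 99% floor taken from our accuracy target; accuracy() and releaseDecision() are hypothetical helpers, not our deployment tooling.

```php
<?php
// Hypothetical release gate. accuracy() is exact-match accuracy of predicted codes
// against coder-verified codes, compared position by position.
function accuracy(array $predicted, array $expected): float
{
    if ($expected === [] || count($predicted) !== count($expected)) {
        throw new InvalidArgumentException('Need equally sized, non-empty label lists.');
    }
    $correct = 0;
    foreach ($expected as $i => $code) {
        if (($predicted[$i] ?? null) === $code) {
            $correct++;
        }
    }

    return $correct / count($expected);
}

// $floor is an assumed minimum acceptable accuracy for the live model
function releaseDecision(float $liveAccuracy, float $candidateAccuracy, float $floor = 0.99): string
{
    if ($liveAccuracy < $floor) {
        return 'rollback'; // live model degraded: revert to the last known-good version
    }
    if ($candidateAccuracy >= $liveAccuracy) {
        return 'promote';  // candidate is at least as accurate on the same sample
    }

    return 'keep';         // candidate is worse: stay on the current model
}

echo releaseDecision(0.992, 0.995), PHP_EOL; // promote
```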

Domain Expert Integration

Our breakthrough came from treating medical coders as partners, not as users to be replaced:

Weekly feedback sessions shaped our feature roadmap
Coder corrections become training data for model improvement
Expert validation for edge cases our models struggle with

Cost Optimization at Scale

Running AI at healthcare scale is expensive. Here's how we optimized:

Smart Resource Management

GPU Sharing: Multiple lightweight models per GPU
Preprocessing Optimization: CPU-based text cleaning before GPU inference
Dynamic Scaling: Auto-scale based on demand patterns
Regional Distribution: Process requests in the nearest data center
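Demand-based scaling usually reduces to a sizing rule like the one below: enough replicas to drain the current queue within a target window, clamped to a minimum and maximum. The throughput figure, window, and bounds are assumptions for illustration, and desiredReplicas() is a hypothetical helper rather than any particular autoscaler's API.

```php
<?php
// Hypothetical sizing rule: replicas needed to drain $queueDepth requests within
// $targetSeconds, clamped to [$min, $max].
function desiredReplicas(
    int $queueDepth,
    float $requestsPerReplicaPerSecond,
    float $targetSeconds,
    int $min,
    int $max
): int {
    $capacityPerReplica = $requestsPerReplicaPerSecond * $targetSeconds;
    if ($capacityPerReplica <= 0) {
        throw new InvalidArgumentException('Per-replica capacity must be positive.');
    }
    $needed = (int) ceil($queueDepth / $capacityPerReplica);

    return max($min, min($max, $needed));
}

echo desiredReplicas(120, 5.0, 2.0, 2, 20), PHP_EOL; // 12
echo desiredReplicas(120, 5.0, 2.0, 2, 8), PHP_EOL;  // 8 (capped at max)
echo desiredReplicas(0, 5.0, 2.0, 2, 20), PHP_EOL;   // 2 (held at min)
```

With 120 queued requests, replicas that each handle 5 requests per second, and a 2-second window (borrowed from the SLA above), each replica can absorb 10 requests, so the rule asks for ceil(120 / 10) = 12 replicas.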

Model Efficiency

Instead of one massive model, we use specialized smaller models:

Specialty-specific models (cardiology, orthopedics, etc.)
Confidence-based routing to appropriate model size
Ensemble approaches only for high-stakes predictions
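The three routing ideas compose into a small cascade. In the sketch below, models are plain closures returning codes and a confidence, the 0.85 escalation threshold mirrors the review threshold from the coding job, and the "ensemble" is the simplest possible one: a small and a large model must agree, otherwise confidence drops to zero so the result lands in human review. ModelRouter and its parameters are hypothetical.

```php
<?php
// Hypothetical routing cascade. Each model is a closure returning
// ['codes' => string[], 'confidence' => float].
final class ModelRouter
{
    /** @param array<string, Closure> $specialtyModels small models keyed by specialty */
    public function __construct(
        private array $specialtyModels,
        private Closure $generalModel,
        private Closure $largeModel,
        private float $escalationThreshold = 0.85,
    ) {
    }

    public function classify(string $text, string $specialty, bool $highStakes): array
    {
        // 1. Specialty-specific small model, falling back to a general one
        $smallModel = $this->specialtyModels[$specialty] ?? $this->generalModel;
        $result = $smallModel($text);

        // 2. High-stakes records: small and large models must agree
        if ($highStakes) {
            $large = ($this->largeModel)($text);
            $agree = $this->sameCodes($result['codes'], $large['codes']);

            return [
                'codes' => $large['codes'],
                // Disagreement zeroes confidence so the record goes to human review
                'confidence' => $agree ? min($result['confidence'], $large['confidence']) : 0.0,
                'model' => 'ensemble',
            ];
        }

        // 3. Low confidence: escalate to the larger model
        if ($result['confidence'] < $this->escalationThreshold) {
            return ($this->largeModel)($text) + ['model' => 'large'];
        }

        return $result + ['model' => 'small'];
    }

    private function sameCodes(array $a, array $b): bool
    {
        sort($a);
        sort($b);

        return $a === $b;
    }
}
```

Taking the lower of the two confidences when the models agree is a deliberately conservative choice: an ensemble should never report more certainty than its least certain member.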

Lessons for Healthcare AI

Three key insights from building CasePilot:

Accuracy isn't just about the model; it's about the entire system design
Compliance should influence architecture from day one, not be bolted on later
Healthcare professionals are your best product partners, not obstacles to automation

The result? CasePilot now processes over 10,000 codes daily, maintains 99%+ accuracy, and has become an essential tool for healthcare providers nationwide. But more importantly, it's taught us how to build AI systems that healthcare professionals actually trust.

Building healthcare AI? We've learned the hard way what works and what doesn't. Let's talk about your specific challenges.