Scaling Medical Coding AI: Lessons from Building CasePilot

by Niels, Co-Founder / CEO

Building CasePilot with Laravel taught us that healthcare AI isn't just about algorithms—it's about trust, compliance, and the kind of reliability that medical professionals stake their careers on. Here's how we built a Laravel-based system that processes 10,000+ medical codes daily with 99% accuracy.

The Accuracy Challenge

In medical coding, 99% accuracy is table stakes, not a nice-to-have. A single misclassified code can result in claim denials, audit flags, or compliance violations. Our first prototype hit 94% accuracy, which sounds impressive until you realize that means 600 errors per 10,000 codes. At 99%, that drops to 100.

Getting from 94% to 99% required rethinking our entire approach:

Multi-Layer Validation

Instead of relying on a single model, we built a Laravel-based validation pipeline using jobs and queues:

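// Laravel job for processing medical coding
// (MedicalRecord as the record's class name is an assumption; adjust to your models)
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Validator;

class ProcessMedicalCoding implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    // The record to code; SerializesModels stores only its ID on the queue
    public function __construct(public MedicalRecord $medicalRecord)
    {
    }

    public function handle(): void
    {
        // Primary classification using our ML service
        $classification = app(MedicalCodingService::class)->classify($this->medicalRecord);

        // Confidence scoring
        $confidence = $classification->confidenceScore;

        // Rule-based validation using Laravel's validator
        $validator = Validator::make($classification->toArray(), [
            'primary_code' => ['required', new ValidMedicalCodeRule()],
            'secondary_codes' => ['array', new ValidSecondaryCodesRule()],
        ]);

        if ($confidence < 0.85 || $validator->fails()) {
            // Low confidence or a failed rule: queue for human review
            ProcessHumanReview::dispatch($this->medicalRecord, $classification);
        } else {
            // Auto-approve high-confidence results that pass every rule
            MedicalCode::create([
                'record_id' => $this->medicalRecord->id,
                'codes' => $classification->codes,
                'confidence' => $confidence,
                'status' => 'approved',
            ]);
        }
    }
}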

  • Primary Classification: Our main neural network handles initial code assignment
  • Confidence Scoring: Each prediction includes a confidence score stored in MySQL
  • Rule-Based Validation: Laravel's validation system catches common edge cases
  • Human Review Queue: Predictions that fall below the confidence threshold or fail a rule are dispatched to a Laravel job for review
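
The routing decision in the job above can be pulled out into two small pure functions, which makes it easy to unit-test without a queue worker. This is a minimal sketch in plain PHP, not CasePilot's actual rule set: `isWellFormedIcd10` and `routeClassification` are hypothetical names, and the pattern checks only the general shape of an ICD-10-CM code (a letter, a digit, one alphanumeric character, then an optional dot and up to four more characters). It does not confirm that a code exists in the current code set.

```php
<?php
declare(strict_types=1);

// Hypothetical format check: shape only, not membership in the ICD-10-CM code set.
function isWellFormedIcd10(string $code): bool
{
    return preg_match('/^[A-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?\z/', $code) === 1;
}

// Mirrors the job's branch: below the threshold, or failing a rule, goes to a human.
function routeClassification(float $confidence, bool $rulesPassed, float $threshold = 0.85): string
{
    return ($confidence < $threshold || !$rulesPassed) ? 'human_review' : 'approved';
}

$code = 'E11.9';
echo routeClassification(0.92, isWellFormedIcd10($code)), PHP_EOL; // confident and well-formed
echo routeClassification(0.80, isWellFormedIcd10($code)), PHP_EOL; // below the 0.85 threshold
```

Keeping the threshold as a parameter rather than a literal makes it straightforward to tune per specialty or per model version.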

HIPAA Compliance at Scale

Healthcare data compliance isn't optional. Every architectural decision had to consider privacy, security, and auditability:

Data Minimization with Laravel

We process only the minimum viable data needed for coding using Laravel's built-in features:

// Laravel middleware to sanitize medical data
use Closure;
use Illuminate\Http\Request;
use Illuminate\Support\Str;

class SanitizeMedicalDataMiddleware
{
    public function handle(Request $request, Closure $next)
    {
        $medicalText = $request->input('medical_text');

        // Redact pattern-matchable identifiers using Laravel's Str helper.
        // These patterns cover only SSNs, unformatted 10-11 digit phone numbers, and emails.
        $sanitized = Str::of($medicalText)
            ->replaceMatches('/\b\d{3}-\d{2}-\d{4}\b/', '[SSN_REDACTED]') // SSN
            ->replaceMatches('/\b\d{10,11}\b/', '[PHONE_REDACTED]') // Phone
            ->replaceMatches('/\b[\w\.-]+@[\w\.-]+\.\w+\b/', '[EMAIL_REDACTED]'); // Email

        // Str::of() returns a Stringable, so cast back to a plain string before merging
        $request->merge(['medical_text' => (string) $sanitized]);

        return $next($request);
    }
}

  • Text extraction removes patient identifiers using Laravel middleware before analysis
  • Temporary processing uses Laravel's cache with a TTL, so no data persists longer than necessary
  • Encryption at rest with Laravel's encrypted casting ensures we never store sensitive data in plain text
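
To illustrate the TTL idea from the list above, here is a minimal in-memory sketch in plain PHP. `ExpiringStore` is a hypothetical stand-in for a cache backend, written so the expiry logic is visible and testable; the clock is passed in explicitly for the same reason. In a Laravel app the equivalent would be a call such as `Cache::put($key, $value, $seconds)`.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory store; a stand-in for a cache backend with per-key expiry.
final class ExpiringStore
{
    /** @var array<string, array{value: string, expiresAt: int}> */
    private array $items = [];

    public function put(string $key, string $value, int $ttlSeconds, int $now): void
    {
        $this->items[$key] = ['value' => $value, 'expiresAt' => $now + $ttlSeconds];
    }

    public function get(string $key, int $now): ?string
    {
        $item = $this->items[$key] ?? null;
        if ($item === null || $now >= $item['expiresAt']) {
            unset($this->items[$key]); // drop expired data eagerly on read
            return null;
        }
        return $item['value'];
    }
}

$store = new ExpiringStore();
$store->put('job:42', '[sanitized note text]', 300, 1000);
var_dump($store->get('job:42', 1200)); // still inside the 300-second window
var_dump($store->get('job:42', 1300)); // window has elapsed, entry is removed
```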

Top tip

Design your AI pipeline to be stateless wherever possible. The less data you hold, the smaller your compliance surface area.

Audit Trails with Laravel

Every coding decision creates an immutable audit log using Laravel's database features:

// Laravel model for audit logging
use Illuminate\Database\Eloquent\Model;

class MedicalCodeAudit extends Model
{
    protected $fillable = [
        'record_fingerprint',
        'model_version',
        'confidence_score',
        'human_override',
        'reasoning',
        'processed_by',
        'processed_at',
    ];

    protected $casts = [
        'processed_at' => 'datetime',
        'confidence_score' => 'decimal:4',
        'human_override' => 'boolean',
    ];

    // Write one audit row per coding decision; call this wherever a result is approved or overridden
    public static function logCodingDecision($record, $result, $user = null): static
    {
        return static::create([
            // Store a fingerprint of the record key, never the record itself
            'record_fingerprint' => hash('sha256', (string) $record->getKey()),
            'model_version' => config('medical.model_version'),
            'confidence_score' => $result->confidence,
            'human_override' => $user !== null,
            'reasoning' => $result->reasoning,
            'processed_by' => $user?->id ?? 'system',
            'processed_at' => now(),
        ]);
    }
}

  • Input data fingerprints (not the data itself) stored as SHA-256 hashes
  • Model versions and confidence scores tracked in the database
  • Human overrides and reasoning logged through Laravel's Eloquent ORM
  • Processing timestamps stored as Carbon-backed datetime columns
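
One caveat on fingerprints: a plain hash of a low-entropy value, such as a sequential ID, can be reversed by hashing every candidate value. A keyed hash (HMAC) over the record's content avoids that as long as the key stays secret. The sketch below is one possible approach in plain PHP, not a description of CasePilot's implementation; `fingerprintRecord`, the field names, and the key are all hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: keyed fingerprint of a record's content.
function fingerprintRecord(array $fields, string $secretKey): string
{
    ksort($fields); // canonical top-level key order, so field ordering does not change the hash
    $canonical = json_encode($fields, JSON_THROW_ON_ERROR);

    return hash_hmac('sha256', $canonical, $secretKey);
}

$key = 'example-key-loaded-from-secure-config'; // placeholder, not a real secret
$a = fingerprintRecord(['note' => 'chest pain', 'encounter' => '2024-01-05'], $key);
$b = fingerprintRecord(['encounter' => '2024-01-05', 'note' => 'chest pain'], $key);
var_dump($a === $b); // same content supplied in a different field order
```

Note that `ksort` only orders the top level; nested arrays would need a recursive sort before encoding.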

Real-Time Performance Requirements

Medical coding happens in real time during patient encounters. Our SLA is 2 seconds from input to coded result, which created interesting technical challenges:

Model Optimization

  • Quantization: Reduced model size by 60% with minimal accuracy loss
  • Caching: Common diagnosis patterns cached at multiple levels
  • Batch Processing: Group similar requests for GPU efficiency
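
The batching step above can be sketched as grouping pending requests by some similarity key and then splitting each group into fixed-size chunks. The example below assumes that "similar" means "same specialty", which is an illustrative choice; `buildBatches` and the request shape are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical batching step: group pending requests by specialty,
 * then split each group into batches of at most $maxBatchSize.
 *
 * @param list<array{id: int, specialty: string}> $requests
 * @return list<array{specialty: string, ids: list<int>}>
 */
function buildBatches(array $requests, int $maxBatchSize): array
{
    $bySpecialty = [];
    foreach ($requests as $request) {
        $bySpecialty[$request['specialty']][] = $request['id'];
    }

    $batches = [];
    foreach ($bySpecialty as $specialty => $ids) {
        foreach (array_chunk($ids, $maxBatchSize) as $chunk) {
            $batches[] = ['specialty' => $specialty, 'ids' => $chunk];
        }
    }

    return $batches;
}

$pending = [
    ['id' => 1, 'specialty' => 'cardiology'],
    ['id' => 2, 'specialty' => 'orthopedics'],
    ['id' => 3, 'specialty' => 'cardiology'],
    ['id' => 4, 'specialty' => 'cardiology'],
];
print_r(buildBatches($pending, 2)); // two cardiology batches, one orthopedics batch
```

With a 2-second SLA, a real batcher would also need a maximum wait time so a lone request is not held back waiting for its batch to fill.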

Infrastructure Scaling

Load Balancer
├── API Gateway (Rate limiting, auth)
├── Processing Cluster (Auto-scaling GPUs)
├── Cache Layer (Redis for common codes)
└── Database (Audit logs, user sessions)

We use spot instances for non-critical processing and reserved instances for the real-time pipeline. This hybrid approach cuts costs by 40% while maintaining performance guarantees.

The Learning Loop

Medical coding rules evolve constantly. New ICD-10 codes, changing regulations, and emerging medical procedures mean our system must continuously learn without losing reliability:

Continuous Training Pipeline

  • Weekly model updates with new coding examples
  • A/B testing for model versions before deployment
  • Rollback mechanisms if accuracy degrades
  • Shadow mode testing for experimental features
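
A promotion gate for the pipeline above might look like the following: score the current and candidate models on the same reviewer-confirmed examples, and promote only if the candidate clears an accuracy floor without regressing. This is a simplified sketch with hypothetical names (`accuracy`, `shouldPromote`) and made-up sample data; a real gate would use a much larger evaluation set and likely per-specialty breakdowns.

```php
<?php
declare(strict_types=1);

/**
 * Fraction of predictions that match the reviewer-confirmed codes.
 *
 * @param list<string> $expected  reviewer-confirmed codes
 * @param list<string> $predicted model output for the same records, in the same order
 */
function accuracy(array $expected, array $predicted): float
{
    if ($expected === [] || count($expected) !== count($predicted)) {
        throw new InvalidArgumentException('Need two non-empty lists of equal length.');
    }

    $correct = 0;
    foreach ($expected as $i => $code) {
        if ($predicted[$i] === $code) {
            $correct++;
        }
    }

    return $correct / count($expected);
}

// Hypothetical gate: promote only if the candidate clears the floor and does not regress.
function shouldPromote(float $currentAccuracy, float $candidateAccuracy, float $floor): bool
{
    return $candidateAccuracy >= $floor && $candidateAccuracy >= $currentAccuracy;
}

$expected  = ['E11.9', 'I10', 'J45.909', 'N39.0'];
$current   = ['E11.9', 'I10', 'J45.909', 'N30.00'];
$candidate = ['E11.9', 'I10', 'J45.909', 'N39.0'];

var_dump(shouldPromote(accuracy($expected, $current), accuracy($expected, $candidate), 0.99));
```

The same comparison run in reverse, with the previous version as the candidate, gives a simple trigger for rollback.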

Domain Expert Integration

Our breakthrough came from treating medical coders as partners, not users to replace:

  • Weekly feedback sessions shaped our feature roadmap
  • Coder corrections become training data for model improvement
  • Expert validation for edge cases our models struggle with

Cost Optimization at Scale

Running AI at healthcare scale is expensive. Here's how we optimized:

Smart Resource Management

  • GPU Sharing: Multiple lightweight models per GPU
  • Preprocessing Optimization: CPU-based text cleaning before GPU inference
  • Dynamic Scaling: Auto-scale based on demand patterns
  • Regional Distribution: Process requests in the nearest data center

Model Efficiency

Instead of one massive model, we use specialized smaller models:

  • Specialty-specific models (cardiology, orthopedics, etc.)
  • Confidence-based routing to appropriate model size
  • Ensemble approaches only for high-stakes predictions
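
Confidence-based routing can be sketched as a cascade: try the cheapest model first and escalate only when its confidence falls below a threshold. The code below is an illustration under assumptions, not CasePilot's router; `classifyWithCascade` is a hypothetical name and the two closures are stubs standing in for real model calls.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical cascade: call models from cheapest to most expensive and stop
 * at the first one whose confidence clears the threshold.
 *
 * @param list<callable(string): array{code: string, confidence: float}> $models
 * @return array{code: string, confidence: float, tier: int}
 */
function classifyWithCascade(string $text, array $models, float $threshold): array
{
    if ($models === []) {
        throw new InvalidArgumentException('At least one model is required.');
    }

    $result = null;
    foreach ($models as $tier => $model) {
        $result = $model($text) + ['tier' => $tier];
        if ($result['confidence'] >= $threshold) {
            break;
        }
    }

    return $result; // the last tier's answer if no model was confident enough
}

// Stub models with fixed outputs, for illustration only.
$small = fn (string $text): array => ['code' => 'I10', 'confidence' => 0.70];
$large = fn (string $text): array => ['code' => 'I11.9', 'confidence' => 0.93];

print_r(classifyWithCascade('sanitized note text', [$small, $large], 0.85));
```

If even the last tier stays below the threshold, the caller can send the result to the human review queue instead of auto-approving it.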

Lessons for Healthcare AI

Three key insights from building CasePilot:

  1. Accuracy isn't just about the model—it's about the entire system design
  2. Compliance should influence architecture from day one, not be bolted on later
  3. Healthcare professionals are your best product partners, not obstacles to automation

The result? CasePilot now processes over 10,000 codes daily, maintains 99%+ accuracy, and has become an essential tool for healthcare providers nationwide. But more importantly, it's taught us how to build AI systems that healthcare professionals actually trust.


Building healthcare AI? We've learned the hard way what works and what doesn't. Let's talk about your specific challenges.
