Case Study — Test Automation

Comprehensive Test Automation for a Legacy Bioinformatics Platform

One developer, pair programming with Claude Code, built a production-data-driven test suite covering 80–90% of mission-critical workflows in two sprints, unlocking a legacy modernization that manual testing had blocked.

Published February 5, 2025 | 8 min read

  • 160 hrs: Total engagement
  • 80–90%: Workflow coverage
  • ~10,000: Lines of test code
  • 55: Custom test commands
The Challenge

A veterinary bioinformatics platform serving commercial customers needed to modernize from AngularJS and .NET Framework 4.7.2 to modern frameworks. The application had zero automated tests, and its complex bioinformatics workflows—sequence translation, homology analysis, vaccine assessment—made manual regression testing impractical and error-prone. Modernization was blocked until the team could guarantee feature parity.

The Solution

Using Claude Code as an AI pair programmer, one Capstone developer built a production data-driven test automation suite in two sprints. Tests were driven by anonymized production data extracted via custom Python scripts, ensuring coverage of real customer scenarios rather than fabricated test cases. Just six hours of customer meetings provided the domain knowledge needed to scope the work.

The Results

The engagement delivered 80–90% automated coverage of mission-critical user workflows, a library of 55 reusable test commands, database automation, application instrumentation, and 1,300+ lines of documentation. The development team was trained and fully onboarded at sprint conclusion. The modernization project can now proceed with confidence.

Business Impact

Immediate Value

The modernization project—previously blocked by the risk of undetected regressions—can now proceed with confidence. The automated regression suite validates that existing customer workflows remain intact after each change, providing a safety net that transforms the modernization from a high-risk leap to a managed, incremental process. Test execution produces detailed logs that make troubleshooting efficient when issues do arise.

Ongoing Value

The development team was trained and equipped to extend test coverage as the platform evolves. The test suite doubles as living documentation of the application's complex business workflows—documentation that previously existed only as tribal knowledge. The 55 reusable commands and established patterns make adding new test scenarios significantly faster than building from scratch.

Cost Avoidance

Manual testing of comparable scope would require days of effort from subject matter experts for each release cycle during modernization. The automated suite runs overnight, enabling daily validation without consuming expert time. Early defect detection during modernization prevents the most expensive kind of failure: customer-impacting regressions discovered in production.

Can one developer really build meaningful test automation for a complex legacy application?

With AI pair programming, yes. The AI handles the high-volume work—codebase exploration, script generation, debugging framework-specific timing issues—while the developer focuses on domain understanding and architectural decisions. The key enabler is focused domain knowledge transfer: in this engagement, six hours of customer meetings provided the workflow knowledge needed to scope and deliver the entire suite.

Technical Context

The Application

SequenceCloudService is a bioinformatics platform that analyzes biological pathogen sequences for veterinary and agricultural customers. The application provides sequence translation and homology analysis (genetic similarity scoring), vaccine effectiveness assessment against pathogen variants, customer-facing analysis reports and recommendations, and enterprise-level data management spanning farms, veterinarians, and lab results.

The platform's calculations are complex and domain-specific. A single homology analysis produces a matrix of similarity scores across multiple sequences, with color-coded results and precision-dependent outputs. Getting any of these calculations wrong has direct business consequences for the customers making vaccination decisions based on the results.

Technology Stack

AngularJS 1.x, ASP.NET Framework 4.7.2, Entity Framework 6.0, SQL Server

Test tooling:

Cypress, Python, Claude Code

The Modernization Imperative

The legacy stack—AngularJS 1.x, ASP.NET Framework 4.7.2, Entity Framework 6.0—had reached end-of-support status. Maintaining the current stack carried increasing risk, and the aging frontend made it difficult to attract new customers and new developers. The business needed to modernize, but the bioinformatics workflows were too complex and too important to change without a safety net.

Manual regression testing was not a viable safety net. A single pass through the critical workflows required a subject matter expert spending days clicking through UI screens, entering sequence data (5–10 minutes per sequence for translation alone), and visually inspecting matrix outputs. Doing this for every release during an active modernization was impractical. Automated tests became a business prerequisite before the first line of modernization code could be written.

Is test automation always necessary before modernizing a legacy application?

For applications with complex business logic and active customers, automated tests are a practical prerequisite. Without them, every change carries the risk of silently breaking workflows customers depend on. The frequency of validation needed during active modernization—ideally daily—makes manual testing impractical regardless of team size.

The Approach

The testing strategy was built on two foundational decisions: use anonymized production data as both input and validation oracle, and mirror actual user journeys rather than testing isolated components.

Production Data as Test Oracle

Rather than fabricating synthetic test data, the developer used Claude Code to build Python scripts that extracted real enterprise, sequence, and prescription data from the production database. This data served double duty: driving realistic test scenarios that matched actual customer workflows, and providing the expected values for validation. When a test enters a sequence and checks that 18+ fields persist correctly across UI, database, and search, the expected values come from what real customers actually entered—not from guesses about what the data should look like.
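The extraction scripts themselves are not published, but the core anonymization idea can be sketched in Python. A minimal sketch, assuming hypothetical field names (`farm_name`, `veterinarian`, `contact_email`): identifying fields are replaced with deterministic pseudonyms so that the same production value always maps to the same alias, preserving relational links across tables, while scientific payloads such as sequences are left intact because they are what the tests validate.

```python
import hashlib
import json

def pseudonym(value: str, prefix: str) -> str:
    """Deterministically pseudonymize a value so the same production
    name always maps to the same test-safe alias, preserving
    relational links across extracted tables."""
    digest = hashlib.sha256(value.encode()).hexdigest()[:8]
    return f"{prefix}-{digest}"

def anonymize_record(record: dict) -> dict:
    """Replace identifying fields; keep scientific payloads intact,
    since sequences and scores are what the tests validate."""
    out = dict(record)
    for field, prefix in [("farm_name", "farm"), ("veterinarian", "vet"),
                          ("contact_email", "contact")]:
        if out.get(field):
            out[field] = pseudonym(out[field], prefix)
    return out

def build_fixture(records: list[dict], path: str) -> None:
    """Write anonymized records as a JSON fixture for the test suite."""
    with open(path, "w") as fh:
        json.dump([anonymize_record(r) for r in records], fh, indent=2)
```

Deterministic hashing (rather than random replacement) matters here: a farm referenced from both an enterprise record and a lab report receives the same alias in both fixtures, so cross-entity workflows still join correctly.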

Key Insight

Anonymized production data is the most time-efficient path to a large, realistic test dataset. Purpose-built synthetic data generation can produce even more thorough coverage—including denser edge-case scenarios—but developing that generation capability is a significant investment in itself. For this engagement, extracting and anonymizing real data delivered a comprehensive test oracle at a fraction of the cost, validating against the actual behavior customers depend on.

How does AI pair programming actually work for test automation?

The AI autonomously explores the codebase, understands data models and UI patterns, and generates test scripts, while the human developer provides domain context, reviews output, and makes architectural decisions. The AI handles high-volume analytical work—tracing data flows, generating repetitive test commands, resolving timing issues—while the developer ensures tests cover the right business scenarios.

Workflow-Based Testing

Tests were structured to follow the complete user journey through the application, from initial setup through final analysis validation.

Enterprise Setup

Configure the enterprise environment: veterinarians, contacts, and farm associations that mirror real customer configurations.

Data Entry

Enter lab reports with biological sequences requiring translation—each sequence takes 5–10 minutes to process, matching the real-world workflow timing.

Prescription Management

Create vaccine and reference preparation associations, linking sequences to the biological products used for comparison.

Search and Validation

Search for entered data and verify that 18+ fields match expectations across the UI, database, and search index.
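The three-way persistence check can be sketched in Python. This is an illustrative helper, not the engagement's actual Cypress code: the production-derived expected values are diffed against records pulled from each layer, and every mismatch is reported with its source so a failure log pinpoints where a field was lost.

```python
def diff_record(expected: dict, actual: dict, source: str) -> list[str]:
    """Compare an extracted record against the production-derived
    expected values; return a readable list of mismatches."""
    problems = []
    for field, want in expected.items():
        got = actual.get(field)
        if got != want:
            problems.append(f"{source}.{field}: expected {want!r}, got {got!r}")
    return problems

def validate_persistence(expected: dict, ui: dict,
                         db: dict, search: dict) -> list[str]:
    """Check that every expected field survived the round trip through
    the UI form, the database row, and the search index."""
    return (diff_record(expected, ui, "ui")
            + diff_record(expected, db, "db")
            + diff_record(expected, search, "search"))
```

An empty result means all fields match across all three layers; a non-empty result fails the test and doubles as the troubleshooting log entry.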

Analysis Execution

Run homology analysis and validate similarity calculations, matrix layout, and color-coded results against captured baselines.

Baseline Capture Pattern

For computation-heavy outputs like homology matrices, the team implemented a baseline capture approach. The first test run captures the complete output—headers, similarity scores, color coding—as a JSON baseline. Subsequent runs validate that output matches this baseline exactly. This made it possible to test complex calculations without hardcoding expected results, and ensures that any change to the computation pipeline is immediately detected.
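The pattern described above can be sketched in a few lines of Python (the real suite implements it in Cypress; this is an assumed simplification). On the first run the full output is stored as a JSON baseline; on later runs any key whose value deviates from the baseline is reported.

```python
import json
from pathlib import Path

def check_against_baseline(result: dict, baseline_path: str) -> list[str]:
    """First run: store the full output (headers, scores, colors) as the
    baseline and report nothing. Later runs: flag any key whose value
    deviates from the stored baseline."""
    path = Path(baseline_path)
    if not path.exists():
        path.write_text(json.dumps(result, indent=2, sort_keys=True))
        return []  # baseline captured; nothing to compare against yet
    baseline = json.loads(path.read_text())
    deviations = []
    for key in sorted(set(baseline) | set(result)):
        if baseline.get(key) != result.get(key):
            deviations.append(
                f"{key}: baseline {baseline.get(key)!r} "
                f"!= current {result.get(key)!r}")
    return deviations
```

The baseline file is committed alongside the tests, so an intentional change to the calculation pipeline shows up as a reviewable diff rather than a silent pass.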

How do you test complex calculations when you can't easily hardcode expected results?

A baseline capture pattern records the complete output of a known-good system on the first run—values, formatting, structure—as a stored reference. Subsequent runs compare against this baseline, detecting any deviation. This is especially useful for domain-specific calculations like bioinformatics similarity matrices where manually computing expected results would be impractical.

How Claude Code Accelerated This Work
  • Autonomous codebase exploration: Claude navigated the unfamiliar AngularJS/C# codebase to understand data models, UI patterns, and component relationships without extensive hand-holding from the developer.
  • Script generation: Created Python database extraction and anonymization scripts directly from requirements, handling the translation between production schema and test data format.
  • Legacy framework debugging: Resolved timing and synchronization issues specific to AngularJS's digest cycle and asynchronous rendering—problems that typically require deep framework experience to diagnose.
  • Progressive test construction: Built incrementally from simple login tests to multi-hour end-to-end workflows, creating reusable command libraries at each stage that compounded into comprehensive coverage.
  • Documentation as byproduct: Generated 1,300+ lines of documentation as a natural artifact of the implementation process, capturing workflow knowledge that would otherwise exist only in the developer's notes.

What Was Delivered

Deliverable | Description | Scope
--- | --- | ---
Test specifications | End-to-end scenarios covering enterprise setup, sequence entry, and analysis validation | 80–90% of critical workflows
Custom test commands | Reusable Cypress commands for navigation, data entry, and validation | 55 commands
Test code | Cypress test implementation with production data-driven assertions | ~10,000 lines
Data extraction scripts | Python scripts for extracting and anonymizing production data as test fixtures | Multiple scripts
Application instrumentation | Data-test attributes added throughout the legacy AngularJS application | Full application coverage
Database automation | Setup and cleanup scripts for test isolation and repeatability | Complete lifecycle
Test orchestration | Scripts for sequencing test execution and managing dependencies | Full suite
Documentation | Architecture guide, test catalog, extension guide, and troubleshooting reference | 1,300+ lines

Key Takeaways

AI pair programming compresses the ramp-up timeline for unfamiliar codebases. Building test automation for a legacy application typically requires extensive time to understand the codebase before writing the first useful test. Claude Code's ability to autonomously explore code, trace data flows, and generate scripts against unfamiliar patterns compressed the timeline from months to weeks.

Anonymized production data is the most efficient path to a realistic test dataset. Extracting and anonymizing real customer data delivered comprehensive test coverage without the significant upfront investment that high-quality synthetic data generation requires. Purpose-built synthetic generation can ultimately produce denser edge-case coverage, but for this engagement, production data provided the fastest route to a large, credible test oracle.

An incremental validation ladder builds compounding returns. Starting with simple login tests and progressively adding complexity created reusable components at each stage. Each layer of test commands became the foundation for the next level of sophistication, turning early investment into accelerating returns.

Focused domain knowledge transfer pays outsized dividends. Six hours of customer meetings captured the essential workflow knowledge needed to scope and validate the entire engagement. The combination of minimal expert time and comprehensive AI-generated documentation enabled seamless handoff to the development team.

Tests are the prerequisite that unlocks business agility. The test suite transformed modernization from a high-risk project that nobody wanted to approve into a manageable, incremental effort with measurable safety. The business value wasn't the tests themselves—it was the modernization they made possible.

Frequently Asked Questions

Can AI help build test automation for legacy applications?

Yes. Agentic AI tools like Claude Code are particularly effective at building test automation for legacy applications because much of the work involves exploring unfamiliar codebases, understanding undocumented data models, and generating repetitive but precise test scripts. In this engagement, Claude Code autonomously navigated an AngularJS/C# codebase to understand UI patterns and data relationships, then generated test commands, database extraction scripts, and validation logic—work that would typically require extensive manual ramp-up time.

How long does it take to build test automation for a legacy application?

Timeline depends on application complexity and scope. Using AI pair programming with Claude Code, this engagement achieved 80–90% coverage of mission-critical workflows in 160 hours (two sprints) with one developer. The same scope without AI assistance would typically require months of effort due to the ramp-up time needed to understand an unfamiliar legacy codebase, plus the manual effort of writing thousands of lines of test code and database scripts.

Should test automation use production data or synthetic data?

Both approaches have merit. Anonymized production data is the most time-efficient path to a large, realistic test dataset—it captures actual customer workflows without requiring upfront investment in data generation tooling. Purpose-built synthetic data generation can ultimately produce denser edge-case coverage, but developing high-quality generation is a significant effort in itself. The right choice depends on timeline, budget, and how domain-specific the application's data patterns are.

What is a baseline capture pattern and when should you use it?

A baseline capture pattern records the complete output of a computation on the first test run—headers, values, formatting—as a stored reference (typically JSON). Subsequent runs compare their output against this baseline, detecting any change. This is particularly useful for testing complex calculations where hardcoding expected results would be impractical, such as bioinformatics similarity matrices or financial risk models. The baseline is established against the known-good production system, then used to validate that changes produce identical results.

How do you handle test automation for applications with long processing times?

Applications with long processing times—like bioinformatics platforms where sequence translation takes 5–10 minutes per sequence—require test architectures that accommodate real-world timing. This engagement used workflow-based tests that mirror actual user journeys including wait times, with orchestration scripts managing dependencies between stages. The automated suite runs overnight, turning processing time from a blocking constraint into a background operation.
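A core building block for this kind of suite is a bounded polling helper. This is an illustrative Python sketch, not the engagement's actual orchestration code: the test polls a slow pipeline step until it reports completion, and fails loudly on timeout instead of hanging the overnight run. Injecting the clock and sleep functions keeps the helper itself testable without real waiting.

```python
import time

def wait_for_completion(poll, timeout_s=7200, interval_s=30,
                        clock=time.monotonic, sleep=time.sleep):
    """Poll a slow step (e.g. a 5-10 minute sequence translation) until
    it reports done; raise on timeout rather than hanging the suite."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        if poll():
            return True
        sleep(interval_s)
    raise TimeoutError(f"step did not complete within {timeout_s}s")
```

With generous per-step timeouts and an orchestrator sequencing the stages, total wall-clock time stops mattering: the suite simply runs while no one is watching.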

How does AI pair programming differ from code generation tools?

AI pair programming with agentic tools like Claude Code goes well beyond code generation. The AI autonomously explores unfamiliar codebases, traces data flows, diagnoses framework-specific issues (like AngularJS digest cycle timing problems), and iterates on solutions—behaving more like a capable colleague than a code completion tool. The human developer provides domain context, reviews output, and makes architectural decisions while the AI handles the high-volume analytical and generative work.

What does the development team need to maintain and extend the test suite after the engagement?

The engagement was designed for handoff from the start. Deliverables included 1,300+ lines of documentation covering architecture, test catalogs, extension guides, and troubleshooting references. The 55 reusable test commands and established patterns mean new test scenarios follow existing conventions rather than requiring design decisions. The development team was trained during the engagement and was fully onboarded at sprint conclusion.

Is test automation necessary before modernizing a legacy application?

For applications with complex business logic and active customers, automated tests are a practical prerequisite for modernization. Without them, every change carries the risk of silently breaking workflows customers depend on. Manual regression testing is impractical at the frequency needed during active modernization—subject matter experts would need days per release cycle to validate comparable scope. Automated tests run overnight, enabling daily validation and early defect detection.

Legacy Application Blocking Your Modernization?

Capstone IT builds the test automation and safety nets that make legacy modernization possible. We combine AI-accelerated development with deep technical expertise to deliver results in weeks, not months.

Schedule a Consultation