Three Pillars of Data for FWA Research: How AI Agents Transform Healthcare Fraud Investigation

by Yubin Park, Co-Founder / CTO

Last week, we introduced our Provider Investigator tool to our clients. It's an AI agent that analyzes vast amounts of data, organizing and identifying signals for fraud, waste, and abuse (FWA). It has already saved tremendous time for both our team and our clients. Today, I'd like to illustrate how this AI agent works.

The Investigation Process

Suppose you have a healthcare provider of interest: a clinical laboratory, medical group, or solo practitioner. Our algorithms can also identify such providers automatically (a topic for a future blog post). Once a provider is identified, our agent gets to work.

We analyze three distinct pillars of data:

Internal Datasets + Digital Footprint + Public Datasets

Top tip

The power isn't in any single data source—it's in the connections between them. A provider might look clean in claims data but have concerning patient reviews and past regulatory actions.

1. Internal Datasets

Our internal datasets include client claims data and licensed/proprietary datasets that are not publicly available. Through this pillar, we examine:

  • Utilization patterns - Identifying unusual billing frequencies or procedure combinations
  • Provider affiliations - Mapping relationships between entities and ownership structures
  • Unbundling practices - Detecting when procedures are inappropriately separated to increase reimbursement
  • High-cost procedure analysis - Flagging disproportionate billing of expensive procedures relative to patient population
  • Peer comparisons - Benchmarking against similar providers in geography and specialty
  • Payment velocity - Analyzing billing timing and reimbursement patterns
  • Patient attribution - Understanding referral patterns and patient journey anomalies
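To make the peer comparison idea concrete, here is a minimal sketch in Python. It is not our production logic: the provider names, dollar amounts, and the threshold of 3 are hypothetical, and a leave-one-out z-score is just one simple way to benchmark a provider against its peers.

```python
from statistics import mean, stdev

# Hypothetical sample: average billed amount per patient for providers in the
# same specialty and geography. All names and numbers are illustrative only.
peer_billing = {
    "provider_a": 410.0,
    "provider_b": 395.0,
    "provider_c": 430.0,
    "provider_d": 405.0,
    "provider_e": 1250.0,  # candidate outlier
}


def peer_z_scores(values: dict[str, float]) -> dict[str, float]:
    """Score each provider against the mean and spread of its peers.

    The provider being scored is left out of its own baseline so that a single
    extreme biller does not inflate the spread and hide itself.
    """
    scores = {}
    for name, value in values.items():
        others = [v for k, v in values.items() if k != name]
        mu, sigma = mean(others), stdev(others)
        scores[name] = (value - mu) / sigma if sigma > 0 else 0.0
    return scores


def flag_outliers(values: dict[str, float], threshold: float = 3.0) -> list[str]:
    """Return providers whose billing sits far above their peer baseline."""
    return sorted(
        name for name, z in peer_z_scores(values).items() if z > threshold
    )


print(flag_outliers(peer_billing))
```

With this sample data only `provider_e` is flagged. A real comparison would also need to adjust for patient mix and case volume before treating a high score as a signal.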

2. Digital Footprint

Our agentic web crawling capabilities scour publicly available online information:

  • Customer reviews and ratings - Patient feedback from Google Reviews, Yelp, Healthgrades, and specialized medical review sites revealing service quality issues
  • Social discussions - Reddit threads, patient forums, and community discussions about provider experiences
  • Legal case history - Court records, malpractice suits, and regulatory actions from legal databases
  • Ownership trails - Corporate filings, business registrations, and ownership changes over time
  • Professional networks - LinkedIn connections, professional associations, and declared affiliations
  • News mentions - Media coverage, press releases, and industry publications
  • Social media presence - Professional and personal social media activity across platforms
  • Website analysis - Services advertised, credentials claimed, and marketing practices

3. Public Datasets

Public sector data represents a treasure trove of information that many organizations haven't fully leveraged:

CMS (Centers for Medicare & Medicaid Services):

  • Provider enrollment data and National Provider Identifier (NPI) records
  • Medicare Part B provider utilization data and Part D prescriber data
  • Quality Payment Program (QPP) results and MIPS scores
  • Open Payments data showing industry relationships

HHS-OIG (Office of Inspector General):

  • List of Excluded Individuals and Entities (LEIE)
  • Provider self-disclosure protocols and settlement agreements
  • Compliance program guidance and advisory opinions

State Government Resources:

  • Professional licensing boards and disciplinary actions
  • State Medicaid program data and provider directories
  • Workers' compensation and disability insurance databases
  • Business entity registrations and tax records

Other Federal Agencies:

  • FDA drug shortage databases and safety communications
  • DEA controlled substance registration data
  • Federal procurement and contract award databases
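As one example of putting a public dataset to work, the sketch below screens a provider's NPI against an exclusion list such as the LEIE. The column names and both sample rows are assumptions made for illustration (the real file has more fields, so check its published layout), and treating an all-zero NPI as "missing" is also an assumption.

```python
import csv
import io

# Tiny in-memory stand-in for an exclusion list download. The column names and
# rows are illustrative assumptions, not real records.
SAMPLE_LEIE_CSV = """LASTNAME,FIRSTNAME,BUSNAME,NPI,EXCLTYPE,EXCLDATE
DOE,JANE,,1234567890,1128a1,20190620
,,ACME LAB LLC,0000000000,1128b7,20210115
"""


def load_excluded_npis(csv_text: str) -> dict[str, dict[str, str]]:
    """Index exclusion rows by NPI, skipping blank or all-zero NPIs."""
    index = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        npi = row["NPI"].strip()
        if npi and npi.strip("0"):  # an all-zero value is treated as missing
            index[npi] = row
    return index


def screen_provider(npi: str, index: dict[str, dict[str, str]]) -> str:
    """Report whether an NPI appears in the exclusion index."""
    hit = index.get(npi)
    if hit is None:
        # Rows without a usable NPI can only be matched by name and address.
        return "no NPI match (name-based check still needed)"
    return f"excluded: type {hit['EXCLTYPE']}, date {hit['EXCLDATE']}"


index = load_excluded_npis(SAMPLE_LEIE_CSV)
print(screen_provider("1234567890", index))
print(screen_provider("1111111111", index))
```

The second sample row shows why an NPI lookup alone is not enough: entries without a usable NPI still have to be matched on name or business name, which is where fuzzy matching and human review come in.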

The Manual Investigation Challenge

Traditionally, this data investigation has been extremely manual and time-intensive:

Internal Dataset Complexity: Our datasets often contain hundreds or thousands of disparate tables with varying data quality. Analysts spend hours querying multiple tables, cross-referencing values, and validating data integrity before drawing any conclusions.

Internet Research Inefficiency: Investigators manually dig through Google search results, constantly modify search keywords, read through potential evidence, and piece together fragmented information scattered across multiple websites and sources.

Public Dataset Barriers: Government datasets are massive but poorly organized. While incredibly rich in information, they're often in different formats, have inconsistent data schemas, and require specialized knowledge to navigate effectively. Most organizations haven't tapped into this resource because the task seems so daunting.

Enter AI Agent Technology

What exactly are agents? Agents are AI systems that can plan, execute tasks independently, and learn from their environment. Unlike simple chatbots that respond to queries, agents begin with a command or discussion, then plan and operate autonomously. They can use tools, recover from errors, and pause for human feedback when needed—all while gaining "ground truth" from their environment at each step to assess progress.

Top tip

Traditional investigations stop when initial evidence is thin. AI agents excel at the follow-up questions—if claims data shows unusual patterns, they automatically dig deeper into digital footprints and public records to find supporting evidence.

We've been systematically addressing these challenges, leveraging recent advances in AI agent technologies. Here's how our solution works:

Our Falcon AI serves as the orchestrating intelligence, coordinating specialized sub-agents for each data domain. These agents:

  1. Collect Evidence - Each sub-agent focuses on its specialized domain, gathering relevant information autonomously
  2. Review and Analyze - The orchestrator reviews collected evidence for patterns and gaps
  3. Ask Follow-up Questions - Based on initial findings, the system generates new investigative queries
  4. Iterate and Refine - The process repeats, diving deeper into promising leads while maintaining context

This represents true deep research into potential FWA cases, combining the thoroughness of human investigation with the speed and scalability of AI.
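The collect, review, follow-up, and iterate loop described above can be sketched in a few lines of Python. This is a simplified illustration, not Falcon AI's actual implementation: the `Evidence` class, the three stub sub-agents, and the keyword matching are all hypothetical stand-ins for what would be model-driven agents with real tools.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Evidence:
    pillar: str                 # which data pillar produced the finding
    finding: str                # short description of the signal
    follow_ups: list[str] = field(default_factory=list)  # new questions raised


# A sub-agent takes a question and returns evidence. These stubs use keyword
# checks purely for illustration.
SubAgent = Callable[[str], list[Evidence]]


def internal_agent(question: str) -> list[Evidence]:
    if "billing" in question:
        return [Evidence("internal", "high-cost procedure share above peers",
                         ["ownership changes for this provider"])]
    return []


def web_agent(question: str) -> list[Evidence]:
    if "ownership" in question:
        return [Evidence("digital", "business registration changed recently",
                         ["exclusion status of new owner"])]
    return []


def public_agent(question: str) -> list[Evidence]:
    if "exclusion" in question:
        return [Evidence("public", "no exclusion record found for new owner")]
    return []


def investigate(initial_question: str, agents: list[SubAgent],
                max_rounds: int = 5) -> list[Evidence]:
    """Run rounds of collection until no follow-up questions remain."""
    queue, seen, evidence = [initial_question], set(), []
    for _ in range(max_rounds):          # cap rounds so the loop always ends
        if not queue:
            break
        next_queue = []
        for question in queue:
            if question in seen:         # avoid re-asking the same question
                continue
            seen.add(question)
            for agent in agents:         # 1. collect evidence per domain
                for item in agent(question):
                    evidence.append(item)               # 2. keep for review
                    next_queue.extend(item.follow_ups)  # 3. follow-up questions
        queue = next_queue               # 4. iterate on the new leads
    return evidence


report = investigate("unusual billing patterns",
                     [internal_agent, web_agent, public_agent])
for item in report:
    print(f"[{item.pillar}] {item.finding}")
```

In this toy run, a claims signal leads to an ownership question, which in turn leads to an exclusion check, so each pillar contributes one finding. The `max_rounds` cap and the `seen` set are the two guards that keep an open-ended investigation from looping forever.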

The Result

After processing all three data pillars, you receive a comprehensive investigation report that would have taken days or weeks to compile manually—now generated in hours.

Top tip

Manual investigations typically uncover 30-40% of available evidence due to time constraints. Our AI agents consistently find 80-90% of relevant signals across all three data pillars—often discovering connections human investigators would miss.

The Impact

The time savings are already substantial. What once required multiple analysts working for days can now be accomplished in hours, with more comprehensive coverage and fewer missed connections. Most importantly, this frees up our human experts to focus on the complex analytical work that requires human judgment and domain expertise.

This is just the beginning. As we continue refining our AI agents and expanding our data sources, we're building toward a future where FWA detection is both more accurate and more efficient than ever before.


Interested in learning more about how we help organizations move beyond manual, time-intensive fraud investigations? Connect with us to see our three-pillar AI agent approach in action.
