Three Pillars of Data for FWA Research: How AI Agents Transform Healthcare Fraud Investigation

August 20, 2025

by Yubin Park, Co-Founder / CTO

Last week, we introduced our Provider Investigator tool to our clients. It's an AI agent that analyzes vast amounts of data, organizing it and identifying signals of fraud, waste, and abuse (FWA). It has already saved tremendous time for both our team and our clients. Today, I'd like to illustrate how this AI agent works.

The Investigation Process

Suppose you have a healthcare provider of interest - a clinical laboratory, medical group, or solo practitioner. This provider can also be automatically identified through our algorithms (a topic for a future blog post). Once identified, our agent gets to work.

We analyze three distinct pillars of data:

Internal Datasets + Digital Footprint + Public Datasets

Top tip

The power isn't in any single data source—it's in the connections between them. A provider might look clean in claims data but have concerning patient reviews and past regulatory actions.

1. Internal Datasets

Our internal datasets include client claims data and licensed/proprietary datasets that are not publicly available. Through this pillar, we examine:

Utilization patterns - Identifying unusual billing frequencies or procedure combinations
Provider affiliations - Mapping relationships between entities and ownership structures
Unbundling practices - Detecting when procedures are inappropriately separated to increase reimbursement
High-cost procedure analysis - Flagging disproportionate billing of expensive procedures relative to patient population
Peer comparisons - Benchmarking against similar providers in geography and specialty
Payment velocity - Analyzing billing timing and reimbursement patterns
Patient attribution - Understanding referral patterns and patient journey anomalies
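
As a rough illustration of the peer comparison idea, here is a minimal Python sketch. It is not our production method. It assumes each peer provider has already been reduced to one number (a hypothetical "billed amount per patient") and flags values far above the peer median using a modified z-score. The function name, the sample figures, and the 3.5 cutoff are all assumptions made for this example.

```python
from statistics import median


def flag_peer_outliers(billed_per_patient, threshold=3.5):
    """Return provider IDs whose value sits far above the peer median.

    Uses a modified z-score built on the median absolute deviation (MAD),
    which is less distorted by the very outliers we are trying to find
    than a mean/standard-deviation z-score would be.
    """
    values = list(billed_per_patient.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread among peers, so nothing stands out
    flagged = []
    for provider_id, value in billed_per_patient.items():
        modified_z = 0.6745 * (value - med) / mad
        if modified_z > threshold:  # only unusually HIGH billing is flagged
            flagged.append(provider_id)
    return flagged


# Hypothetical peer group: labs in the same geography and specialty.
peers = {
    "lab_a": 110.0,
    "lab_b": 95.0,
    "lab_c": 120.0,
    "lab_d": 105.0,
    "lab_e": 480.0,
}
print(flag_peer_outliers(peers))  # ['lab_e']
```

A flag like this is only a starting point. It says a provider looks different from its peers, not that anything improper happened.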

2. Digital Footprint

Our agentic web crawling capabilities scour publicly available online information:

Customer reviews and ratings - Patient feedback from Google Reviews, Yelp, Healthgrades, and specialized medical review sites revealing service quality issues
Social discussions - Reddit threads, patient forums, and community discussions about provider experiences
Legal case history - Court records, malpractice suits, and regulatory actions from legal databases
Ownership trails - Corporate filings, business registrations, and ownership changes over time
Professional networks - LinkedIn connections, professional associations, and declared affiliations
News mentions - Media coverage, press releases, and industry publications
Social media presence - Professional and personal social media activity across platforms
Website analysis - Services advertised, credentials claimed, and marketing practices

3. Public Datasets

Public sector data represents a treasure trove of information that many organizations haven't fully leveraged:

CMS (Centers for Medicare & Medicaid Services):

Provider enrollment data and National Provider Identifier (NPI) records
Medicare Part B and Part D prescriber data
Quality Payment Program results and MIPS scores
Open Payments data showing industry relationships

HHS-OIG (Office of Inspector General):

List of Excluded Individuals and Entities (LEIE)
Provider self-disclosure protocols and settlement agreements
Compliance program guidance and advisory opinions

State Government Resources:

Professional licensing boards and disciplinary actions
State Medicaid program data and provider directories
Workers' compensation and disability insurance databases
Business entity registrations and tax records

Other Federal Agencies:

FDA drug shortage databases and safety communications
DEA controlled substance registration data
Federal procurement and contract award databases
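
To make the public-data pillar concrete, here is a small Python sketch of screening providers against an exclusion list such as the LEIE. The inline CSV, its column names (`NPI`, `BUSNAME`, `EXCLDATE`), and both helper functions are assumptions for illustration only. A real export should be checked against its published layout before any code like this is pointed at it.

```python
import csv
import io

# Tiny inline stand-in for an exclusion list export. The column names and
# rows are illustrative assumptions, not a guaranteed file layout.
SAMPLE_EXCLUSIONS_CSV = """NPI,BUSNAME,EXCLDATE
1234567890,EXAMPLE LAB LLC,20190415
0000000000,NO NPI ON FILE INC,20210102
"""


def load_excluded_npis(csv_text):
    """Index exclusion rows by NPI, skipping blank or all-zero NPIs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    excluded = {}
    for row in reader:
        npi = row["NPI"].strip()
        if npi and set(npi) != {"0"}:
            excluded[npi] = row
    return excluded


def screen_providers(npis, excluded):
    """Map each matching NPI to its exclusion date."""
    return {npi: excluded[npi]["EXCLDATE"] for npi in npis if npi in excluded}


excluded = load_excluded_npis(SAMPLE_EXCLUSIONS_CSV)
hits = screen_providers(["1234567890", "1112223334"], excluded)
print(hits)  # {'1234567890': '20190415'}
```

Exact NPI matching is the easy case. Records without a usable NPI would need name and address matching, which this sketch deliberately skips.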

The Manual Investigation Challenge

Traditionally, this data investigation has been extremely manual and time-intensive:

Internal Dataset Complexity: Our datasets often contain hundreds or thousands of disparate tables with varying data quality. Analysts spend hours querying multiple tables, cross-referencing values, and validating data integrity before drawing any conclusions.

Internet Research Inefficiency: Investigators manually dig through Google search results, constantly modify search keywords, read through potential evidence, and piece together fragmented information scattered across multiple websites and sources.

Public Dataset Barriers: Government datasets are massive but poorly organized. While incredibly rich in information, they're often in different formats, have inconsistent data schemas, and require specialized knowledge to navigate effectively. Most organizations haven't tapped into this resource because the task seems so daunting.

Enter AI Agent Technology

What exactly are agents? Agents are AI systems that can plan, execute tasks independently, and learn from their environment. Unlike simple chatbots that respond to queries, agents begin with a command or discussion, then plan and operate autonomously. They can use tools, recover from errors, and pause for human feedback when needed—all while gaining "ground truth" from their environment at each step to assess progress.

Top tip

Traditional investigations stop when initial evidence is thin. AI agents excel at the follow-up questions—if claims data shows unusual patterns, they automatically dig deeper into digital footprints and public records to find supporting evidence.

We've been systematically addressing these challenges, leveraging recent advances in AI agent technologies. Here's how our solution works:

Our Falcon AI serves as the orchestrating intelligence, coordinating specialized sub-agents for each data domain. An investigation cycles through four steps:

Collect Evidence - Each sub-agent focuses on its specialized domain, gathering relevant information autonomously
Review and Analyze - The orchestrator reviews collected evidence for patterns and gaps
Ask Follow-up Questions - Based on initial findings, the system generates new investigative queries
Iterate and Refine - The process repeats, diving deeper into promising leads while maintaining context

This represents true deep research into potential FWA cases, combining the thoroughness of human investigation with the speed and scalability of AI.
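
The collect, review, follow-up, and iterate cycle described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Falcon AI's actual implementation. The `investigate` function, the stub sub-agents, and the `follow_ups` rule are all hypothetical, and a real system would use model-driven agents rather than hard-coded functions.

```python
from collections import deque


def investigate(provider_id, sub_agents, generate_follow_ups, max_rounds=3):
    """Run a bounded collect -> review -> follow-up loop and return all evidence."""
    evidence = []
    asked = set()  # (domain, question) pairs already sent, to avoid repeats
    queue = deque(
        (domain, f"baseline review of {provider_id}") for domain in sub_agents
    )
    for _ in range(max_rounds):
        if not queue:
            break  # no open questions left
        new_findings = []
        while queue:
            domain, question = queue.popleft()
            if (domain, question) in asked:
                continue
            asked.add((domain, question))
            # Collect: the sub-agent for this domain gathers evidence.
            new_findings.extend(sub_agents[domain](provider_id, question))
        evidence.extend(new_findings)
        # Review and follow up: turn this round's findings into new questions.
        queue.extend(generate_follow_ups(new_findings))
    return evidence


# Hypothetical stub sub-agents, one per data pillar.
def claims_agent(provider_id, question):
    if question.startswith("baseline"):
        return [{"domain": "internal", "signal": "high billing frequency"}]
    return []


def web_agent(provider_id, question):
    if "ownership" in question:
        return [{"domain": "digital", "signal": "recent ownership change"}]
    return []


def public_agent(provider_id, question):
    return []


def follow_ups(findings):
    """Hypothetical rule: unusual billing triggers an ownership check."""
    return [
        ("digital", "check ownership history")
        for f in findings
        if f["signal"] == "high billing frequency"
    ]


agents = {"internal": claims_agent, "digital": web_agent, "public": public_agent}
report = investigate("provider-001", agents, follow_ups)
print([f["signal"] for f in report])
# ['high billing frequency', 'recent ownership change']
```

The `max_rounds` cap and the `asked` set keep the loop from running forever or repeating itself, which matters once follow-up questions are generated automatically.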

The Result

After processing all three data pillars, the agent produces a comprehensive investigation report in hours. Compiling the same report manually would have taken days or weeks.

Top tip

Manual investigations typically uncover 30-40% of available evidence due to time constraints. Our AI agents consistently find 80-90% of relevant signals across all three data pillars—often discovering connections human investigators would miss.

The Impact

The time savings are already substantial. What once required multiple analysts working for days can now be accomplished in hours, with more comprehensive coverage and fewer missed connections. Most importantly, this frees up our human experts to focus on the complex analytical work that requires human judgment and domain expertise.

This is just the beginning. As we continue refining our AI agents and expanding our data sources, we're building toward a future where FWA detection is both more accurate and more efficient than ever before.

Interested in learning more about how we help organizations move beyond manual, time-intensive fraud investigations? Connect with us to see our three-pillar AI agent approach in action.
