Three Pillars of Data for FWA Research: How AI Agents Transform Healthcare Fraud Investigation
by Yubin Park, Co-Founder / CTO
Last week, we introduced our Provider Investigator tool to our clients. It's an AI agent that analyzes large volumes of data, organizes the findings, and identifies signals of fraud, waste, and abuse (FWA). It has already saved a great deal of time for both our team and our clients. Today, I'd like to illustrate how this AI agent works.
The Investigation Process

Suppose you have a healthcare provider of interest - a clinical laboratory, medical group, or solo practitioner. This provider can also be automatically identified through our algorithms (a topic for a future blog post). Once identified, our agent gets to work.
We analyze three distinct pillars of data:
Internal Datasets + Digital Footprint + Public Datasets
1. Internal Datasets
Our internal datasets include client claims data and licensed/proprietary datasets that are not publicly available. Through this pillar, we examine:
- Utilization patterns - Identifying unusual billing frequencies or procedure combinations
- Provider affiliations - Mapping relationships between entities and ownership structures
- Unbundling practices - Detecting when procedures are inappropriately separated to increase reimbursement
- High-cost procedure analysis - Flagging disproportionate billing of expensive procedures relative to patient population
- Peer comparisons - Benchmarking against similar providers in geography and specialty
- Payment velocity - Analyzing billing timing and reimbursement patterns
- Patient attribution - Understanding referral patterns and patient journey anomalies
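To make the peer comparison item above concrete, here is a minimal sketch, not our production logic, of flagging a provider whose billing rate sits far above its peer group. It uses a median-based (modified) z-score so that the outlier itself does not distort the baseline. The provider ids, the rates, and the 3.5 cutoff are assumptions chosen for illustration.

```python
from statistics import median


def robust_z_scores(values):
    """Return a modified z-score for each value, based on median and MAD."""
    med = median(values)
    # Median absolute deviation: a spread measure that resists outliers.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread among peers, so nothing can be called an outlier.
        return [0.0 for _ in values]
    # 0.6745 rescales MAD so scores are comparable to standard z-scores.
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(peer_rates, threshold=3.5):
    """Return provider ids whose rate is far ABOVE the peer median.

    peer_rates maps a provider id to a rate (e.g. claims per patient).
    Only high-side outliers are flagged; low billers are ignored.
    """
    ids = list(peer_rates)
    scores = robust_z_scores([peer_rates[i] for i in ids])
    return [i for i, s in zip(ids, scores) if s > threshold]


# Synthetic example: claims per patient for labs in one region.
peer_rates = {"lab_a": 2.1, "lab_b": 1.9, "lab_c": 2.3, "lab_d": 2.0, "lab_e": 9.5}
print(flag_outliers(peer_rates))  # prints ['lab_e']
```

A flag like this is only a starting signal. A peer group defined too loosely (for example, mixing specialties) will produce outliers that have a legitimate explanation.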
2. Digital Footprint
Our web-crawling agents scour publicly available online information:
- Customer reviews and ratings - Patient feedback from Google Reviews, Yelp, Healthgrades, and specialized medical review sites revealing service quality issues
- Social discussions - Reddit threads, patient forums, and community discussions about provider experiences
- Legal case history - Court records, malpractice suits, and regulatory actions from legal databases
- Ownership trails - Corporate filings, business registrations, and ownership changes over time
- Professional networks - LinkedIn connections, professional associations, and declared affiliations
- News mentions - Media coverage, press releases, and industry publications
- Social media presence - Professional and personal social media activity across platforms
- Website analysis - Services advertised, credentials claimed, and marketing practices
3. Public Datasets
Public sector data represents a treasure trove of information that many organizations haven't fully leveraged:
CMS (Centers for Medicare & Medicaid Services):
- Provider enrollment data and National Provider Identifier (NPI) records
- Medicare Part B provider utilization data and Part D prescriber data
- Quality Payment Program (QPP) results and MIPS scores
- Open Payments data showing industry relationships
HHS-OIG (Office of Inspector General):
- List of Excluded Individuals and Entities (LEIE)
- Provider self-disclosure protocols and settlement agreements
- Compliance program guidance and advisory opinions
State Government Resources:
- Professional licensing boards and disciplinary actions
- State Medicaid program data and provider directories
- Workers' compensation and disability insurance databases
- Business entity registrations and tax records
Other Federal Agencies:
- FDA drug shortage databases and safety communications
- DEA controlled substance registration data
- Federal procurement and contract award databases
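As an example of putting one of these sources to work, the sketch below screens a list of NPIs against an exclusion list such as the LEIE. It is a simplified illustration: the sample rows are synthetic, and the column names (`NPI`, `EXCLTYPE`, `EXCLDATE`, and so on) are assumptions that should be checked against the header of the actual file. It also assumes that an all-zero NPI means the identifier was not recorded.

```python
import csv
import io

# Synthetic rows shaped like an exclusion-list export (not real data).
SAMPLE_CSV = """LASTNAME,FIRSTNAME,BUSNAME,NPI,EXCLTYPE,EXCLDATE
DOE,JANE,,1234567890,1128a1,20200115
,,EXAMPLE LAB LLC,0000000000,1128b7,20190301
"""


def load_excluded_npis(csv_text):
    """Build a lookup of NPI -> exclusion row, skipping unusable NPIs."""
    excluded = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        npi = row["NPI"].strip()
        # Assumption: an all-zero NPI is a placeholder for "not recorded".
        if npi and set(npi) != {"0"}:
            excluded[npi] = row
    return excluded


def screen(npis, excluded):
    """Return NPI -> matching exclusion row, or None when there is no match."""
    return {npi: excluded.get(npi) for npi in npis}


excluded = load_excluded_npis(SAMPLE_CSV)
result = screen(["1234567890", "1111111111"], excluded)
for npi, hit in result.items():
    print(npi, "EXCLUDED" if hit else "no match")
```

Rows without a usable NPI, like the second synthetic row, cannot be matched this way. Catching them would require name and address matching, which is outside the scope of this sketch.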
The Manual Investigation Challenge
Traditionally, this data investigation has been extremely manual and time-intensive:
Internal Dataset Complexity: Our datasets often contain hundreds or thousands of disparate tables with varying data quality. Analysts spend hours querying multiple tables, cross-referencing values, and validating data integrity before drawing any conclusions.
Internet Research Inefficiency: Investigators manually dig through Google search results, constantly modify search keywords, read through potential evidence, and piece together fragmented information scattered across multiple websites and sources.
Public Dataset Barriers: Government datasets are massive but poorly organized. While incredibly rich in information, they're often in different formats, have inconsistent data schemas, and require specialized knowledge to navigate effectively. Most organizations haven't tapped into this resource because the task seems so daunting.
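The schema problem can be illustrated with a small sketch that maps differently named columns from two sources onto one canonical record. The alias table and field names here are hypothetical, and a real mapping would be much larger and maintained per dataset.

```python
# Hypothetical alias table: canonical field -> column names seen in sources.
COLUMN_ALIASES = {
    "npi": {"npi", "provider npi", "npi number"},
    "state": {"state", "provider state", "st"},
    "name": {"name", "provider name", "organization name"},
}


def canonical_name(raw_column):
    """Map a raw column header to a canonical field name, or None if unknown."""
    key = raw_column.strip().lower().replace("_", " ")
    for canonical, aliases in COLUMN_ALIASES.items():
        if key in aliases:
            return canonical
    return None


def normalize_record(record):
    """Keep only recognized columns, renamed to their canonical field names."""
    normalized = {}
    for raw_column, value in record.items():
        name = canonical_name(raw_column)
        if name is not None:
            normalized[name] = value.strip()
    return normalized


# Two synthetic rows describing the same provider in different schemas.
source_a = {"Provider_NPI": "1234567890", "Provider_State": "TX", "Extra": "x"}
source_b = {"npi number": " 1234567890", "ST": "TX "}

print(normalize_record(source_a))
print(normalize_record(source_b))
```

Once records share field names, they can be joined on the NPI regardless of which dataset they came from.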
Enter AI Agent Technology
What exactly are agents? Agents are AI systems that can plan, execute tasks independently, and learn from their environment. Unlike simple chatbots that respond to queries, agents begin with a command or discussion, then plan and operate autonomously. They can use tools, recover from errors, and pause for human feedback when needed—all while gaining "ground truth" from their environment at each step to assess progress.
We've been systematically addressing these challenges, leveraging recent advances in AI agent technologies. Here's how our solution works:
Our Falcon AI serves as the orchestrating intelligence, coordinating specialized sub-agents for each data domain. Together, they work in a loop:
- Collect Evidence - Each sub-agent focuses on its specialized domain, gathering relevant information autonomously
- Review and Analyze - The orchestrator reviews collected evidence for patterns and gaps
- Ask Follow-up Questions - Based on initial findings, the system generates new investigative queries
- Iterate and Refine - The process repeats, diving deeper into promising leads while maintaining context
This represents true deep research into potential FWA cases, combining the thoroughness of human investigation with the speed and scalability of AI.
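The collect, review, follow up, and iterate loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not Falcon AI's actual implementation: the `Evidence` class, the `investigate` function, and the two stub sub-agents are hypothetical names, and a real sub-agent would call models and tools rather than match keywords.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    source: str                      # which data pillar produced the finding
    finding: str                     # short description of what was found
    follow_ups: list = field(default_factory=list)  # new questions raised


def investigate(provider, sub_agents, max_rounds=3):
    """Run a bounded collect -> review -> follow-up loop over all sub-agents."""
    questions = [f"What is known about {provider}?"]
    asked = set()
    evidence = []
    for _ in range(max_rounds):
        # Review step: keep only new questions, without duplicates.
        pending = [q for q in dict.fromkeys(questions) if q not in asked]
        if not pending:
            break  # no open leads remain
        questions = []
        for question in pending:
            asked.add(question)
            # Collect step: every sub-agent gets a chance to answer.
            for agent in sub_agents.values():
                for item in agent(question):
                    evidence.append(item)
                    # Follow-up step: findings can raise new questions.
                    questions.extend(item.follow_ups)
    return evidence


# Hypothetical stub sub-agents standing in for real data-domain agents.
def claims_agent(question):
    if "known about" in question:
        return [Evidence("internal", "billing volume above peers",
                         ["Who owns the entity?"])]
    return []


def web_agent(question):
    if "owns" in question:
        return [Evidence("web", "ownership changed recently")]
    return []


evidence = investigate("Example Lab LLC",
                       {"claims": claims_agent, "web": web_agent})
for item in evidence:
    print(item.source, "-", item.finding)
```

The `max_rounds` cap and the `asked` set are the two guards that keep such a loop from running indefinitely or re-asking the same question.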
The Result
After processing all three data pillars, the agent produces a comprehensive investigation report in hours. Compiling the same report manually would have taken days or weeks.

The Impact
The time savings are already substantial. What once required multiple analysts working for days can now be accomplished in hours, with more comprehensive coverage and fewer missed connections. Most importantly, this frees up our human experts to focus on the complex analytical work that requires human judgment and domain expertise.
This is just the beginning. As we continue refining our AI agents and expanding our data sources, we're building toward a future where FWA detection is both more accurate and more efficient than ever before.
Interested in learning more about how we help organizations move beyond manual, time-intensive fraud investigations? Connect with us to see our three-pillar AI agent approach in action.