Busted! Engineers Revolutionize Fraud Detection with Machine Learning
In the U.S., credit card fraud costs $5 billion annually, identity theft adds $16.4 billion, and Medicare fraud drains $60 billion each year.
Fraud is widespread in the United States and increasingly driven by technology. For example, 93% of credit card fraud now involves remote account access, not physical theft. In 2023, fraud losses surpassed $10 billion for the first time. The financial toll is staggering: credit card fraud costs $5 billion annually, affecting 60% of U.S. cardholders, while identity theft resulted in $16.4 billion in losses in 2021. Medicare fraud costs $60 billion each year, and government losses range from $233 billion to $521 billion annually, with improper payments totaling $2.7 trillion since 2003.
Machine learning plays a critical role in fraud detection by identifying patterns and anomalies in real time. It analyzes large datasets to learn what normal behavior looks like and flags significant deviations, such as unusual transactions or account access. However, fraud detection is challenging because fraud cases are much rarer than normal ones, and the data is often messy or unlabeled.
To address these challenges, researchers from the College of Engineering and Computer Science at Florida Atlantic University have developed a novel method for generating binary class labels in highly imbalanced datasets, offering a promising solution for fraud detection in industries like health care and finance. This approach works without relying on labeled data, a key advantage in sectors where privacy concerns and the cost of labeling are significant obstacles.
The team tested their method on two real-world, large-scale datasets with severe class imbalance, in which fraud accounts for less than 0.2% of instances: European credit card transactions (more than 280,000 from September 2013) and Medicare Part D claims (more than 5 million from 2013 to 2019), both labeled as fraudulent or genuine. Because genuine cases vastly outnumber fraud cases, these datasets pose a realistic challenge for fraud detection methods.
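To make the scale of that imbalance concrete, here is a minimal arithmetic sketch. The counts are hypothetical round numbers chosen to match the "more than 280,000" and "less than 0.2%" figures above; they are not the datasets' actual counts.

```python
def fraud_rate(n_fraud: int, n_total: int) -> float:
    """Fraction of instances that belong to the fraud (minority) class."""
    return n_fraud / n_total


def majority_baseline_accuracy(n_fraud: int, n_total: int) -> float:
    """Accuracy of a model that labels every instance as genuine."""
    return (n_total - n_fraud) / n_total


# Hypothetical counts (assumption): 280,000 transactions, 0.2% fraudulent.
n_total = 280_000
n_fraud = 560

rate = fraud_rate(n_fraud, n_total)
accuracy = majority_baseline_accuracy(n_fraud, n_total)
print(f"fraud rate: {rate:.2%}")
print(f"accuracy of always predicting 'genuine': {accuracy:.2%}")
```

On these counts, a model that never flags fraud would be 99.8% accurate while catching nothing, which is why plain accuracy is a poor yardstick for data this imbalanced.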
Results of the study, published in the Journal of Big Data, show that this new labeling method effectively addresses the challenge of labeling severely imbalanced data in an unsupervised framework. Additionally, unlike traditional methods, this approach evaluates the newly generated fraud and non-fraud labels directly, without relying on a supervised classifier.
“The use of machine learning in fraud detection brings many advantages,” said Taghi Khoshgoftaar, Ph.D., senior author and Motorola Professor in the FAU Department of Electrical Engineering and Computer Science. “Machine learning algorithms can label data much faster than human annotation, significantly improving efficiency. Our method represents a major advancement in fraud detection, especially in highly imbalanced datasets. It reduces the workload by minimizing cases that require further inspection, which is crucial in sectors like Medicare and credit card fraud, where fast data processing is vital to prevent financial losses and enhance operational efficiency.”
The study shows the new method outperformed the widely used Isolation Forest algorithm, providing a more efficient way to identify fraud while minimizing the need for further investigation. This confirms the method’s ability to generate reliable binary class labels for fraud detection, even in challenging datasets. It offers a scalable solution for detecting fraud without relying on labeled data, which is costly and time-consuming to produce because it requires significant manual input from experts, especially for large datasets.
“Our method generates labels for both fraud or positive and non-fraud or negative instances, which are then refined to minimize the number of fraud labels,” said Mary Anne Walauskis, first author and a Ph.D. candidate in the FAU Department of Electrical Engineering and Computer Science. “By applying our method, we minimize false positives, or in other words, genuine instances marked as fraud, which is key to improving fraud detection.
“This approach ensures that only the most confidently identified fraud cases are retained, enhancing accuracy and reducing unnecessary alarms, making fraud detection more efficient.”
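The idea of a false positive described in the quote can be illustrated with a short sketch. The labels below are toy data invented for illustration, and the function names are hypothetical; this is not the evaluation procedure from the study.

```python
def confusion_counts(true_labels, generated_labels):
    """Return (tp, fp, fn, tn) for binary labels where 1 = fraud, 0 = genuine."""
    tp = fp = fn = tn = 0
    for truth, generated in zip(true_labels, generated_labels):
        if generated == 1 and truth == 1:
            tp += 1
        elif generated == 1 and truth == 0:
            fp += 1  # false positive: genuine instance marked as fraud
        elif generated == 0 and truth == 1:
            fn += 1  # false negative: fraud that was missed
        else:
            tn += 1
    return tp, fp, fn, tn


def precision(tp: int, fp: int) -> float:
    """Share of instances labeled fraud that really are fraud."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy data (assumption): 10 instances, 2 of them truly fraudulent.
true_labels = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
broad_labels = [0, 1, 0, 1, 1, 0, 1, 0, 1, 0]      # flags liberally
confident_labels = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]  # keeps only confident flags

broad = confusion_counts(true_labels, broad_labels)
confident = confusion_counts(true_labels, confident_labels)
print("broad     (tp, fp, fn, tn):", broad, "precision:", precision(broad[0], broad[1]))
print("confident (tp, fp, fn, tn):", confident, "precision:", precision(confident[0], confident[1]))
```

In this toy case the broad labeling raises three false alarms for every two real fraud cases, while the confident labeling raises none. Each false alarm is a genuine case that someone would otherwise have to inspect.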
The method combines two strategies: an ensemble of three unsupervised learning techniques from the scikit-learn library, referred to as the ensemble of unsupervised methods (EUM), and a percentile-gradient approach (PGM). The goal is to minimize false positives by focusing on the most confidently identified fraud cases. This is achieved by refining the labels and reducing errors in both the EUM and the PGM.
The refined labels create a subset of confident labels that are highly likely to be accurate. These labels are then used to create confidence intervals and finalize the labeling, requiring minimal domain knowledge to select the number of positive instances.
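The general idea of keeping only the cases that several unsupervised detectors agree on can be sketched as follows. This is an illustration under stated assumptions, not the researchers' method: the three scorers are simple stand-ins written with the Python standard library rather than the scikit-learn techniques used in the study, a fixed top-fraction cutoff replaces the percentile-gradient step (whose details are not given here), and the confidence-interval step is omitted. All function names and the transaction amounts are hypothetical.

```python
import statistics


def zscore_scores(values):
    """Anomaly score: distance from the mean in standard deviations."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values) or 1.0  # guard against zero spread
    return [abs(v - mean) / stdev for v in values]


def mad_scores(values):
    """Anomaly score: distance from the median, scaled by the median absolute deviation."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values]) or 1.0
    return [abs(v - median) / mad for v in values]


def knn_scores(values, k=3):
    """Anomaly score: mean distance to the k nearest other values."""
    scores = []
    for i, v in enumerate(values):
        distances = sorted(abs(v - other) for j, other in enumerate(values) if j != i)
        scores.append(statistics.fmean(distances[:k]))
    return scores


def top_fraction(scores, fraction):
    """Indices of the highest-scoring `fraction` of instances (at least one)."""
    n_keep = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:n_keep])


def confident_fraud_labels(values, fraction=0.1):
    """Label 1 (fraud) only where every scorer ranks the instance in its top fraction."""
    scorers = (zscore_scores, mad_scores, knn_scores)
    flagged_sets = [top_fraction(scorer(values), fraction) for scorer in scorers]
    confident = set.intersection(*flagged_sets)
    return [1 if i in confident else 0 for i in range(len(values))]


# Hypothetical transaction amounts; indices 9 and 19 are planted outliers.
amounts = [12.0, 15.5, 11.2, 14.8, 13.1, 12.7, 16.0, 14.2, 13.9, 950.0,
           12.4, 15.1, 13.3, 14.6, 11.9, 13.6, 12.9, 15.8, 14.0, 720.0]

labels = confident_fraud_labels(amounts, fraction=0.1)
flagged = [i for i, label in enumerate(labels) if label == 1]
print("indices labeled as fraud:", flagged)
```

Requiring agreement among all scorers trades recall for precision: an instance that only one detector finds suspicious is left unlabeled, which keeps the number of positive labels small. In this sketch the `fraction` parameter plays the role of the number of positive instances that the article says is chosen with minimal domain knowledge.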
“This innovative approach holds great promise for industries plagued by fraud, offering a more accessible and effective way to identify fraudulent activity and safeguard both financial and health care systems,” said Stella Batalama, Ph.D., dean of the College of Engineering and Computer Science. “Fraud’s impact goes beyond financial losses, including emotional distress, reputational damage and reduced trust in organizations. Health care fraud, in particular, undermines care quality and cost, while identity theft can cause severe stress. Addressing fraud is key to mitigating its broad societal impact.”
Looking ahead, the research team plans to enhance the method by automating the determination of the optimal number of positive instances, further improving efficiency and scalability for large-scale applications.
The current journal article, “Unsupervised Label Generation for Severely Imbalanced Fraud Data,” is an updated version of the researchers’ previous work, “Confident Labels: A Novel Approach to New Class Labeling and Evaluation on Highly Imbalanced Data.” The original paper was presented and published at the IEEE 36th International Conference on Tools with Artificial Intelligence (ICTAI) in November 2024, where it won the Best Student Paper Award. ICTAI is a prestigious conference, with an acceptance rate of about 25% from more than 400 submissions.
-FAU-