Is Your CEO a Deepfake? 5 Ways to Secure Your Business Against AI Scams
The $25 million video call
The finance officer did everything right.
He joined the Zoom call. He recognised the CFO’s face. He heard the familiar voice, the same cadence, the same informal tone from dozens of previous meetings. The CFO explained that the acquisition was moving faster than expected. The wire needed to go out before the end of business. Confidentiality was important. Move quickly.
He transferred $25 million to an account in Hong Kong.
The CFO had never made that call. The face on screen was a deepfake. The voice was a clone stitched together from earnings calls and public interviews. Every pixel of that video was synthetic, and none of it glitched, flickered, or gave itself away.
That case is from 2024. In 2026, the tools that produced it are faster, cheaper, and more convincing than they were then. A report from the security firm Recorded Future found that real-time deepfake video generation has advanced to the point where frame-by-frame forensic artefacts — the artefacts detection tools relied on — are no longer reliably present in live injection attacks.
The threat has matured. Your procedures have not.
There is a phrase that gets repeated in corporate security briefings: “trust your gut.” If something feels off, pause. If the request seems unusual, verify. That advice was never great, and in 2026, it is genuinely dangerous. Your gut reads faces and voices. Attackers know this. The attack is designed to satisfy exactly the signals your gut is trained to trust.
This is not a technology problem with a technology solution. It is a trust problem, and it requires a different kind of response.
How the scam actually works
The mechanics matter because most security teams are still preparing for the wrong version of this attack.
Voice cloning is no longer a resource-intensive operation.
Current tools can produce a convincing voice clone from as little as 15 to 30 seconds of source audio. That audio does not need to come from anywhere special. A CFO’s earnings call excerpt on YouTube, a LinkedIn video, a podcast appearance, a company town hall recording posted internally — all of these are sufficient. The resulting clone reproduces not just pitch and tone but speech rhythm, filler words, and the specific way a person trails off at the end of a sentence. In a phone call or the audio track of a video call, it is indistinguishable from the real speaker to an untrained human ear.
Real-time video injection is now consumer software.
Tools like DeepFaceLive and its commercial equivalents allow an attacker to map a target’s face over their own face during a live video call. This is not a post-production effect. It runs at frame rate, in real time, during an active Zoom or Teams session. The attacker points their camera at their own face. The software replaces it with the target’s face. To the person on the receiving end, they are looking at their CFO.
The quality is not perfect under every condition, but it does not need to be. It needs to be good enough to pass a 20-minute video call with someone who has no particular reason to be suspicious, conducted under moderate pressure.
The deepfake is not the scam. Urgency is the scam.
These attacks combine synthetic identity with social engineering principles that are decades old. The deepfake removes the friction from impersonation, and it answers the question “but how do I know it’s really you” before the target thinks to ask it. The actual manipulation is the manufactured urgency: the merger closes tonight, the regulator is waiting, this cannot go through normal channels, do not loop in the legal team. Deepfakes make authority claims credible. Urgency prevents verification. Together, they are remarkably effective against people who are otherwise security-aware.
At Bridge Infrastructure, a West African fintech processing payment data across four regulatory jurisdictions, this threat is concrete. A payment operations lead receives a video message that appears to be from the Group CFO. The CFO’s face, voice, and mannerisms are accurate. The request is to approve an emergency float transfer of 50 million naira before the banking window closes. The operations lead has approved similar requests before. The CFO has called in for urgent approvals before. The existing controls are designed for external fraud, not internal impersonation.
The window from “that looks like the CFO” to “authorise the transfer” is the entire attack surface.

5 ways to secure your business against deepfake fraud
1. The out-of-band verification rule
Any financial request, access escalation, or sensitive instruction delivered over video or voice requires confirmation through a second, pre-registered channel before action is taken.
This is not complicated. It is a policy decision that most organizations have not made.
The rule works like this: if a request arrives over Zoom, confirm it via a direct phone call to a number already saved in your contacts. Not a number provided during the call. Not a number texted to you during the conversation. A number you already had, from a directory established before the call happened. If a request arrives via WhatsApp video, confirm via an internal ticketing system or a signal to a known device.
The key phrase is “pre-registered channel.” Attackers who have compromised video can sometimes also compromise messaging channels opened during the conversation. The verification channel must predate the attack.
For Bridge Infrastructure, this means the payment operations team maintains a verified internal call list. Any transaction above a defined threshold, regardless of who appears to be requesting it, requires a callback confirmation before the authorization workflow begins. The policy is documented, trained, and enforced without exception for senior leadership requests. Especially for senior leadership requests.
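To make the rule concrete, here is a minimal sketch of what such a gate might look like in code. The record type, threshold, directory contents, and contact details are all hypothetical illustrations, not a real implementation:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical pre-registered directory: numbers recorded BEFORE any
# call takes place, never taken from the session making the request.
PRE_REGISTERED_DIRECTORY = {
    "cfo@example.com": "+234-800-000-0001",
}

CALLBACK_THRESHOLD = 1_000_000  # illustrative; amounts above this need a callback


@dataclass
class TransferRequest:
    requester: str
    amount: int
    confirmed_via: Optional[str] = None  # channel the callback confirmation used


def may_authorize(request: TransferRequest) -> bool:
    """Authorise only if the amount is below the threshold, or the request
    was confirmed on the requester's pre-registered number."""
    if request.amount <= CALLBACK_THRESHOLD:
        return True
    registered = PRE_REGISTERED_DIRECTORY.get(request.requester)
    # The confirmation channel must match the directory entry exactly;
    # a number supplied during the call itself never qualifies.
    return registered is not None and request.confirmed_via == registered
```

The essential design choice is that the directory lookup happens against data that predates the session, which is exactly the property the policy demands.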
The key to making these protocols stick is clarity. If your security policies are buried in dense legalese, they won’t be followed under pressure (for more on this, see my guide on writing GRC documentation for non-technical stakeholders).
2. The challenge-response protocol
Before high-value transactions or sensitive access decisions, require the requester to provide a shared secret that was established in a prior, known-good interaction.
These are sometimes called safe words, abort codes, or verification phrases. The concept is borrowed from intelligence tradecraft. A shared secret is a piece of information that both parties know, but that could not be obtained from public sources or from observing the target’s public-facing behaviour. It cannot be cloned from an earnings call. It cannot be synthesized from a LinkedIn profile.
Rotating these phrases on a regular schedule reduces the risk of compromise over time. The phrases should be stored offline or in a hardware-encrypted credential vault, not in a shared document accessible from a browser.
The challenge-response protocol defeats real-time impersonation because the attacker does not know what the phrase is. Even a perfect visual and audio deepfake of the CFO cannot produce a shared secret that was never given.
This control sounds low-tech. It is. That is exactly what makes it reliable.
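A spoken safe word needs no code at all. For automated approval workflows, the same idea can be sketched with a standard HMAC challenge-response, so the secret itself is never spoken or transmitted; the secret value and function names below are illustrative:

```python
import hashlib
import hmac
import secrets

# Toy challenge-response sketch. The shared secret is established in a
# prior, known-good interaction and stored offline; the value used in
# any example is purely illustrative.


def issue_challenge() -> bytes:
    """The verifier sends a fresh random nonce for each request."""
    return secrets.token_bytes(16)


def respond(shared_secret: bytes, challenge: bytes) -> bytes:
    """The requester proves knowledge of the secret without revealing it."""
    return hmac.new(shared_secret, challenge, hashlib.sha256).digest()


def verify(shared_secret: bytes, challenge: bytes, response: bytes) -> bool:
    expected = hmac.new(shared_secret, challenge, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking information via timing.
    return hmac.compare_digest(expected, response)
```

Because a fresh challenge is issued every time, a recorded response cannot be replayed, and an attacker who has cloned the CFO's face and voice still cannot compute the correct answer.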
3. Liveness detection and AI-based forensics
Technical controls exist that can identify synthetic video and audio with reasonable accuracy. They should be part of your detection layer, though not your only layer.
On the video side, photoplethysmography-based detection tools analyze pixel-level colour variation in facial skin to detect blood flow patterns. A real human face has a measurable, rhythmic variation in skin tone driven by the cardiac cycle. Deepfake video generated from a static source image or a cloned face often lacks this variation or produces it inconsistently. Tools from vendors including Microsoft, Sensity AI, and Reality Defender apply variations of this analysis in real time.
On the audio side, voice anti-spoofing models are trained to detect the specific artefacts that voice synthesis introduces. These artefacts are subtle and not audible to a human, but they appear consistently in the signal’s spectral features.
None of these tools is infallible. Adversarial deepfake generation increasingly targets known detection weaknesses. The posture here should be: technical detection as one layer of a multi-layer defence, not as the primary control. A liveness detection flag should trigger the out-of-band verification protocol, not automatically block a transaction.
Organizations operating at scale should evaluate integrating liveness detection into their video conferencing stack and their call recording analysis pipeline. The investment is proportionate to the transaction volumes and authorization authority levels at risk.
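To make the photoplethysmography idea concrete, here is a deliberately simplified sketch: given the mean green-channel intensity of the facial region per frame, a naive DFT checks whether the cardiac frequency band (roughly 0.7 to 3 Hz) dominates the signal. The frame rate, band limits, and threshold are illustrative assumptions; production tools are far more sophisticated:

```python
import math

# Illustrative liveness check on a remote-photoplethysmography signal.
# Input: mean green-channel intensity of the facial region, one value
# per frame. A live face shows a rhythmic cardiac component; thresholds
# and the 30 fps default here are hypothetical.


def band_power_ratio(signal, fps=30.0, lo=0.7, hi=3.0):
    """Fraction of spectral power falling in the cardiac band."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [s - mean for s in signal]
    total = cardiac = 0.0
    for k in range(1, n // 2):
        freq = k * fps / n
        # Naive DFT bin (O(n^2) overall; fine for a short window).
        re = sum(c * math.cos(2 * math.pi * k * t / n) for t, c in enumerate(centred))
        im = sum(c * math.sin(2 * math.pi * k * t / n) for t, c in enumerate(centred))
        power = re * re + im * im
        total += power
        if lo <= freq <= hi:
            cardiac += power
    return cardiac / total if total else 0.0


def looks_live(signal, fps=30.0, threshold=0.6):
    # Per the layered posture above: a failed check should trigger
    # out-of-band verification, not automatically block a transaction.
    return band_power_ratio(signal, fps) >= threshold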
4. Zero Trust identity for internal video
The Zero Trust framework, codified in NIST SP 800-207, is built on a single operating assumption: no actor, device, or session should be trusted by default, regardless of where the request originates.
Most organizations apply Zero Trust to network access. Very few apply it to identity claims made during video or voice sessions.
The shift required here is conceptual. Stop treating a person’s face and voice as identity verification. Start treating the device they are calling from, and the hardware token or certificate bound to that device, as the trust anchor.
In practice, this means requiring device-bound authentication before any session in which sensitive decisions will be made. If the CFO’s laptop or registered mobile device is not the source of the session request, the session does not carry the CFO’s authorization level. The CFO’s face appearing in the video frame is, by itself, insufficient.
This is a more significant architectural change than the previous controls, but it is the right long-term direction. For organizations implementing Zero Trust progressively, high-value authorization sessions are a logical starting point for applying device-bound identity requirements.
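A simplified sketch of the device-as-trust-anchor check follows, using an HMAC attestation as a stand-in for hardware-backed asymmetric keys. The device IDs, keys, and roles are hypothetical; the point is that the authorization level derives from the registered device that signed the session request, never from the face in the frame:

```python
import hashlib
import hmac

# Hypothetical device registry, populated at enrolment time. A real
# deployment would bind roles to hardware-backed key pairs or
# certificates, not symmetric keys.
DEVICE_REGISTRY = {
    "cfo-laptop-7f3a": {"key": b"provisioned-at-enrolment", "role": "cfo"},
}


def session_role(device_id: str, session_nonce: bytes, attestation: bytes) -> str:
    """Grant a session the role of the device that attested it.
    An unregistered device, or an invalid attestation, gets nothing --
    regardless of whose face appears in the video frame."""
    device = DEVICE_REGISTRY.get(device_id)
    if device is None:
        return "untrusted"
    expected = hmac.new(device["key"], session_nonce, hashlib.sha256).digest()
    return device["role"] if hmac.compare_digest(expected, attestation) else "untrusted"
```

An attacker running a deepfake from their own laptop fails this check at the session layer, before any human judgement about the face is ever required.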
5. Deepfake simulation drills
Phishing simulation has been a standard component of security awareness training for years. The logic is straightforward: people learn to recognize attacks better when they have experienced a safe version of one.
The same logic applies to deepfake attacks. An employee who has never encountered a synthetic face in a realistic business context is not prepared to identify one under pressure. An employee who has been part of a controlled deepfake drill, who has experienced the moment of “that seemed real,” is better calibrated for the actual event.
Authorized deepfake simulation programs are beginning to emerge as a category in security awareness training. The structure mirrors phishing simulations: a controlled attack is run against a defined group of employees, the responses are tracked, and the debrief is educational rather than punitive.
For Bridge Infrastructure, this would mean running a simulated CFO authorization request using a deepfake model trained on publicly available footage, targeting the payment operations team, and using the results to identify gaps in protocol adherence rather than to evaluate individual judgment.
The goal is not to catch people making mistakes. The goal is to build muscle memory for the verification steps so that those steps happen automatically when the real attack occurs.
Where this is going: cryptographic identity
The five controls above are practical and deployable today. They address the symptoms of a deeper structural problem: in the current communications infrastructure, there is no reliable way to prove that a face on a screen belongs to the person it appears to be.
The industry is working on this, but the solutions are not yet at deployment scale.
The Content Authenticity Initiative (CAI) and the C2PA standard represent one direction. C2PA is a technical specification that embeds cryptographic provenance data into media files at the point of capture. A video recorded on a C2PA-compliant camera carries a signed attestation of when it was recorded, on what device, and whether it has been modified. This does not solve real-time impersonation, but it creates a verifiable chain of custody for recorded media used in business contexts.
Decentralized identity (DID) represents a longer-term architectural answer. The W3C DID specification describes a method for creating self-sovereign digital identifiers that are cryptographically verifiable without reliance on a centralized authority. In a DID-enabled transaction authorization workflow, the CFO would not prove identity by appearing on video. They would prove their identity by signing the authorization request with a private key bound to their verified credential. A synthetic face cannot produce that signature.
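To make "a synthetic face cannot produce that signature" concrete, here is a toy hash-based one-time signature in the Lamport style, chosen only because it needs nothing beyond a hash function. Real DID methods use curve signatures such as Ed25519, and this scheme can safely sign just once; it is an illustration of the principle, not a usable credential system:

```python
import hashlib
import secrets

# Toy Lamport-style one-time signature. The verifier holds only public
# material; no amount of observing the signer (faces, voices, recorded
# calls) yields the private key needed to sign a new request.


def keygen():
    # Private key: 256 pairs of random values. Public key: their hashes.
    sk = [[secrets.token_bytes(32), secrets.token_bytes(32)] for _ in range(256)]
    pk = [[hashlib.sha256(half).digest() for half in pair] for pair in sk]
    return sk, pk


def _bits(message: bytes):
    digest = hashlib.sha256(message).digest()
    return [(digest[i // 8] >> (i % 8)) & 1 for i in range(256)]


def sign(sk, message: bytes):
    # Reveal one half of each pair, selected by the message's hash bits.
    return [sk[i][b] for i, b in enumerate(_bits(message))]


def verify(pk, message: bytes, signature) -> bool:
    return all(
        hashlib.sha256(sig).digest() == pk[i][b]
        for (i, b), sig in zip(enumerate(_bits(message)), signature)
    )
```

A deepfake can reproduce everything visible and audible about the CFO, but altering even one character of the authorization request changes the hash bits, and the forged request fails verification.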
The honest framing for West African enterprise contexts, including organizations like Bridge Infrastructure, is that neither C2PA nor DID is deployed at scale in this environment today. The regulatory infrastructure, device ecosystem, and vendor support required for widespread adoption are still developing. Organizations in these markets are not waiting for cryptographic identity to become standard. They are operating in the present, with present-day controls.
That gap between where the technology is heading and where most organizations currently stand is not a reason for paralysis. It is a reason to build the procedural and cultural controls now, so that the transition to cryptographic verification can be made without abandoning everything that came before.
Beyond “seeing is believing”
The security conversation around deepfakes tends to focus on detection: how do you identify a fake face, how do you catch a synthetic voice, what tool flags the anomaly.
Detection matters. But detection is a reactive posture, and it puts the entire weight of defence on a single moment of recognition under pressure.
The more durable defence is cultural. It is an organization that has decided, at the policy level, that no visual or audio identity claim is sufficient on its own to authorize a sensitive action. It is a finance team that has practised the pause, the callback, and the challenge phrase enough times that these steps are reflexive rather than deliberate. It is a security program that treats the gap between “that looks real” and “that is real” as the exact space where risk lives.
The most dangerous thing a business can have is a confident workforce that trusts what it sees. The most secure thing it can have is a practised habit of verification, applied without exception, especially when everything looks familiar.
The CFO’s face on that Zoom screen looked right. The voice sounded right. The context made sense. And $25 million left the account before anyone thought to call back.
That is the lesson. Not about technology. About process.
Build the process now, before the call comes.