Adversary-driven Social Engineering
This article discusses the key principles and practical considerations for conducting adversary-driven social engineering engagements that prioritise realism over volume, providing organisations with meaningful insights into their ability to detect, investigate, and contain sophisticated social engineering attacks.
Before diving into the methodology, techniques and technical details, I want to clarify the fundamental differences between penetration testing and adversary simulation, particularly as they relate to social engineering. While both provide practical and actionable security insights, they differ significantly in approach and objectives.
Penetration testing focuses on assessing the breadth of vulnerabilities within a product or service through short-term engagements with defined scopes aligned to industry standards and frameworks. The emphasis is on discovering vulnerabilities rather than evading detection or maintaining persistence.
Adversary simulation assesses the depth of potential impact by mimicking real-world threats over extended periods. The goal is to achieve predefined objectives using adversarial techniques that align to Advanced Persistent Threat (APT) tactics, techniques, and procedures (TTPs). Evasion techniques are deliberately deployed to test an organisation's detection and response capabilities, providing a realistic assessment of security posture against sophisticated threat actors.
These distinctions are particularly evident in social engineering engagements. Penetration testing typically employs whitelisted phishing campaigns designed to provide statistical analysis of user behaviour, measuring click rates, credential submissions, and similar metrics. Adversary simulation, however, uses realistic social engineering attacks that attempt to bypass security controls and evade detection mechanisms, assessing the full impact of successful compromise rather than simply measuring user susceptibility.
While many techniques discussed in this article could apply to social engineering during penetration tests, the focus here is on adversary-driven social engineering, where the objectives are gaining and maintaining access while avoiding detection.
During an engagement, SilentGrid staff ('operators') attempt to meet an objective for a client organisation (the 'client'). In an adversary-driven social engineering engagement, this can be as simple as attempting to gain access to the client environment by coercing a client staff member (the 'target').
In order to do this, we need to do two things:
- Convince the target to let us in.
- Avoid alerting the target, or defenders within the client organisation, that we are trying to get in.
While the first point is what allows us to meet our objective of gaining access to the environment, the second point is actually more important in an adversary-driven engagement.
How do we convince someone to let us in?

In order to convince someone to let us in, we have to provide them with the answer to one question: 'why?'. Why should I click this link? Why should I enter my username and password? Why should I reset the password for another user over the phone? Why should I allow a visitor into a secure space without a keycard?
To create the 'why' we need to develop and then deliver a 'pretext'. This pretext provides the target with the answer to the why. While some generic pretexts might be blunt and provide an obvious answer to the why, such as logging into a portal to receive notifications about a missing package or authenticating to a portal to claim money from a long-lost relative, these pretexts are also exceptionally suspicious. As such, we need to spend the time developing a pretext which is actionable, believable, and does not seem suspicious to the target.
If a target receives an actionable but suspicious pretext, such as the examples above, it will be ignored or worse, reported by the target. If the pretext is both actionable and believable, then we have a chance of moving closer to our objective.
If the pretext is believable but not actionable, the target may be willing to perform the action we want but may not have the ability to do so. In some cases, this third option (believable but not actionable) can make a good starting point for an engagement.
If the target thinks the pretext is believable but not actionable, they may reach out for further details on how to action the request. This in turn has the potential to make them more amenable to performing the action, such as logging into a portal, than if the link had been delivered to them in the first place. We can look at this type of pretext as staged, in which the first action we want the target to complete is reaching out to the operator for further information regarding the pretext. While these staged pretexts can be effective, they take additional time to deliver.
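The relationship between believability and actionability described above can be sketched as a simple decision matrix. This is a hypothetical illustration of the reasoning, not SilentGrid tooling:

```python
# Hypothetical sketch of the pretext outcome matrix described above.
# A pretext's likely outcome depends on whether the target finds it
# believable and whether they have a clear way to action it.

def pretext_outcome(believable: bool, actionable: bool) -> str:
    """Return the likely outcome of delivering a pretext to a target."""
    if believable and actionable:
        return "positive action"   # target performs the action; access possible
    if believable and not actionable:
        return "no action"         # may prompt the target to reach out (staged pretext)
    return "negative action"       # suspicious pretext; likely reported

print(pretext_outcome(True, True))    # positive action
print(pretext_outcome(True, False))   # no action
print(pretext_outcome(False, True))   # negative action
```

Note that a suspicious pretext leads to negative action regardless of whether it is actionable, which is why the examples above (missing packages, long-lost relatives) are poor starting points.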
Why 'don't alert the target' is more important
When we deliver our pretext to the target, either remotely through voice or text, or in person, we are going to receive one of three outcomes:
- positive action, leading to access, the result of a believable and actionable pretext,
- no action, not leading anywhere, the result of a believable but not actionable pretext,
- negative action, leading to an alert, the result of a suspicious pretext.
This negative action, such as the target reporting a suspicious email to the Cyber Security team, not only denies us the access we want, but also has the potential to burn the pretext and its associated resources, such as the URL and IP address used for email delivery, and the server used for credential theft. While this isn't an insurmountable hurdle, and it is one of the reasons we start every engagement with more than one pretext and set of associated resources, it would certainly be better if no action were taken instead. If no action is taken, we can adjust our pretext or target and try again.
In some instances, a positive action may also result in negative action shortly after if the flow of the pretext is not well developed. An example of this is an email-based pretext where the target is required to authenticate to access a specific file. The pretext may be believable enough to cause them to attempt the action and actionable enough to provide them with a means to authenticate through a link to a malicious server. However, if the target successfully authenticates to the server but is not provided with the specific file matching the pretext or the file itself is not believable then negative action is likely to follow. While access to the environment might be gained, it could be short-lived. Once the target alerts defenders, access will be revoked, and the pretext and associated resources will be burnt.
Let's talk about the target list
As discussed above, in order to gain access and remain undetected we need to create an actionable and believable pretext that does not seem suspicious to the target. The best way to enable this is to ensure that the maximum number of targets are available. The more targets, the more potential pretext opportunities there are.
If a limited scope of targets is specified by the client, it correspondingly creates a limited number of pretext opportunities. During initial open source intelligence (OSINT) gathering, an operator may identify that the client is using a specific Software as a Service (SaaS) platform. Additionally, the client may have an internet-exposed login portal for this SaaS platform which directs client staff to perform single sign-on (SSO) with their identity provider. Using this information, the operator could look to build a pretext which targets client staff who use this SaaS platform as part of their regular workflow. If, however, none of the targets on the provided target list use this SaaS platform, then the pretext's believability is drastically reduced. Alternatively, the pretext may remain believable, but the target might not engage with it as the SaaS platform is not relevant to them.
Another limiting factor can occur if the client specifies a target scope which only includes staff who are members of highly technical departments (such as Information Technology or Cyber Security) or high-value staff (such as the C-Suite or executives). While it may be advantageous for an adversary to gain access to such user accounts, these users typically have higher levels of training and awareness when dealing with social engineering attacks such as external phishing. As such, these targets can be difficult to compromise directly. While it's not impossible to directly compromise members of the C-Suite or IT staff through social engineering, generally speaking, an easier pathway is to compromise a less secure account and then use this account for internal phishing or lateral movement into the cloud or on-premises environment. If there is a potential pathway to gain domain administrator or global administrator privileges using the account of a facilities attendant, why would an adversary bother trying to phish IT or the CEO first?
As such, when conducting adversary-driven social engineering, applying a limited target list creates an unrealistic outcome. To put it simply, would a sophisticated attacker only target a limited set of specific staff, or would they target the staff who give them the greatest chance of gaining access and remaining undetected? To keep an engagement as realistic as possible, SilentGrid recommends not providing a list of in-scope targets. Through OSINT and other external enumeration, we'll be able to identify and prioritise targets, and more importantly, we'll be able to ensure viable pretexts fit the targets we pursue. If there are requirements to exclude specific targets, SilentGrid recommends providing a list of targets who are explicitly not in scope.
You've got a delivery

When conducting a social engineering engagement, numerous delivery mechanisms can be utilised. The list below provides commonly used mechanisms:
- Remote Text: email, SMS, instant messaging such as WhatsApp, social media messaging such as LinkedIn, website contact forms, and ticketing systems.
- Remote Voice: standard phone calls, corporate platform calls such as Zoom or Microsoft Teams, and instant messaging platform calls such as WhatsApp.
- In-Person Delivery: conversations with staff while onsite, accessing areas through tailgating or misdirection, and planted devices or posters with QR codes.
Similar to the unrealistic restrictions imposed by a limited target list, restricting the availability of certain delivery mechanisms also reduces the number of available pretexts as well as the realism of the engagement.
In some instances, a pretext may commence with an initial email and then follow up with an SMS message or phone call. Alternatively, a LinkedIn message might be sent to a target requesting their email address, with the conversation then moving there.
If delivery is scoped to only one mechanism, such as email, and the target requests that the operator call them for further information in order to action the pretext, then this avenue of potential access cannot progress. Similarly, if a scope is restricted to SMS delivery only, then the target list is restricted to staff members whose mobile phone number is publicly available. In my experience, the best way to get mobile phone numbers for staff is from the email signature found in a reply or out-of-office message.
It is understandable however that in-person activities are more difficult to manage logistically, both for SilentGrid and for the client. As such it may make practical sense to separate these physical activities into a different engagement or remove them from the scope.
Pretty fly for a phisher guy
A common misconception in adversary-driven social engineering is that success is measured by volume: sending hundreds of phishing emails or making dozens of calls to the IT helpdesk in order to demonstrate impact. This approach fundamentally misunderstands adversary behaviour and ultimately just increases the chance of detection. Sophisticated threat actors operate with patience, not volume.
If a single well-crafted email results in account compromise, every additional email is simply another detectable event. Each attempt generates logs, triggers security alerts, and increases the probability that someone will report suspicious activity. Similarly, every call made to a service desk using the same pretext is an opportunity for staff to compare notes, become suspicious, or escalate concerns to security teams.
We can think of it as the difference between fly fishing and trawling with a net. One targets specific catches with precision and minimal disturbance, while the other drags indiscriminately through everything, creating noise and alerting the entire area.
In regard to scoping, it's important to consider the above and not fall into the trap of setting arbitrary minimums on emails sent or calls made simply to demonstrate "impact" or justify the engagement. Instead, focus on quality over quantity, allowing the operator to target specific individuals most likely to provide access.
Additionally, where viable, engagement delivery should be conducted gradually over weeks rather than compressed into days. This measured pace allows targets to naturally engage with content without raising suspicion through sudden bursts of activity. This patient approach more accurately reflects real-world adversary behaviour, reduces avoidable detection events, and provides a more realistic assessment of organisational security posture against sophisticated threat actors. Importantly, this gradual approach also provides defenders with a realistic opportunity to detect, investigate, and contain an advanced persistent threat using their existing security processes and tools, rather than being overwhelmed by a sudden flood of suspicious activity that is unlikely to occur in a genuine attack.
Well, it's about time

In order to bypass effective technical controls, we need to create robust resources with a positive reputation. This ensures we have the best rate of delivery, as well as the best chance for our target to be able to action our pretext.
If we were to send an email to our target, and the email failed to reach the target's inbox, either because it's quarantined or marked as spam, then it is highly unlikely any positive action will occur. Similarly, if an email does land in their inbox, but when they try to navigate to our credential capturing server through the provided URL they are stopped by technical controls based on the URL's reputation, it's also unlikely any positive action will occur. Finally, if they do manage to access the URL, but their browser marks the page as a phishing site due to prior service scanning of the server, then it's also unlikely any positive action will occur.
So how do we ensure our resources have a positive reputation? It's actually quite simple, but it does take time.
Domain categorisation: All domains and URLs are constantly being inspected by security providers, such as Microsoft, Google, Cloudflare, Cisco, Palo Alto, Fortinet, and many others to attempt to determine if they are safe or malicious, and what category they best belong to, such as 'Business & Economy' or 'Real Estate'. If a domain is newly registered, generally within the first 30 days of registration, it is rightly categorised as a 'Newly Registered' domain. Similarly, if a domain has been registered for a period longer than 30 days but has only been displaying default registrar DNS records and content such as a parked website, then it will be categorised as 'Parked'.
Any domain marked as newly registered or parked has a significantly higher chance of being marked as spam or phishing by email providers as well as being blocked by URL filters and outbound controls.
So how do we ensure our domains receive positive categorisations and not negative ones? Once a domain is purchased, we apply DNS records which indicate use, we host safe web content which matches our pretext, and then we wait. We can attempt to speed up this process by requesting manual categorisation of our domain, but there are some potential pitfalls with this. Firstly, a manual categorisation request of our domain or its content may see it marked as 'Suspicious' by the provider, effectively burning that resource. Secondly, while we may get a number of providers to positively categorise our domain, if we do not identify all of the providers used by the client it may still get blocked. Thirdly, some providers do not allow categorisation requests or restrict them to trusted paid customers.
Different providers perform categorisation for newly registered domains at different times, commonly around 30 days after purchase or re-registration. The following is taken from Palo Alto Networks (https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000PPImCAO):
"URLs are categorized as a 'newly-registered-domain' when the domain has been registered (Passive DNS is observed) within the last 32 days. We also consider re-registration as newly registered because it could come with ownership change.
After this period, Palo Alto Networks will crawl the URL to determine if it needs to be re-categorized. If this is unsuccessful, the category will be Insufficient-Content, Newly-Registered-Domain. The crawler then tries again at 3 day, 7 day, 2 week, and 30 day markers. By the last day if the crawler cannot determine the content, the URL will be categorized as "Insufficient-Content"."
A comprehensive list of domain categories from Cloudflare can be found here: https://developers.cloudflare.com/cloudflare-one/policies/gateway/domain-categories/
In summary, once a pretext is identified and a domain is purchased to support it, we need to wait a minimum of 32 days in order to use that domain for the engagement.
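As a rough illustration of the 32-day rule above, the earliest usable date for a freshly purchased domain can be computed from its registration date. This is a simple sketch treating 32 days as a lower bound; the exact window varies by provider:

```python
from datetime import date, timedelta

# Sketch: the earliest date a newly registered domain should be used
# for delivery, based on the ~32-day 'newly-registered-domain' window
# described above. Providers vary, so treat this as a lower bound.

NEWLY_REGISTERED_WINDOW_DAYS = 32

def earliest_use_date(registered: date) -> date:
    """Return the first date the domain falls outside the window."""
    return registered + timedelta(days=NEWLY_REGISTERED_WINDOW_DAYS)

def is_usable(registered: date, today: date) -> bool:
    return today >= earliest_use_date(registered)

print(earliest_use_date(date(2024, 1, 1)))             # 2024-02-02
print(is_usable(date(2024, 1, 1), date(2024, 1, 20)))  # False
```

In practice this date feeds directly into engagement planning: domains purchased at scoping time are only ready for delivery a month or more later.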
Sender reputation: Similar to domain categorisation discussed above, sender reputation directly impacts the likelihood an email will be delivered to a target's inbox. Sender reputation encompasses both domain and IP address metrics, with mailbox providers evaluating reputation based on recipient engagement, authentication protocols (SPF, DKIM, DMARC), complaint rates, and spam detection patterns. As an example, Microsoft requires at least 20 messages to be processed before establishing an initial reputation score. Attempting email delivery from newly registered accounts without established reputation will likely result in immediate blocking.
Further details from Microsoft on Sender reputation and the Protocol Analysis agent in Exchange Server can be found here: https://learn.microsoft.com/en-us/exchange/antispam-and-antimalware/antispam-protection/sender-reputation
As such, once an email account is configured to match the pretext, email warmup should occur over time to slowly build the reputation of the sender. As 32 days need to pass in order to remove the newly registered domain categorisation, this period can also be used to build sender reputation.
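One way to structure warmup over the same 32-day window is a gradual volume ramp: a handful of benign messages per day, growing slowly. The starting volume, growth rate, and cap below are illustrative assumptions, not a prescribed schedule:

```python
# Illustrative email warmup ramp: start with a few benign messages per
# day and grow volume slowly across the domain-aging period. The start
# volume, growth rate, and daily cap here are assumptions.

def warmup_schedule(days: int = 32, start: int = 2, growth: float = 1.15,
                    cap: int = 40) -> list[int]:
    """Return a per-day send volume that ramps up gradually."""
    schedule = []
    volume = float(start)
    for _ in range(days):
        schedule.append(min(cap, round(volume)))
        volume *= growth
    return schedule

plan = warmup_schedule()
print(plan[:5])   # gentle start in the first few days
print(sum(plan))  # total messages sent across the warmup window
```

The exact numbers matter less than the shape: low initial volume, steady growth, and no sudden bursts that look like a newly stood-up sender.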
TLS certificates: TLS certificates serve a dual purpose in social engineering campaigns. They provide legitimate encryption for data in transit and establish perceived legitimacy and trust with targets. When a user visits a website secured with a valid TLS certificate, browsers display visual trust indicators such as padlock icons, 'https://' prefixes, and 'Secure' labels. These indicators create a sense of assurance that the site is trustworthy, helping to convince targets to interact with phishing pages or enter credentials.
However, within minutes of certificate issuance, automated scanners from security vendors such as Google Safe Browsing, Microsoft Defender SmartScreen, VirusTotal, and various other threat intelligence platforms will visit the domain. These bots will attempt to fingerprint the service, analyse content, and assess whether the site is safe. If malicious content is detected during this period, such as credential harvesting forms, adversary in the middle proxies, or suspicious redirects, the domain will be rapidly flagged and blocklisted across multiple security platforms.
To mitigate this, after TLS certificate registration, we serve only benign, legitimate-appearing content that aligns with our pretext, and then, like with domain categorisation, we wait. A minimum waiting period of 72 hours should be observed before transitioning the infrastructure to serve malicious content as part of the active engagement. This staged deployment significantly reduces the likelihood of our infrastructure being detected as malicious.
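The two waiting periods described above (the ~32-day domain window and the 72-hour benign-content period after certificate issuance) can be combined into a single readiness check. A sketch under those assumptions:

```python
from datetime import datetime, timedelta

# Sketch combining the two waiting periods described above:
#  - the domain must be past the ~32-day newly-registered window, and
#  - the TLS certificate must have served benign content for >= 72 hours
# before the infrastructure switches to serving active content.

DOMAIN_WAIT = timedelta(days=32)
CERT_WAIT = timedelta(hours=72)

def ready_for_active_phase(domain_registered: datetime,
                           cert_issued: datetime,
                           now: datetime) -> bool:
    domain_aged = now - domain_registered >= DOMAIN_WAIT
    cert_aged = now - cert_issued >= CERT_WAIT
    return domain_aged and cert_aged

reg = datetime(2024, 1, 1)
cert = datetime(2024, 2, 1)
print(ready_for_active_phase(reg, cert, datetime(2024, 2, 2)))  # False: cert too new
print(ready_for_active_phase(reg, cert, datetime(2024, 2, 5)))  # True
```

Because the certificate is typically issued late in the domain-aging window, the 72-hour period usually becomes the binding constraint at the end of preparation.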
We’re in!

Prior to the commencement of an adversary-driven social engineering campaign, it is essential to define post-compromise actions and establish clear rules of engagement. Once credentials are compromised or initial access is gained, operators must know exactly what actions are authorised and what boundaries exist.
This may include whether to proceed with persistence mechanisms, to investigate lateral movement, data exfiltration, and privilege escalation opportunities, or to simply document the compromise and halt. Without pre-defined post-compromise actions, operators risk either stopping too early and failing to demonstrate the true impact of the compromise or proceeding too far and causing unintended business disruption through potential internal detections.
In addition, some post-compromise actions are time-sensitive and may no longer be viable if not conducted immediately at the time of compromise. An example of this is applying Shadow MFA on an account after initial compromise.
Shadow MFA is an account persistence technique where an attacker who has compromised a user account registers an additional unauthorised multifactor authentication (MFA) method under their control, such as a phone number for SMS/voice authentication or an authenticator application on their own device. This malicious MFA enrolment can generally only be performed while the user's authentication token maintains an elevated security context, typically within one hour of the initial authentication or MFA challenge. This technique is particularly dangerous because it provides long-term persistence which can evade detection by appearing as legitimate MFA activity, allowing the attacker to repeatedly access the account without relying on stolen cookies or tokens that may expire.
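The time sensitivity of this technique can be illustrated with a simple check against the elevated-context window mentioned above. The one-hour figure is the typical case from the text, and the function below is a hypothetical sketch, not vendor tooling:

```python
from datetime import datetime, timedelta

# Sketch: registering an additional MFA method generally requires an
# authentication token still within its elevated security context,
# modelled here as one hour from the initial authentication or MFA
# challenge (the typical window described above).

ELEVATED_CONTEXT = timedelta(hours=1)

def shadow_mfa_viable(authenticated_at: datetime, now: datetime) -> bool:
    """Return True while the token likely retains an elevated context."""
    return now - authenticated_at < ELEVATED_CONTEXT

auth = datetime(2024, 6, 1, 9, 0)
print(shadow_mfa_viable(auth, datetime(2024, 6, 1, 9, 30)))   # True
print(shadow_mfa_viable(auth, datetime(2024, 6, 1, 10, 30)))  # False
```

This is why the decision to attempt Shadow MFA must be made in the rules of engagement beforehand: by the time a compromise is reported and discussed, the window has usually closed.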
Creating an attack chain
While some organisations might be interested in performing a standalone adversary-driven social engineering engagement, significant value can be gained by combining it with a perimeter assessment and an assumed breach engagement.
When performing a perimeter assessment for an organisation, we look to identify vulnerabilities and avenues of entry across the organisation's internet-facing attack surface. This may include insecure websites, leaked or weak credentials, information disclosures, and exposed administrative and other login portals, among other findings. Regardless of whether access is gained as part of such an engagement, all of the information collected feeds directly into pretext generation for a social engineering campaign. SaaS products would be recorded, staff details collected, and identity platforms identified. In a social engineering engagement, we would use OSINT to target this information; in a perimeter assessment, we collect it all naturally.
Something that is often seen during assumed breach engagements is the creation of sterile starting points for attackers. Whether this is a new user account, or a cloned server, these resources lack the 'lived in' nature of the real thing. While operators are generally able to work around this, it can underrepresent the impact of what would occur should a real resource be compromised. If during our social engineering engagement, we successfully meet our objective of gaining access, then we can commence our assumed breach engagement no longer needing to 'assume'. The account used will have all the elements of being lived in, because it is.
Chaining these engagements together also allows an organisation to put into practice their defences, both technical and procedural, and assess their effectiveness across a full attack chain.
Key Takeaways
Adversary-driven social engineering fundamentally differs from traditional penetration testing by prioritising quality over quantity, patience over speed, and realism over metrics. Successful engagements require:
- Preparation time: Minimum 32 days for domain reputation building, with additional time for sender reputation and TLS certificate staging (72 hours minimum)
- Flexible targeting: Avoid restrictive target lists, allowing operators to identify and pursue the most realistic attack paths
- Unrestricted delivery mechanisms: Enable multiple delivery methods (email, SMS, voice, in-person) to reflect real adversary behaviour
- Precision over volume: One successful compromise is better than dozens of detected attempts
- Pre-defined post-compromise actions: Establish clear rules of engagement before the campaign begins, including time-sensitive actions like Shadow MFA
- Gradual execution: Conduct engagements over weeks, not days, to provide defenders realistic opportunities for detection
- Integration with broader assessments: Combine with perimeter assessments and assumed breach engagements for comprehensive attack chain testing
The most effective adversary-driven social engineering engagements mirror sophisticated threat actor behaviour: they are patient, precise, and designed to evade detection while demonstrating realistic organisational risk. Success is measured not by the number of emails sent, but by the ability to gain and maintain access while testing an organisation's detection and response capabilities under realistic conditions.