Reverted advanced hunting query (AHQ) best practices page

An attempted rebase was complicated by hundreds of commits applied after the removed commits.
2025-07-05 20:23:38 +00:00 · 2020-10-15 15:33:46 -04:00
parent b368d3c07f
commit 281a3d2c25
1 changed file with 27 additions and 207 deletions
--- a/windows/security/threat-protection/microsoft-defender-atp/advanced-hunting-best-practices.md
+++ b/windows/security/threat-protection/microsoft-defender-atp/advanced-hunting-best-practices.md
@@ -25,204 +25,54 @@ ms.topic: article

 - [Microsoft Defender Advanced Threat Protection (Microsoft Defender ATP)](https://go.microsoft.com/fwlink/p/?linkid=2069559)

-> Want to experience Microsoft Defender ATP? [Sign up for a free trial.](https://www.microsoft.com/microsoft-365/windows/microsoft-defender-atp?ocid=docs-wdatp-advancedhuntingref-abovefoldlink)
+>Want to experience Microsoft Defender ATP? [Sign up for a free trial.](https://www.microsoft.com/microsoft-365/windows/microsoft-defender-atp?ocid=docs-wdatp-bestpractices-abovefoldlink)

-Apply these recommendations to get results faster and avoid timeouts while running complex queries. For more guidance on improving query performance, read [Kusto query best practices](https://docs.microsoft.com/azure/kusto/query/best-practices).
+## Optimize query performance

-## General guidance
+Apply these recommendations to get results faster and avoid timeouts while running complex queries.

-- **Size new queries**—If you suspect that a query will return a large result set, assess it first using the [count operator](https://docs.microsoft.com/azure/data-explorer/kusto/query/countoperator). Use [limit](https://docs.microsoft.com/azure/data-explorer/kusto/query/limitoperator) or its synonym `take` to avoid large result sets.
+- When trying new queries, always use `limit` to avoid extremely large result sets. You can also initially assess the size of the result set using `count`.
+- Use time filters first. Ideally, limit your queries to seven days.
+- Put filters that are expected to remove most of the data in the beginning of the query, right after the time filter.
+- Use the `has` operator over `contains` when looking for full tokens.
+- Look in a specific column rather than running full text searches across all columns.
+- When joining tables, specify the table with fewer rows first.
+- `project` only the necessary columns from tables you've joined.

-- **Apply filters early**—Apply time filters and other filters to reduce the data set, especially before using transformation and parsing functions, such as [substring()](https://docs.microsoft.com/azure/data-explorer/kusto/query/substringfunction), [replace()](https://docs.microsoft.com/azure/data-explorer/kusto/query/replacefunction), [trim()](https://docs.microsoft.com/azure/data-explorer/kusto/query/trimfunction), [toupper()](https://docs.microsoft.com/azure/data-explorer/kusto/query/toupperfunction), or [parse_json()](https://docs.microsoft.com/azure/data-explorer/kusto/query/parsejsonfunction). In the example below, the parsing function [extractjson()](https://docs.microsoft.com/azure/data-explorer/kusto/query/extractjsonfunction) is used after filtering operators have reduced the number of records.
+>[!TIP]
+>For more guidance on improving query performance, read [Kusto query best practices](https://docs.microsoft.com/azure/kusto/query/best-practices).

-    ```kusto
-    DeviceEvents
-    | where Timestamp > ago(1d)
-    | where ActionType == "UsbDriveMount"
-    | where DeviceName == "user-desktop.domain.com"
-    | extend DriveLetter = extractjson("$.DriveLetter", AdditionalFields)
-     ```
+## Query tips and pitfalls

-- ***Has* beats *contains*** —To avoid searching substrings within words unnecessarily, use the `has` operator instead of `contains`. [Learn about string operators](https://docs.microsoft.com/azure/data-explorer/kusto/query/datatypes-string-operators)
+### Queries with process IDs

-- **Look in specific columns**—Look in a specific column rather than running full text searches across all columns. Don't use `*` to check all columns.
-
-- **Case-sensitive for speed**—Case-sensitive searches are more specific and generally more performant. Names of case-sensitive [string operators](https://docs.microsoft.com/azure/data-explorer/kusto/query/datatypes-string-operators), such as `has_cs` and `contains_cs`, generally end with `_cs`. You can also use the case-sensitive equals operator `==` instead of `~=`.
-
-- **Parse, don't extract**—Whenever possible, use the [parse operator](https://docs.microsoft.com/azure/data-explorer/kusto/query/parseoperator) or a parsing function like [parse_json()](https://docs.microsoft.com/azure/data-explorer/kusto/query/parsejsonfunction). Avoid the `matches regex` string operator or the [extract() function](https://docs.microsoft.com/azure/data-explorer/kusto/query/extractfunction), both of which use regular expression. Reserve the use of regular expression for more complex scenarios. [Read more about parsing functions](#parse-strings)
-
-- **Filter tables not expressions**—Don't filter on a calculated column if you can filter on a table column.
-
-- **No three-character terms**—Avoid comparing or filtering using terms with three characters or fewer. These terms are not indexed and matching them will require more resources.
-
-- **Project selectively**—Make your results easier to understand by projecting only the columns you need. Projecting specific columns prior to running [join](https://docs.microsoft.com/azure/data-explorer/kusto/query/joinoperator) or similar operations also helps improve performance.
-
-## Optimize the `join` operator
-
-The [join operator](https://docs.microsoft.com/azure/data-explorer/kusto/query/joinoperator) merges rows from two tables by matching values in specified columns. Apply these tips to optimize queries that use this operator.
-
-- **Smaller table to your left**—The `join` operator matches records in the table on the left side of your join statement to records on the right. By having the smaller table on the left, fewer records will need to be matched, thus speeding up the query.
-
-    In the table below, we reduce the left table `DeviceLogonEvents` to cover only three specific devices before joining it with `DeviceNetworkEvents` by device IDs.
-
-    ```kusto
-    DeviceLogonEvents
-    | where DeviceName in ("device-1.domain.com", "device-2.domain.com", "device-3.domain.com")
-    | where ActionType == "LogonFailed"
-    | join
-        (DeviceNetworkEvents
-        | where Protocol == "Kerberos"
-        | where ActionType == "LogonFailed")
-    on DeviceId
-    ```
-
-- **Use the inner-join flavor**—The default [join flavor](https://docs.microsoft.com/azure/data-explorer/kusto/query/joinoperator#join-flavors) or the [innerunique-join](https://docs.microsoft.com/azure/data-explorer/kusto/query/joinoperator?pivots=azuredataexplorer#innerunique-join-flavor) deduplicates rows in the left table by the join key before returning a row for each match to the right table. If the left table has multiple rows with the same value for the `join` key, those rows will be deduplicated to leave a single random row for each unique value.
-
-    This default behavior can leave out important information from the left table that can provide useful insight. For example, the query below will only show one email containing a particular attachment, even if that same attachment was sent using multiple emails messages:
-
-    ```kusto
-    EmailAttachmentInfo
-    | where Timestamp > ago(1h)
-    | where Subject == "Document Attachment" and FileName == "Document.pdf"
-    | join (DeviceFileEvents | where Timestamp > ago(1h)) on SHA256
-    ```
-
-    To address this limitation, we apply the [inner-join](https://docs.microsoft.com/azure/data-explorer/kusto/query/joinoperator?pivots=azuredataexplorer#inner-join-flavor) flavor by specifying `kind=inner` to show all rows in the left table with matching values in the right:
-
-    ```kusto
-    EmailAttachmentInfo
-    | where Timestamp > ago(1h)
-    | where Subject == "Document Attachment" and FileName == "Document.pdf"
-    | join kind=inner (DeviceFileEvents | where Timestamp > ago(1h)) on SHA256
-    ```
-
-- **Join records from a time window**—When investigating security events, analysts look for related events that occur around the same time period. Applying the same approach when using `join` also benefits performance by reducing the number of records to check.
-    
-    The query below checks for logon events within 30 minutes of a credential access alert being raised:
-
-    ```kusto
-    DeviceAlertEvents
-    | where Timestamp > ago(7d)
-    | where Severity == "High"
-    | where Category == "CredentialAccess"
-    | project AlertRaised = Timestamp, DeviceName, AlertId, Title, AttackTechniques
-    | join (
-    DeviceLogonEvents 
-    | where Timestamp > ago(7d)
-    | project LogonTime = Timestamp, DeviceName, AccountName
-    ) on DeviceName 
-    | where (LogonTime - AlertRaised) between (0min .. 30min)
-    ```
-
-- **Apply time filters on both sides**—Even if you're not investigating a specific time window, applying time filters on both the left and right tables can reduce the number of records to check and improve `join` performance. The query below applies `Timestamp > ago(1h)` to both tables so that it joins only records from the past hour:
-
-    ```kusto
-    DeviceAlertEvents
-    | where Timestamp > ago(1h)
-    | where Severity == "High"
-    | join (DeviceFileEvents 
-    | where Timestamp > ago(1h)
-    | where ActionType == "FileCreated"
-    ) on SHA1
-    ```
-
-- **Use hints for performance**—Use hints with the `join` operator to instruct the backend to distribute load when running resource-intensive operations. [Learn more about join hints](https://docs.microsoft.com/azure/data-explorer/kusto/query/joinoperator#join-hints)
-
-    For example, the **[shuffle hint](https://docs.microsoft.com/azure/data-explorer/kusto/query/shufflequery)** helps improve query performance when joining tables using a key with high cardinality—a key with many unique values—such as the `AccountObjectId` in the query below:
-
-    ```kusto
-    IdentityInfo
-    | where JobTitle == "CONSULTANT"
-    | join hint.shufflekey = AccountObjectId
-    (IdentityDirectoryEvents
-        | where Application == "Active Directory"
-        | where ActionType == "Private data retrieval")
-    on AccountObjectId
-    ```
-
-    The **[broadcast hint](https://docs.microsoft.com/azure/data-explorer/kusto/query/broadcastjoin)** helps when the left table is small (up to 100,000 records) and the right table is extremely large. For example, the query below is trying to join a few emails that have specific subjects with _all_ messages containing links in the `EmailUrlInfo` table:
-
-    ```kusto
-    EmailEvents
-    | where Subject in ("Warning: Update your credentials now", "Action required: Update your credentials now")
-    | join hint.strategy = broadcast EmailUrlInfo on NetworkMessageId 
-    ```
-
-## Optimize the `summarize` operator
-
-The [summarize operator](https://docs.microsoft.com/azure/data-explorer/kusto/query/summarizeoperator) aggregates the contents of a table. Apply these tips to optimize queries that use this operator.
-
-- **Find distinct values**—In general, use `summarize` to find distinct values that can be repetitive. It can be unnecessary to use it to aggregate columns that don't have repetitive values.
-
-    While a single email can be part of multiple events, the example below is _not_ an efficient use of `summarize` because a network message ID for an individual email always comes with a unique sender address.
-
-    ```kusto
-    EmailEvents  
-    | where Timestamp > ago(1h)
-    | summarize by NetworkMessageId, SenderFromAddress
-    ```
-
-    The `summarize` operator can be easily replaced with `project`, yielding potentially the same results while consuming fewer resources:
-
-    ```kusto
-    EmailEvents  
-    | where Timestamp > ago(1h)
-    | project NetworkMessageId, SenderFromAddress
-    ```
-
-    The following example is a more efficient use of `summarize` because there can be multiple distinct instances of a sender address sending email to the same recipient address. Such combinations are less distinct and are likely to have duplicates.
-
-    ```kusto
-    EmailEvents  
-    | where Timestamp > ago(1h)
-    | summarize by SenderFromAddress, RecipientEmailAddress
-    ```
-
-- **Shuffle the query**—While `summarize` is best used in columns with repetitive values, the same columns can also have _high cardinality_ or large numbers of unique values. Like the `join` operator, you can also apply the [shuffle hint](https://docs.microsoft.com/azure/data-explorer/kusto/query/shufflequery) with `summarize` to distribute processing load and potentially improve performance when operating on columns with high cardinality.
-
-    The query below uses `summarize` to count distinct recipient email address, which can run in the hundreds of thousands in large organizations. To improve performance, it incorporates `hint.shufflekey`:
-
-    ```kusto
-    EmailEvents  
-    | where Timestamp > ago(1h)
-    | summarize hint.shufflekey = RecipientEmailAddress count() by Subject, RecipientEmailAddress
-    ```
-
-## Query scenarios
-
-### Identify unique processes with process IDs
-
-Process IDs (PIDs) are recycled in Windows and reused for new processes. On their own, they can't serve as unique identifiers for specific processes.
-
-To get a unique identifier for a process on a specific machine, use the process ID together with the process creation time. When you join or summarize data around processes, include columns for the machine identifier (either `DeviceId` or `DeviceName`), the process ID (`ProcessId` or `InitiatingProcessId`), and the process creation time (`ProcessCreationTime` or `InitiatingProcessCreationTime`)
+Process IDs (PIDs) are recycled in Windows and reused for new processes. On their own, they can't serve as unique identifiers for specific processes. To get a unique identifier for a process on a specific device, use the process ID together with the process creation time. When you join or summarize data around processes, include columns for the device identifier (either `DeviceId` or `DeviceName`), the process ID (`ProcessId` or `InitiatingProcessId`), and the process creation time (`ProcessCreationTime` or `InitiatingProcessCreationTime`).

 The following example query finds processes that access more than 10 IP addresses over port 445 (SMB), possibly scanning for file shares.

-Example query:
-
 ```kusto
 DeviceNetworkEvents
 | where RemotePort == 445 and Timestamp > ago(12h) and InitiatingProcessId !in (0, 4)
-| summarize RemoteIPCount=dcount(RemoteIP) by DeviceName, InitiatingProcessId
-InitiatingProcessCreationTime, InitiatingProcessFileName
+| summarize RemoteIPCount=dcount(RemoteIP) by DeviceName, InitiatingProcessId, InitiatingProcessCreationTime, InitiatingProcessFileName
 | where RemoteIPCount > 10
 ```

 The query summarizes by both `InitiatingProcessId` and `InitiatingProcessCreationTime` so that it looks at a single process, without mixing multiple processes with the same process ID.

-### Query command lines
+### Queries with command lines

-There are numerous ways to construct a command line to accomplish a task. For example, an attacker could reference an image file without a path, without a file extension, using environment variables, or with quotes. The attacker could also change the order of parameters or add multiple quotes and spaces.
+Command lines can vary. When applicable, filter on file names and do fuzzy matching.

-To create more durable queries around command lines, apply the following practices:
+There are numerous ways to construct a command line to accomplish a task. For example, an attacker could reference an image file with or without a path, without a file extension, using environment variables, or with quotes. In addition, the attacker could also change the order of parameters or add multiple quotes and spaces.

-- Identify the known processes (such as *net.exe* or *psexec.exe*) by matching on the file name fields, instead of filtering on the command-line itself.
-- Parse command-line sections using the [parse_command_line() function](https://docs.microsoft.com/azure/data-explorer/kusto/query/parse-command-line) 
+To create more durable queries using command lines, apply the following practices:
+
+- Identify the known processes (such as *net.exe* or *psexec.exe*) by matching on the filename fields, instead of filtering on the command-line field.
 - When querying for command-line arguments, don't look for an exact match on multiple unrelated arguments in a certain order. Instead, use regular expressions or use multiple separate contains operators.
-- Use case insensitive matches. For example, use `=~`, `in~`, and `contains` instead of `==`, `in`, and `contains_cs`.
-- To mitigate command-line obfuscation techniques, consider removing quotes, replacing commas with spaces, and replacing multiple consecutive spaces with a single space. There are more complex obfuscation techniques that require other approaches, but these tweaks can help address common ones.
+- Use case insensitive matches. For example, use `=~`, `in~`, and `contains` instead of `==`, `in` and `contains_cs`
+- To mitigate DOS command-line obfuscation techniques, consider removing quotes, replacing commas with spaces, and replacing multiple consecutive spaces with a single space. Note that there are more complex DOS obfuscation techniques that require other approaches, but these can help address the most common ones.

-The following examples show various ways to construct a query that looks for the file *net.exe* to stop the firewall service "MpsSvc":
+The following examples show various ways to construct a query that looks for the file *net.exe* to stop the Windows Defender Firewall service:

 ```kusto
 // Non-durable query - do not use
@@ -230,7 +80,7 @@ DeviceProcessEvents
 | where ProcessCommandLine == "net stop MpsSvc"
 | limit 10

-// Better query - filters on file name, does case-insensitive matches
+// Better query - filters on filename, does case-insensitive matches
 DeviceProcessEvents
 | where Timestamp > ago(7d) and FileName in~ ("net.exe", "net1.exe") and ProcessCommandLine contains "stop" and ProcessCommandLine contains "MpsSvc" 

@@ -241,37 +91,7 @@ DeviceProcessEvents
 | where CanonicalCommandLine contains "stop" and CanonicalCommandLine contains "MpsSvc" 
 ```

-### Ingest data from external sources
-
-To incorporate long lists or large tables into your query, use the [externaldata operator](https://docs.microsoft.com/azure/data-explorer/kusto/query/externaldata-operator) to ingest data from a specified URI. You can get data from files in TXT, CSV, JSON, or [other formats](https://docs.microsoft.com/azure/data-explorer/ingestion-supported-formats). The example below shows how you can utilize the extensive list of malware SHA-256  hashes provided by MalwareBazaar (abuse.ch) to check attachments on emails:
-
-```kusto
-let abuse_sha256 = (externaldata(sha256_hash: string )
-[@"https://bazaar.abuse.ch/export/txt/sha256/recent/"]
-with (format="txt"))
-| where sha256_hash !startswith "#"
-| project sha256_hash;
-abuse_sha256
-| join (EmailAttachmentInfo 
-| where Timestamp > ago(1d) 
-) on $left.sha256_hash == $right.SHA256
-| project Timestamp,SenderFromAddress,RecipientEmailAddress,FileName,FileType,
-SHA256,MalwareFilterVerdict,MalwareDetectionMethod
-```
-
-### Parse strings
-
-There are various functions you can use to efficiently handle strings that need parsing or conversion.
-
-| String | Function | Usage example |
-|--|--|--|
-| Command-lines | [parse_command_line()](https://docs.microsoft.com/azure/data-explorer/kusto/query/parse-command-line) | Extract the command and all arguments. | 
-| Paths | [parse_path()](https://docs.microsoft.com/azure/data-explorer/kusto/query/parsepathfunction) | Extract the sections of a file or folder path. |
-| Version numbers | [parse_version()](https://docs.microsoft.com/azure/data-explorer/kusto/query/parse-versionfunction) | Deconstruct a version number with up to four sections and up to eight characters per section. Use the parsed data to compare version age. |
-| IPv4 addresses | [parse_ipv4()](https://docs.microsoft.com/azure/data-explorer/kusto/query/parse-ipv4function) | Convert an IPv4 address to a long integer. To compare IPv4 addresses without converting them, use [ipv4_compare()](https://docs.microsoft.com/azure/data-explorer/kusto/query/ipv4-comparefunction). |
-| IPv6 addresses | [parse_ipv6()](https://docs.microsoft.com/azure/data-explorer/kusto/query/parse-ipv6function)  | Convert an IPv4 or IPv6 address to the canonical IPv6 notation. To compare IPv6 addresses, use [ipv6_compare()](https://docs.microsoft.com/azure/data-explorer/kusto/query/ipv6-comparefunction). |
-
-To learn about all supported parsing functions, [read about Kusto string functions](https://docs.microsoft.com/azure/data-explorer/kusto/query/scalarfunctions#string-functions).
+> Want to experience Microsoft Defender ATP? [Sign up for a free trial.](https://www.microsoft.com/microsoft-365/windows/microsoft-defender-atp?ocid=docs-wdatp-bestpractices-belowfoldlink)

 ## Related topics