Custom detection

Custom detection is an advanced feature that lets users define their own rules for identifying specific patterns within files. With it, users can add detection for their own file types quickly, without waiting for official support in the FileType engine.

Enable custom detection

This feature is disabled by default. To enable it:

  • At Inventory > Modules > Utilities > FileType, tick Enable custom detection.
  • At Inventory > Modules > Utilities > FileType, in the Enable custom detection section, specify the paths to rule files and/or to rule directories that contain JSON rule files.

When rule files or rule directories are updated, the engine must be restarted for the rules to take effect.

When new items are added to the configuration, their rules are loaded automatically, along with any changes inside the existing files or directories.
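Because changes to rule files only take effect after a restart, it can help to check the rule files before restarting the engine. The sketch below is not part of the product: it is a hypothetical pre-flight helper that expands a list of configured paths into the .json files they refer to and reports any file that does not parse as JSON. It assumes that only files directly inside a rule directory count as rules, and it does not validate the rule fields themselves.

```python
import json
from pathlib import Path


def collect_rule_files(paths):
    """Expand configured paths into the .json rule files they refer to."""
    found = []
    for raw in paths:
        path = Path(raw)
        if path.is_dir():
            # Assumption: only files directly inside the directory are rules.
            found.extend(
                sorted(p for p in path.iterdir() if p.is_file() and p.suffix == ".json")
            )
        elif path.is_file() and path.suffix == ".json":
            found.append(path)
    return found


def check_rule_files(paths):
    """Return {file path: error message or None} for every rule file found."""
    report = {}
    for rule_file in collect_rule_files(paths):
        try:
            json.loads(rule_file.read_text(encoding="utf-8"))
            report[str(rule_file)] = None
        except (OSError, ValueError) as exc:
            # json.JSONDecodeError and UnicodeDecodeError are both ValueError.
            report[str(rule_file)] = str(exc)
    return report
```

Calling `check_rule_files` with the same paths entered in the module configuration returns a report in which every value should be `None` before the engine is restarted.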

JSON custom rules

Rule definitions

Both the information about the file types detected by custom rules and the rules themselves are defined in JSON format, using the fields described in the tables below. A rule file must have the .json extension.

File type info

Field | Mandatory | Meaning
rule_type | Required | Can be xml-based, zip-based, or binary.
file_type | Required | File type description used in the output.
file_type_id | Required | File type ID.
mime_type | Optional | MIME type used in the output. Default value: application/octet-stream.
encrypted | Optional | Encrypted property used in the output. Default value: false.
group | Optional | Group ID used in the output. Default value: O. See the list of group IDs below.
extensions | Optional | Extension(s) for the file format. This value is used to check for extension mismatches. Default value: empty.
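As a rough illustration of these fields, the sketch below assembles the file type info and fills in the documented defaults. The helper `file_type_info`, the description "My Custom Format", the ID 100001, and the extension "myft" are all hypothetical. It also assumes that extensions is written as a JSON array and that file_type_id is an integer, neither of which this section states.

```python
import json

REQUIRED = ("rule_type", "file_type", "file_type_id")
RULE_TYPES = ("xml-based", "zip-based", "binary")


def file_type_info(**fields):
    """Return the file type info with the documented defaults filled in."""
    missing = [name for name in REQUIRED if name not in fields]
    if missing:
        raise ValueError("missing required field(s): " + ", ".join(missing))
    if fields["rule_type"] not in RULE_TYPES:
        raise ValueError("rule_type must be one of: " + ", ".join(RULE_TYPES))
    defaults = {
        "mime_type": "application/octet-stream",
        "encrypted": False,
        "group": "O",
        "extensions": [],  # assumption: "empty" is written as an empty array
    }
    return {**defaults, **fields}


info = file_type_info(
    rule_type="binary",
    file_type="My Custom Format",  # hypothetical description
    file_type_id=100001,           # hypothetical ID, assumed to be an integer
    extensions=["myft"],           # hypothetical extension
)
print(json.dumps(info, indent=2))
```

Writing the defaults out explicitly is optional; the point is only to make visible which values the engine is documented to assume when a field is omitted.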

Binary detection rules

Field | Mandatory | Meaning
min_score | Optional | Minimum confidence score for this file type. Default value: 0.0.
priority | Optional | Priority of the rule. Default value: 1.
variables | Optional | Array of input variables. Default value: empty.
    [variable].name | Required | Name of the variable.
    [variable].type | Optional | Type of the variable. Currently, only "built-in" variables are supported.
    [variable].data_type | Optional | Data type of the variable. This can be i8, i16, i32, i64, u8, u16, u32, u64, float, double or a string.
    [variable].length | Optional | Length of the variable. Applicable to string-type variables. Default value: 0.
    [variable].endianness | Optional | Endianness of the variable. This can be host, little_endian, or big_endian. Default value: little_endian.
    [variable].offset | Optional | (Similar to offset described in Binary detection rule.)
rules | Required | Array of binary detection rules.
Binary detection rule
[rule].score | Optional | Score of the rule. Default value: 0.0.
[rule].description | Optional | Description of the rule. Default value: empty.
[rule].match | Required | Object describing the detection match.
    offset | Optional | Offset at which to check for the match. Can be either an integer or an object. Applicable to types exact, search, regex, and oneof.
        base | Optional | (For object-type offset) Base for the offset. One of: beginning (default): beginning of file; end_of_data: end of file; variable: the offset is an input variable.
        variable | Optional | (For object-type offset) Variable for the offset. Required for variable-type base.
        relative_offset | Optional | (For object-type offset) Relative value for the offset. Default value: 0.
    type | Required | One of: exact, compare, search, regex, oneof.
For exact-type match
    offset | Optional | (mentioned above)
For compare-type match
    operator | Required | One of: equal, greater, greater_or_equal, less_than, less_than_or_equal.
    value | Required | The value to compare. This can be an integer, a string, or an object.
        variable | Optional | The input variable for the value. Required for object-type value.
For search-type match
    offset | Optional | (mentioned above)
    search_range | Optional | Search range. 0 searches the whole file. Default value: 0.
    occurrences | Optional | Number of occurrences for the match. Can be either an integer or an object. Default value: 1.
        min | Optional | (For object-type occurrences) Minimum number of occurrences required for the match. Default value: 0.
        max | Optional | (For object-type occurrences) Maximum number of occurrences required for the match. Default value: 0.
    algorithm | Optional | Search strategy. Can be either linear or KMP. Default value: linear.
For regex-type match
    offset | Optional | (mentioned above)
    range | Optional | Search range. 0 searches the whole file. Default value: 0.
For oneof-type match
    offset | Optional | (mentioned above)
[rule].pattern | Required | Object describing the detection pattern to be matched.
    type | Required | One of: hex, text, encoded, variable.
For hex-type pattern
    data | Required | Hex data. This can be a hex value or an array of hex values.
    mask | Optional | Mask bytes.
For variable-type pattern
    name | Required | Name of the variable.
For text-type pattern
    data | Required | The text data.
For encoded-type pattern | | Not supported for now.
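To show how the fields in this table nest, the sketch below builds two binary detection rules and wraps them in the top-level detection fields. It is an illustration under stated assumptions, not a rule taken from the product: the helper `binary_rule`, the magic bytes, and the marker text are hypothetical, hex data is assumed to be written as a string of hex digits, and the way these fields sit next to the file type info fields in one rule file is not shown because this section does not specify it.

```python
import json

MATCH_TYPES = {"exact", "compare", "search", "regex", "oneof"}
PATTERN_TYPES = {"hex", "text", "encoded", "variable"}


def binary_rule(match, pattern, score=0.0, description=""):
    """Build one entry of the `rules` array, checking the documented type values."""
    if match.get("type") not in MATCH_TYPES:
        raise ValueError("unsupported match type: %r" % match.get("type"))
    if pattern.get("type") not in PATTERN_TYPES:
        raise ValueError("unsupported pattern type: %r" % pattern.get("type"))
    return {"score": score, "description": description, "match": match, "pattern": pattern}


# Exact match: hypothetical 4-byte magic ("MYFT") at the beginning of the file.
magic_rule = binary_rule(
    match={"type": "exact", "offset": 0},
    pattern={"type": "hex", "data": "4D594654"},  # assumed hex notation
    score=0.7,
    description="hypothetical magic bytes at offset 0",
)

# Search match: hypothetical text marker anywhere in the file, one to three times.
marker_rule = binary_rule(
    match={
        "type": "search",
        "offset": {"base": "beginning", "relative_offset": 0},
        "search_range": 0,  # 0 searches the whole file
        "occurrences": {"min": 1, "max": 3},
        "algorithm": "KMP",
    },
    pattern={"type": "text", "data": "MYFT-SECTION"},
    score=0.3,
    description="hypothetical section marker",
)

detection = {"min_score": 0.5, "priority": 1, "rules": [magic_rule, marker_rule]}
print(json.dumps(detection, indent=2))
```

The object form of offset is used in the second rule only to show its shape; with base set to beginning and relative_offset 0 it describes the same position as the integer 0.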

Zip-based detection rules

Field | Mandatory | Meaning
entry_rules | Required | Array of all rules for entry checking.
    [entry_rule].pattern | Required | Entry name of the rule.
    [entry_rule].type | Required | Can only be exact for now.
    [entry_rule].score | Optional | Score for the entry. Default value: 0.0.
    [entry_rule].data | Optional | Rule to detect the binary data of the entry (refer to section Binary detection rules).
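The following sketch shows one possible shape for entry_rules. The helper `zip_entry_rule` and the entry names are hypothetical, and the layout of the data field is an assumption: the table only says it follows the binary detection rules, so here it is written as an object holding a rules array.

```python
import json


def zip_entry_rule(pattern, score=0.0, data=None):
    """Build one entry rule; `type` can only be "exact" for now."""
    rule = {"pattern": pattern, "type": "exact", "score": score}
    if data is not None:
        rule["data"] = data  # assumed to follow the binary detection rule layout
    return rule


entry_rules = [
    # Hypothetical entry that must exist in the archive.
    zip_entry_rule("manifest.myft", score=0.5),
    # Hypothetical entry whose content must also start with given magic bytes.
    zip_entry_rule(
        "content/payload.bin",
        score=0.5,
        data={
            "rules": [
                {
                    "match": {"type": "exact", "offset": 0},
                    "pattern": {"type": "hex", "data": "4D594654"},
                }
            ]
        },
    ),
]
print(json.dumps({"entry_rules": entry_rules}, indent=2))
```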

XML-based detection rules

Field | Mandatory | Meaning
root_node | Required | Object containing root node information.
    name | Required | Name of the root node.
    score | Optional | Root node contribution score. Default value: 0.0.
    attributes | Optional | Array of attributes.
        [attribute].name | Required | Attribute name.
        [attribute].value | Required | Attribute value.
    attributes_contribution | Optional | Attributes contribution score. Default value: 0.5.
tree_nodes | Optional | An array contains tree nodes information.
    score | Optional | Tree nodes contribution score. Default value: 0.0.
    nodes | Optional | Array of tree nodes.
        [node].name | Required | Node name.
        [node].level | Required | Node level.
        [node].parent | Required | Node parent.
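A minimal sketch of an XML-based rule follows. The helper `xml_rule` and all element and attribute names are hypothetical. Two points are assumptions because the table leaves them open: tree_nodes is written as an object with score and nodes, and level is counted from 1 for direct children of the root.

```python
import json


def xml_rule(root_name, root_score=0.0, attributes=(), nodes=(), nodes_score=0.0):
    """Build the root_node / tree_nodes part of an XML-based rule."""
    rule = {"root_node": {"name": root_name, "score": root_score}}
    if attributes:
        rule["root_node"]["attributes"] = [
            {"name": name, "value": value} for name, value in attributes
        ]
    if nodes:
        rule["tree_nodes"] = {
            "score": nodes_score,
            "nodes": [
                {"name": name, "level": level, "parent": parent}
                for name, level, parent in nodes
            ],
        }
    return rule


rule = xml_rule(
    "myproject",                         # hypothetical root element
    root_score=0.6,
    attributes=[("schema", "myft-1")],   # hypothetical attribute
    nodes=[("settings", 1, "myproject"), ("option", 2, "settings")],
    nodes_score=0.4,
)
print(json.dumps(rule, indent=2))
```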

Group ID and name

Group | Group | Group
A: Archive Files | G: Image Files | T: Text Files
AP: Application Files | I: Disk Image Files | Z: Email Files
D: Office Documents | M: Media Files | O: Other
D_ENC: Encrypted Documents | OPENSSL_ENC: OpenSSL Encrypted Files |
E: Executable Files | P: Adobe Files |

Example rules

Example JSON rules cover the following cases:

  • Binary - Exact Match
  • Binary - OneOf Match
  • Zip-based
  • XML-based

XML custom rules (deprecated)

Rule definitions

Both the information about the file types detected by custom rules and the rules themselves are defined in XML format, using the fields described in the table below. A rule file must have the .xml extension.

Field | Mandatory | Meaning
File type info
description | Required | File type description used in the output.
id | Required | File type ID.
mime | Optional | MIME type used in the output. Default value: application/octet-stream.
group | Optional | Group ID used in the output. Default value: O. See the list of group IDs below.
extension | Optional | Extension(s) for the file format. This value is used to check for extension mismatches. Default value: empty.
score | Optional | Confidence score for the custom file type. Value range: [0, 1]. Default value: 0.25.
Patterns for detection
FrontBlock | Required | Defines patterns at specific offsets.
FrontBlock.Pattern | Required | Defines the offset (stored in Pos) and the hex pattern to compare (stored in Bytes).
GlobalStrings | Optional | Defines patterns at arbitrary offsets.
GlobalStrings.String | Optional | Defines a string pattern to match.

Group ID and name

Group | Group | Group
A: Archive Files | G: Image Files | T: Text Files
AP: Application Files | I: Disk Image Files | Z: Email Files
D: Office Documents | M: Media Files | O: Other
D_ENC: Encrypted Documents | OPENSSL_ENC: OpenSSL Encrypted Files |
E: Executable Files | P: Adobe Files |

The current use case is to reclassify a file whose type the engine's native rules report as unknown (DATA) or uncertain (non-DATA with a score below 1.0) as a user-defined type with a higher score.

There can be cases in which a file matches both a custom and a built-in rule. In order to prioritize the detection result from the custom rule, the custom rule should be defined with a high confidence score, e.g., 1.1.

The detection "score" can be found in the JSON scan result: filetype_info.file_info.likely_type_ids.score
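To check which score a custom rule actually produced, the scores can be read from a saved scan result. The sketch below is hypothetical: it assumes likely_type_ids is a list of objects that each carry a score (a single object is also accepted), and the sample result is made up for illustration rather than taken from a real scan.

```python
def likely_type_scores(scan_result):
    """Collect every score under filetype_info.file_info.likely_type_ids."""
    entries = (
        scan_result.get("filetype_info", {})
        .get("file_info", {})
        .get("likely_type_ids", [])
    )
    if isinstance(entries, dict):
        # Assumption: tolerate a single object in place of a list.
        entries = [entries]
    return [entry["score"] for entry in entries if "score" in entry]


# Made-up scan result fragment, for illustration only.
sample_result = {
    "filetype_info": {
        "file_info": {
            "likely_type_ids": [{"score": 1.1}, {"score": 0.4}],
        }
    }
}
scores = likely_type_scores(sample_result)
print(max(scores) if scores else None)
```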

Example rules

Example XML rules cover the following cases:

  • Rule 1 - Pattern at offset 0
  • Rule 2 - Pattern at offset 16
  • Rule 3 - Have global string
  • Rule 4 - Multiple global strings