Developer Guide


This is the API documentation for MetaDefender Distributed Cluster Public API. If you would like to evaluate or have any questions about this documentation, please contact us via our Contact Us form.


How to Interact with MetaDefender Distributed Cluster API Gateway using REST API

MetaDefender Distributed Cluster API Gateway is used to submit files for analysis, retrieve scan results, manage file processing, download processed files, and manage file batches. OPSWAT recommends using the JSON-based REST API. The available methods are documented below.

Note: MetaDefender Distributed Cluster API doesn't support chunked upload. Instead, it is recommended to stream files to MetaDefender Distributed Cluster API Gateway as part of the upload process.


File Analysis Process

MetaDefender Distributed Cluster is a system with multiple components that work together to utilize the power of multiple MetaDefender Core instances. The system is designed to handle large volumes of files and provide high throughput for file analysis. The system can be deployed in a distributed manner, allowing for horizontal scaling and load balancing across multiple MD Core instances.

Below is a brief description of the API integration flow:

  1. Upload a file for analysis to MetaDefender Distributed Cluster API Gateway (POST /file), which returns the data_id (see File Analysis).

  2. The following method can be used to retrieve the analysis report:

    • Polling: Fetch the result with the previously received data_id (GET /file/{data_id}) until the scan result for that data_id reaches a progress_percentage of 100 (see Fetch Analysis Result).

    Note: Too many data_id requests can reduce performance. It is enough to just check every few hundred milliseconds.

  3. Retrieve the analysis results at any time after the analysis has completed by using a file hash (md5, sha1, sha256, sha512) and calling Fetch Analysis Result By Hash.

    • The hash can be found in the scan results
  4. Retrieve processed file (sanitized, redacted, watermarked, etc.) after the analysis is complete.

    Note: Based on the configured retention policy, the files might be available for retrieval at a later time.
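The polling step of the flow above can be sketched as follows. This is a minimal illustration, not OPSWAT's client code: the function name `wait_for_result`, the injected `fetch_result` callable, and the interval and timeout defaults are assumptions made for this example. Only the `scan_results.progress_percentage` field comes from this guide.

```python
import time
from typing import Any, Callable, Dict


def wait_for_result(
    fetch_result: Callable[[str], Dict[str, Any]],
    data_id: str,
    interval_s: float = 0.3,
    timeout_s: float = 300.0,
) -> Dict[str, Any]:
    """Poll until scan_results.progress_percentage reaches 100.

    fetch_result is a hypothetical callable that performs
    GET /file/{data_id} and returns the decoded JSON report.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        report = fetch_result(data_id)
        progress = report.get("scan_results", {}).get("progress_percentage", 0)
        if progress >= 100:
            return report
        if time.monotonic() >= deadline:
            raise TimeoutError(f"analysis of {data_id} did not finish in time")
        # Wait a few hundred milliseconds so the gateway is not flooded.
        time.sleep(interval_s)
```

Passing the HTTP call in as a parameter keeps the loop independent of any particular HTTP library and makes it easy to exercise without a running gateway.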


OPSWAT provides sample code on GitHub to make it easier to understand how the MetaDefender REST API works.

Server
http://localhost:8899
Server Variables
apiKey apikey

The session_id generated by the Login call can be used as the apikey for API calls that require authentication.

Fields
Key: apikey
In: Header

Auth

Authentication APIs

User authentication is done via username & password.

Login

Initiate a new session. Required for using protected REST APIs.

Auth
Request Body
object (object)
user (string)

Username

password (string)

User's password

POST /login
Responses
200

OK

object (object)
oms-csrf-token (string)

The randomly generated token used to prevent CSRF attacks

session_id (string)

The apikey used to make API calls that require authentication

403

Invalid credentials

500

Unexpected event on server
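A minimal sketch of building the Login call with Python's standard library follows. The helper names `build_login_request` and `session_id_from` are hypothetical, and the `Content-Type: application/json` header is an assumption based on this guide's recommendation of the JSON-based REST API. The `user`, `password`, and `session_id` fields and the server address are taken from this guide.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8899"  # server address listed in this guide


def build_login_request(
    user: str, password: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build POST /login with a JSON body holding user and password."""
    body = json.dumps({"user": user, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/login",
        data=body,
        headers={"Content-Type": "application/json"},  # assumption
        method="POST",
    )


def session_id_from(response_body: bytes) -> str:
    """Extract session_id from the body of a 200 response."""
    return json.loads(response_body)["session_id"]
```

The request can be sent with `urllib.request.urlopen(request)`. The returned session_id is then passed in the apikey header of later calls.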


Logout

Destroy the session. After logout, the session can no longer be used to call protected REST APIs.

Auth
Headers
apikey (string)

The session_id generated by the Login call can be used as the apikey for API calls that require authentication.

POST /logout
Responses
200

OK

object (object)
response (string)
400

Bad Request.

403

Invalid user information.

500

Unexpected event on server


Analysis

File analysis APIs

Submit each file to MetaDefender Distributed Cluster API Gateway individually or group them in batches. Each file submission will return a data_id which will be the unique identifier used to retrieve the analysis results.

Note: MetaDefender API doesn't support chunked upload. Don't load the whole file into memory; instead, it is recommended to stream files to MetaDefender Distributed Cluster API Gateway as part of the upload process.

Analyze File (Asynchronous mode)

Scan a file using a specified workflow. The scan is done asynchronously, and each scan request is tracked by a data ID whose result can be retrieved with the Fetch Analysis Result API.

Note: Chunked transfer encoding (applying header Transfer-Encoding: Chunked) is not supported on /file API.

Auth
Headers
Request Body
file (file)
POST /file
Responses
200

Successful file submission

object (object)
data_id (string)

Unique submission identifier. Use this value to reference the submission.

403

Invalid user information or Not Allowed

411

Content-Length header is missing from the request.

422

Body input is empty.

500

Unexpected event on server

503

Server is too busy. Try again later.
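The streaming upload described above can be sketched with Python's standard library. This sketch rests on several assumptions: the helper name `build_upload_request` is hypothetical, the apikey header is assumed to apply to POST /file (this section does not list its headers), and the `application/octet-stream` content type is a guess. Passing an open file object as the body lets `urllib` read it in blocks, and setting `Content-Length` explicitly keeps `urllib` from falling back to chunked transfer encoding, which this API does not support.

```python
import urllib.request
from typing import BinaryIO


def build_upload_request(
    stream: BinaryIO,
    size: int,
    apikey: str,
    base_url: str = "http://localhost:8899",
) -> urllib.request.Request:
    """Build POST /file whose body is read from an open binary stream."""
    return urllib.request.Request(
        f"{base_url}/file",
        data=stream,  # file object: sent block by block, not loaded whole
        headers={
            "apikey": apikey,  # assumption: same auth header as other calls
            # Explicit length avoids Transfer-Encoding: chunked (and a 411).
            "Content-Length": str(size),
            "Content-Type": "application/octet-stream",  # assumption
        },
        method="POST",
    )
```

A typical call opens the file with `open(path, "rb")`, takes the size from `os.path.getsize(path)`, sends the request with `urllib.request.urlopen`, and reads `data_id` from the JSON response.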


Fetch Analysis Result

Retrieve scan results.

Scan is done asynchronously and each scan request is tracked by a data ID.

Initiating file scans and retrieving the results need to be done using two separate API calls. This request needs to be made multiple times until the scan is complete. Scan completion can be traced using scan_results.progress_percentage value from the response.

Note: The REST API also supports pagination for archive file results. The following describes a completed response with archive detection:

  • extracted_files: information about extracted files
    • files_extracted_count: the number of extracted files
    • files_in_archive: array of files in archive
      • detected_by: number of engines that reported a threat
      • scanned_with: number of engines used for scanning the file
    • first_index: the zero-based index of the first extracted file described in the result JSON (default=0)
    • page_size: the number of extracted files described in the result JSON (default=50). By default, the result JSON therefore contains information about the first 50 extracted files.
    • worst_data_id: data id of the file that has the worst result in the archive
  • scan_results
    • last_file_scanned (stored only in memory, not in database): If available, the name of the most recent processed file

Auth
Headers
apikey (string)

The session_id generated by the Login call can be used as the apikey for API calls that require authentication.

user_agent (string)

The user_agent header is used to identify (and limit) access to a particular rule. For rule selection, use the rule header.

Path Params
data_id (string)

Unique submission identifier. Use this value to reference the submission.

Query String
first (integer)

The index of the first item to return from the list of child files of the archive file

size (integer)

The number of items to fetch, counting from the index given by the first parameter

GET /file/{data_id}
Responses
200

Entire analysis report generated by MetaDefender Core

405

The user has no rights for this operation.

500

Unexpected event on server
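Paging through the extracted files of an archive result might look like the sketch below. The helper names `build_result_request` and `next_first` are hypothetical. The sketch also assumes that `files_extracted_count` is the total number of extracted files and that `files_in_archive` holds only the current page; this section does not state either explicitly.

```python
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional


def build_result_request(
    data_id: str,
    apikey: str,
    first: int = 0,
    size: int = 50,
    base_url: str = "http://localhost:8899",
) -> urllib.request.Request:
    """Build GET /file/{data_id} with the first and size query parameters."""
    query = urllib.parse.urlencode({"first": first, "size": size})
    path_id = urllib.parse.quote(data_id, safe="")
    return urllib.request.Request(
        f"{base_url}/file/{path_id}?{query}",
        headers={"apikey": apikey},
        method="GET",
    )


def next_first(report: Dict[str, Any]) -> Optional[int]:
    """Return the first value for the next page, or None when done."""
    extracted = report.get("extracted_files") or {}
    files = extracted.get("files_in_archive") or []
    following = extracted.get("first_index", 0) + len(files)
    # Assumption: files_extracted_count is the total across all pages.
    if not files or following >= extracted.get("files_extracted_count", 0):
        return None
    return following
```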


Fetch Analysis Result By Hash

Retrieve analysis result by hash

Auth
Headers
apikey (string)

The session_id generated by the Login call can be used as the apikey for API calls that require authentication.

rule (string)

Select the rule for the analysis. If no header is given, the default rule is selected (URL encoded UTF-8 string of rule name).

selfonly (boolean)

Useful for archive hash lookups.

Performs the hash lookup against the original archive file only, and skips searching the results of all child files within the original archive.

Default value is false.

timerange (integer)

Limits the hash lookup to the given number of most recent hours, instead of searching the entire scan history in the MetaDefender Core database.

Default value is 0, which means no time scope.

include-inprogress (boolean)

False (default): The API returns "Not Found" if the verdict is in progress.

True: If the queried hash already has a completed processing result, the API returns that completed result. If the hash has no completed processing result, the API returns the in-progress result.

Path Params
md5|sha1|sha256|sha512 (string)

Hash value to search. This can be md5, sha1, sha256, sha512

Query String
first (integer)

The index of the first item to return from the list of child files of the archive file

size (integer)

The number of items to fetch, counting from the index given by the first parameter

GET /hash/{md5|sha1|sha256|sha512}
Responses
200

Get information of file

404

Invalid hash format
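Since an invalid hash format is rejected, a client may want to check the hash locally before calling the endpoint. The sketch below relies on the standard hexadecimal digest lengths (32 for md5, 40 for sha1, 64 for sha256, 128 for sha512). The helper names are hypothetical, and sending boolean header values as the strings "true" and "false" is an assumption; this section does not specify their wire format.

```python
import re
import urllib.request
from typing import Optional

# Hex digest length of each supported hash type.
_HEX_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256", 128: "sha512"}


def hash_type(value: str) -> Optional[str]:
    """Return the hash type implied by the digest length, or None."""
    if re.fullmatch(r"[0-9a-fA-F]+", value) is None:
        return None
    return _HEX_LENGTHS.get(len(value))


def build_hash_request(
    value: str,
    apikey: str,
    selfonly: bool = False,
    timerange: int = 0,
    include_inprogress: bool = False,
    base_url: str = "http://localhost:8899",
) -> urllib.request.Request:
    """Build GET /hash/{hash} with the lookup headers from this section."""
    if hash_type(value) is None:
        raise ValueError("expected an md5, sha1, sha256 or sha512 hex digest")
    headers = {
        "apikey": apikey,
        # Assumption: booleans are sent as lowercase strings.
        "selfonly": "true" if selfonly else "false",
        "timerange": str(timerange),
        "include-inprogress": "true" if include_inprogress else "false",
    }
    return urllib.request.Request(
        f"{base_url}/hash/{value}", headers=headers, method="GET"
    )
```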


Fetching Available Analysis Rules

Retrieve all available processing rules with their custom configurations.

Auth
Headers
apikey (string)

The session_id generated by the Login call can be used as the apikey for API calls that require authentication. Only rules that meet one of the following conditions are returned:

  • Match the apikey's role sent using the apikey header, or
  • Are not restricted to a specific role.
user_agent (string)

The user agent string value sent in the header (specified by the client).

Only rules that meet one of the following conditions are returned:

  • Match the client's user agent sent using the user_agent header, or
  • Are not restricted to a specific user agent.

For details see KB article What are Security Policies and how do I use them?.

GET /file/rules
Responses
200

Returns the list of available rules.

array (array[object])
max_file_size (integer)

The maximum allowed file size (in bytes) for this rule.

name (string)

A unique identifier for the rule, used to identify the rule applied to a scan.

500

Unexpected event on server
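As an illustration of how the rule list might be used, the sketch below filters rules by `max_file_size` and URL-encodes a rule name for the rule header. The helper names and the rule names in the example are hypothetical. Treating `max_file_size` as an inclusive limit is an assumption.

```python
import urllib.parse
from typing import Any, Dict, List


def rules_accepting(rules: List[Dict[str, Any]], file_size: int) -> List[str]:
    """Names of rules whose max_file_size (bytes) is at least file_size."""
    return [
        rule["name"]
        for rule in rules
        if rule.get("max_file_size", 0) >= file_size  # assumption: inclusive
    ]


def rule_header_value(name: str) -> str:
    """URL-encode a UTF-8 rule name for use in the rule header."""
    return urllib.parse.quote(name, safe="")
```

For example, given rules named "File process" (limit 200 bytes) and "Small" (limit 10 bytes), a 50-byte file is accepted only by "File process", whose header value is `File%20process`.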


Download Sanitized Files

Retrieve sanitized file based on the data_id

Auth
Headers
apikey (string)

The session_id generated by the Login call can be used as the apikey for API calls that require authentication.

Path Params
data_id (string)

The data_id comes from the result of Analyze File. When the content of an archive is sanitized, the data_id of each contained file can be found in Fetch Analysis Result.

GET /file/converted/{data_id}
Responses
200

Returns the sanitized content.

file (file)
404

Requested resource was not found.

405

The user has no rights for this operation.

500

Unexpected event on server
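Sanitized files can be large, so a client may prefer to copy the response to disk in blocks rather than read it into memory at once. The following is a minimal sketch with hypothetical helper names (`build_sanitized_request`, `save_stream`) and an arbitrary 64 KiB block size.

```python
import shutil
import urllib.parse
import urllib.request
from typing import BinaryIO


def build_sanitized_request(
    data_id: str, apikey: str, base_url: str = "http://localhost:8899"
) -> urllib.request.Request:
    """Build GET /file/converted/{data_id}."""
    path_id = urllib.parse.quote(data_id, safe="")
    return urllib.request.Request(
        f"{base_url}/file/converted/{path_id}",
        headers={"apikey": apikey},
        method="GET",
    )


def save_stream(
    response: BinaryIO, destination: BinaryIO, block_size: int = 64 * 1024
) -> None:
    """Copy the response body to destination one block at a time."""
    shutil.copyfileobj(response, destination, block_size)
```

In practice, `response` would be the object returned by `urllib.request.urlopen(request)` and `destination` a file opened with `open(path, "wb")`.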


Download either sanitized files or DLP processed files

Retrieve the sanitized file based on the data_id. If there is no sanitized file and a DLP processed file is available, the DLP processed file is returned instead.

Auth
Headers
apikey (string)

The session_id generated by the Login call can be used as the apikey for API calls that require authentication.

Path Params
data_id (string)

The data_id comes from the result of Analyze File. When the content of an archive is sanitized, the data_id of each contained file can be found in Fetch Analysis Result.

GET /file/download/{data_id}
Responses
200

Returns the sanitized or DLP processed content.

file (file)
404

File could not be found

405

The user has no rights for this operation.

500

Unexpected event on server


Cancel File Analysis

When a file analysis is cancelled, any connected analyses (e.g. files in an archive) that are still in progress are cancelled as well.

The cancelled analysis will be automatically closed.

Auth
Headers
apikey (string)

The session_id generated by the Login call can be used as the apikey for API calls that require authentication.

Path Params
data_id (string)

Unique submission identifier. Use this value to reference the submission.

POST /file/{data_id}/cancel
Responses
200

Analysis was successfully cancelled.

object (object)
400

Bad Request (e.g. invalid header, apikey is missing or invalid).

403

Invalid user information or Not Allowed

404

Data ID not found (invalid id) or requested resource was not found

405

The user has no rights for this operation.

500

Unexpected event on server


Batch

Group the analysis requests in batches. Supported with endpoints: MetaDefender Distributed Cluster API Gateway.

Initiate Batch

Create a new batch and retrieve the batch_id

Auth
Headers
apikey (string)

The session_id generated by the Login call can be used as the apikey for API calls that require authentication.

rule (string)

Select the rule for the analysis. If no header is given, the default rule is selected (URL encoded UTF-8 string of rule name).

user_agent (string)

The user_agent header is used to identify (and limit) access to a particular rule. For rule selection, use the rule header.

user-data (string)

Name of the batch (max 1024 bytes, URL encoded UTF-8 string).

POST /file/batch
Responses
200

Batch created successfully.

object (object)
batch_id (string)

The batch identifier used to submit files in the batch and to close the batch.

400

Bad Request (e.g. invalid header, apikey is missing or invalid).

403

Invalid user information or Not Allowed

500

Unexpected event on server
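Creating a batch with a name and a rule could be sketched as follows. The helper name `build_batch_request` is hypothetical. The guide does not say whether the 1024-byte limit on user-data applies before or after URL encoding, so this sketch conservatively checks the encoded value; that choice is an assumption.

```python
import urllib.parse
import urllib.request
from typing import Optional


def build_batch_request(
    apikey: str,
    batch_name: Optional[str] = None,
    rule: Optional[str] = None,
    base_url: str = "http://localhost:8899",
) -> urllib.request.Request:
    """Build POST /file/batch with optional user-data and rule headers."""
    headers = {"apikey": apikey}
    if batch_name is not None:
        encoded = urllib.parse.quote(batch_name, safe="")
        # Assumption: the 1024-byte limit is checked on the encoded value.
        if len(encoded) > 1024:
            raise ValueError("batch name exceeds 1024 bytes once URL-encoded")
        headers["user-data"] = encoded
    if rule is not None:
        headers["rule"] = urllib.parse.quote(rule, safe="")
    return urllib.request.Request(
        f"{base_url}/file/batch", headers=headers, method="POST"
    )
```

The `batch_id` in the JSON response is then used to submit files to the batch and to close it.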


Close Batch

The batch will be closed and files can no longer be added to the current batch.

Auth
Headers
apikey (string)

The session_id generated by the Login call can be used as the apikey for API calls that require authentication.

Path Params
batchId (string)

The batch identifier used to submit files in the batch and to close the batch.

POST /file/batch/{batchId}/close
Responses
200

Batch successfully closed.

400

Bad Request (e.g. invalid header, apikey is missing or invalid).

403

Invalid user information or Not Allowed

404

Requested resource was not found.

500

Unexpected event on server


Status of Batch Analysis

Retrieve status report for the entire batch

Auth
Headers
apikey (string)

The session_id generated by the Login call can be used as the apikey for API calls that require authentication.

Path Params
batchId (string)

The batch identifier used to submit files in the batch and to close the batch.

Query String
first (integer)

The index of the first item to return from the list of files in this batch

size (integer)

The number of items to fetch, counting from the index given by the first parameter

GET /file/batch/{batchId}
Responses
200

Batch progress paginated report (50 entries/page).

400

Bad Request (e.g. invalid header, apikey is missing or invalid).

403

Invalid user information or Not Allowed

404

Requested resource was not found.

500

Unexpected event on server
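Because the batch report is paginated, a client that knows how many files it submitted can compute the `first` value for each page up front. This sketch uses hypothetical helper names, and `total_files` is supplied by the caller because this section does not document where the total appears in the response.

```python
import urllib.parse
import urllib.request
from typing import Iterator


def page_offsets(total_files: int, size: int = 50) -> Iterator[int]:
    """Yield the first value of each page needed to cover total_files."""
    yield from range(0, total_files, size)


def build_batch_status_request(
    batch_id: str,
    apikey: str,
    first: int = 0,
    size: int = 50,
    base_url: str = "http://localhost:8899",
) -> urllib.request.Request:
    """Build GET /file/batch/{batchId} for one page of the report."""
    query = urllib.parse.urlencode({"first": first, "size": size})
    path_id = urllib.parse.quote(batch_id, safe="")
    return urllib.request.Request(
        f"{base_url}/file/batch/{path_id}?{query}",
        headers={"apikey": apikey},
        method="GET",
    )
```

For a batch of 120 files and the default page size of 50, the pages start at 0, 50, and 100.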


Download Signed Batch Result

Download digitally signed status report for the entire batch

Auth
Headers
apikey (string)

The session_id generated by the Login call can be used as the apikey for API calls that require authentication.

metadata (string)

A JSON-formatted value that can be used to include additional information in the response YAML. Currently, the one supported field in the metadata is include_vul_info, which can be set to true or false to indicate whether vulnerability processing information should be included. It is strongly recommended to URL-encode the metadata before sending it to MetaDefender Core to prevent unexpected issues related to encoding errors or unsafe characters.

Path Params
batchId (string)

The batch identifier used to submit files in the batch and to close the batch.

GET /file/batch/{batchId}/certificate
Responses
200

Signed batch result and certificate are sent back in response body (YAML format).

400

Bad Request (e.g. invalid header, apikey is missing or invalid).

403

Invalid user information or Not Allowed

404

Requested resource was not found.

500

Unexpected event on server
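The metadata header recommended above (JSON, then URL-encoded) can be produced as in this sketch. The helper names are hypothetical. Only the `include_vul_info` field and the endpoint path come from this guide.

```python
import json
import urllib.parse
import urllib.request


def metadata_header(include_vul_info: bool) -> str:
    """Serialize the metadata to JSON, then URL-encode it."""
    raw = json.dumps({"include_vul_info": include_vul_info})
    return urllib.parse.quote(raw, safe="")


def build_certificate_request(
    batch_id: str,
    apikey: str,
    include_vul_info: bool = False,
    base_url: str = "http://localhost:8899",
) -> urllib.request.Request:
    """Build GET /file/batch/{batchId}/certificate."""
    path_id = urllib.parse.quote(batch_id, safe="")
    return urllib.request.Request(
        f"{base_url}/file/batch/{path_id}/certificate",
        headers={
            "apikey": apikey,
            "metadata": metadata_header(include_vul_info),
        },
        method="GET",
    )
```

Encoding with `safe=""` percent-encodes every reserved character, including quotes, braces, and spaces, so the header value contains only unreserved characters and percent escapes.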

Cancel Batch

When a batch is cancelled, any connected analyses that are still in progress are cancelled as well.

The cancelled batch will be closed.

Auth
Headers
apikey (string)

The session_id generated by the Login call can be used as the apikey for API calls that require authentication.

Path Params
batchId (string)

The batch identifier used to submit files in the batch and to close the batch.

POST /file/batch/{batchId}/cancel
Responses
200

Batch cancelled.

object (object)
400

Bad Request (e.g. invalid header, apikey is missing or invalid).

403

Invalid user information or Not Allowed

404

Batch not found (invalid id)

500

Unexpected event on server
