Lake and Warehouse setup
Overview
This section describes how to set up the Data Lake and Data Warehouse in PostgreSQL. Once configured, execution services (such as MD Core, MD Cluster API Gateway, and MD Cluster Callback Service) can store scan results in them and retrieve those results later, and MD Cluster Control Center can read from them to generate executive reports.
Prerequisites
- PostgreSQL Service: Must be installed and running.
- MD Cluster Control Center: Must be installed.
- Superuser Rights: The PostgreSQL user used for the setup must have superuser privileges.
Assumption
This section assumes that PostgreSQL Server A hosts the Data Lake and PostgreSQL Server B hosts the Data Warehouse. The Data Lake must be configured first, because the Data Warehouse connects to it as its primary data source.
Windows
- On the machine hosting MD Cluster Control Center, navigate to the folder:
  C:\Program Files\OPSWAT\MetaDefender Cluster Control Center
- Run the following command to set up Data Lake on PostgreSQL Server A:
  md-cluster-dbready.exe --host=<postgres-host> --port=<postgres-port> --user=<postgres-user> --password=<postgres-password> --target=lake
- Run the following command to set up Data Warehouse on PostgreSQL Server B:
  md-cluster-dbready.exe --host=<postgres-host> --port=<postgres-port> --user=<postgres-user> --password=<postgres-password> --lake-host=<lake-postgres-host> --lake-port=<lake-postgres-port> --lake-user=<lake-postgres-user> --lake-password=<lake-postgres-password> --target=warehouse
Linux
- On the machine hosting MD Cluster Control Center, navigate to the folder:
  /usr/sbin
- Run the following command to set up Data Lake on PostgreSQL Server A:
  md-cluster-dbready --host=<postgres-host> --port=<postgres-port> --user=<postgres-user> --password=<postgres-password> --target=lake
- Run the following command to set up Data Warehouse on PostgreSQL Server B:
  md-cluster-dbready --host=<postgres-host> --port=<postgres-port> --user=<postgres-user> --password=<postgres-password> --lake-host=<lake-postgres-host> --lake-port=<lake-postgres-port> --lake-user=<lake-postgres-user> --lake-password=<lake-postgres-password> --target=warehouse
Combined Lake and Warehouse
If the Data Lake and Data Warehouse are hosted on the same PostgreSQL instance, a combined setup command can be used:
  md-cluster-dbready --host=<postgres-host> --port=<postgres-port> --user=<postgres-user> --password=<postgres-password> --target=lake,warehouse
While this approach is simpler and faster to deploy, it is not recommended for large-scale or long-running systems due to potential performance and scalability limitations.
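For the separate-server case (Server A for the Data Lake, Server B for the Data Warehouse), the required ordering can be sketched as a small cross-platform Python wrapper. This is only an illustrative sketch, not part of the product: the names `PgServer`, `tool_path`, `connection_flags`, `lake_command`, `warehouse_command`, and `setup_separate` are hypothetical, the folder paths and flags are copied from the commands in this section, and it assumes that `md-cluster-dbready` exits with a non-zero status on failure.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class PgServer:
    """Connection details for one PostgreSQL server (hypothetical helper type)."""
    host: str
    port: int
    user: str
    password: str


def tool_path(platform: str = sys.platform) -> str:
    """Return the md-cluster-dbready location listed in the steps above."""
    if platform.startswith("win"):
        return (r"C:\Program Files\OPSWAT\MetaDefender Cluster Control Center"
                r"\md-cluster-dbready.exe")
    return "/usr/sbin/md-cluster-dbready"


def connection_flags(server: PgServer, prefix: str = "") -> list[str]:
    """Build --host/--port/--user/--password, or the --lake-* variants."""
    return [
        f"--{prefix}host={server.host}",
        f"--{prefix}port={server.port}",
        f"--{prefix}user={server.user}",
        f"--{prefix}password={server.password}",
    ]


def lake_command(tool: str, lake: PgServer) -> list[str]:
    """Arguments for setting up the Data Lake."""
    return [tool, *connection_flags(lake), "--target=lake"]


def warehouse_command(tool: str, warehouse: PgServer, lake: PgServer) -> list[str]:
    """Arguments for setting up the Data Warehouse, pointing at its Data Lake."""
    return [
        tool,
        *connection_flags(warehouse),
        *connection_flags(lake, prefix="lake-"),
        "--target=warehouse",
    ]


def setup_separate(lake: PgServer, warehouse: PgServer,
                   run=subprocess.run, platform: str = sys.platform) -> None:
    """Run the Data Lake setup first, then the Data Warehouse setup."""
    tool = tool_path(platform)
    # check=True stops before the warehouse step if the lake step fails
    # (assumes the tool reports failure through a non-zero exit status).
    run(lake_command(tool, lake), check=True)
    run(warehouse_command(tool, warehouse, lake), check=True)
```

Passing the arguments as a list avoids shell quoting problems with passwords that contain special characters. The `run` parameter exists only so the command lists can be inspected without executing anything, for example by passing a function that prints its arguments instead of `subprocess.run`.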