Lake and Warehouse setup

Overview

This section describes how to set up the Data Lake and Data Warehouse in PostgreSQL. Once configured, execution services (such as MD Core, MD Cluster API Gateway, and MD Cluster Callback Service) can store scan results in them and retrieve those results later, and MD Cluster Control Center can query them to generate executive reports.

Prerequisites

  1. PostgreSQL Service: Must be installed and running.
  2. MD Cluster Control Center: Must be installed.
  3. Superuser Rights: The PostgreSQL user used for the setup must have superuser privileges.
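
The PostgreSQL prerequisites above can be checked before running any setup command. The following is a minimal sketch, assuming the standard PostgreSQL client tools `pg_isready` and `psql` are on the PATH and that authentication is handled through `PGPASSWORD` or `~/.pgpass`. The function names and the host, port, and user arguments are illustrative assumptions and are not part of the product.

```bash
#!/usr/bin/env bash
# Prerequisite check sketch. Host, port, and user values passed in are
# placeholders (assumptions), not values from the product documentation.

# Succeeds if the server at host $1, port $2 accepts connections.
pg_server_ready() {
  pg_isready -h "$1" -p "$2" >/dev/null 2>&1
}

# Succeeds if role $3 on host $1, port $2 has the superuser attribute.
pg_user_is_superuser() {
  local result
  result=$(psql -h "$1" -p "$2" -U "$3" -d postgres -At \
    -c "SELECT rolsuper FROM pg_roles WHERE rolname = current_user;") || return 1
  [ "$result" = "t" ]
}

# Prints a one-line status for a server and returns non-zero on failure.
check_pg_prerequisites() {
  local host=$1 port=$2 user=$3
  if ! pg_server_ready "$host" "$port"; then
    echo "$host:$port not reachable"
    return 1
  fi
  if ! pg_user_is_superuser "$host" "$port" "$user"; then
    echo "$host:$port user $user is not a superuser"
    return 1
  fi
  echo "$host:$port ok"
}
```

For example, `check_pg_prerequisites db-a.example.internal 5432 postgres` would report on a hypothetical Server A. The same call can be repeated for Server B.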

Assumption

This guide assumes that PostgreSQL Server A hosts the Data Lake and PostgreSQL Server B hosts the Data Warehouse. Configure the Data Lake first, because the Data Warehouse connects to it as its primary data source.
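
When the setup is scripted, the ordering requirement can be enforced explicitly. The sketch below is one possible wrapper, not part of the product: `setup_lake_then_warehouse` is a hypothetical helper, and its two arguments are the names of caller-supplied commands or functions that wrap the actual setup commands for each server.

```bash
#!/usr/bin/env bash
# Runs the Data Lake setup first and the Data Warehouse setup only if the
# first step succeeded. $1 and $2 are names of caller-supplied commands or
# functions (placeholders); the real setup commands are not reproduced here.
setup_lake_then_warehouse() {
  local lake_setup=$1 warehouse_setup=$2
  if ! "$lake_setup"; then
    echo "Data Lake setup failed; skipping Data Warehouse setup" >&2
    return 1
  fi
  "$warehouse_setup"
}
```

Stopping after a failed Data Lake step avoids creating a Data Warehouse that points at a data source that does not exist yet.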

Windows

  1. On the machine hosting MD Cluster Control Center, navigate to the folder:
PowerShell
  2. Run the following command to set up the Data Lake on PostgreSQL Server A:
PowerShell
  3. Run the following command to set up the Data Warehouse on PostgreSQL Server B:
PowerShell

Linux

  1. On the machine hosting MD Cluster Control Center, navigate to the folder:
Bash
  2. Run the following command to set up the Data Lake on PostgreSQL Server A:
Bash
  3. Run the following command to set up the Data Warehouse on PostgreSQL Server B:
Bash

Combined Lake and Warehouse

If the Data Lake and Data Warehouse are hosted on the same PostgreSQL instance, a combined setup command can be used:

Bash

While this approach is simpler and faster to deploy, it is not recommended for large-scale or long-running systems due to potential performance and scalability limitations.
