Databricks
Databricks is a warehouse output connector. Connect Monad to your Databricks data warehouse to get all your vulnerability data in one place.
IP Allowlists
If your workspace restricts access by IP address, add the Monad IP range, 34.210.32.104/32, to your Databricks allowlist so that Monad can connect to it.
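If your workspace uses Databricks IP access lists, a workspace admin can also add the range through the IP Access Lists REST API instead of the UI. The following Python sketch only builds the request with the standard library and does not send it. The `monad` label and the placeholder host and token are assumptions, and the sketch assumes IP access lists are already enabled for the workspace.

```python
import json
import urllib.request

# Monad's IP range, as given on this page (a /32 is a single address).
MONAD_CIDR = "34.210.32.104/32"


def build_allowlist_request(workspace_host: str, admin_token: str) -> urllib.request.Request:
    """Build, but do not send, a request that creates an ALLOW list containing Monad's range.

    Assumes IP access lists are already enabled for the workspace and that
    admin_token is a workspace admin's personal access token.
    """
    body = {
        "label": "monad",  # hypothetical label; choose any name
        "list_type": "ALLOW",
        "ip_addresses": [MONAD_CIDR],
    }
    return urllib.request.Request(
        url=f"https://{workspace_host}/api/2.0/ip-access-lists",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass it to `urllib.request.urlopen`. Keep in mind that once IP access lists are enabled, only addresses on an allow list can reach the workspace, so make sure your own ranges are listed as well.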
When you connect Databricks to Monad, you own your data. You get all of your security and environment data in one place, and you can run BI or SQL analytics tools, or your own SQL queries, against it.
Before you connect Databricks to Monad, you must create a workspace and a service user. The service user can be either a regular user, who has access to the Databricks UI, or a service principal, which has only programmatic access. Using a service principal takes a few extra steps, but it is strongly recommended because service principals offer a higher level of security. To create a service user, you must be an account admin or a workspace admin in Databricks.
Create a dedicated Databricks workspace for Monad
Although you can connect Monad to Databricks using an existing workspace, this is strongly discouraged. Because Monad loads data into tables in your warehouse from a remote path, the service user needs the ANY FILE privilege, as described in the Databricks documentation. By creating a dedicated workspace, you ensure that Monad operates in a completely isolated environment with access only to the data it creates. Regardless of which workspace you use, Monad only queries tables created by its own service user and never interacts with any other table in the workspace.
To create a new workspace, follow the Databricks documentation.
Create a service user
Follow one of the options outlined below.
Option 1: Use a service principal (recommended)
- Follow the Databricks documentation to create a service principal. Once you create the service principal, take note of its Application ID, as you will need it in the third step (and later, when you grant permissions).
- Follow the Databricks documentation to add the service principal to your workspace.
- Follow the Databricks documentation to give the service principal the privilege to use access tokens.
- Follow the Databricks documentation to create an access token for the service principal. Monad recommends using Postman for these steps, but curl also works.
To create an access token for the service principal, you need the personal access token of a workspace admin or an account admin. If you don't have a personal access token already, follow the Databricks documentation to obtain one.
Once you complete these steps, take note of the token_value field in the response.
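The token steps above can also be scripted against the Databricks REST API. This Python sketch builds, but does not send, two requests with the standard library: one that gives the service principal `CAN_USE` permission on tokens through the Permissions API, and one that creates an on-behalf-of token through the Token Management API. The helper names, the `monad` comment, and the optional lifetime are assumptions for illustration. Both requests must be authenticated with an admin's personal access token.

```python
import json
import urllib.request
from typing import Optional


def _json_request(url: str, admin_token: str, body: dict, method: str) -> urllib.request.Request:
    """Build a JSON request authenticated with an admin's personal access token."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
        method=method,
    )


def build_token_permission_request(host: str, admin_token: str, application_id: str) -> urllib.request.Request:
    """Permissions API: allow the service principal to use access tokens (CAN_USE)."""
    body = {
        "access_control_list": [
            {"service_principal_name": application_id, "permission_level": "CAN_USE"}
        ]
    }
    return _json_request(
        f"https://{host}/api/2.0/permissions/authorization/tokens", admin_token, body, "PATCH"
    )


def build_obo_token_request(
    host: str, admin_token: str, application_id: str, lifetime_seconds: Optional[int] = None
) -> urllib.request.Request:
    """Token Management API: create a token on behalf of the service principal."""
    body = {"application_id": application_id, "comment": "monad"}  # hypothetical comment
    if lifetime_seconds is not None:
        body["lifetime_seconds"] = lifetime_seconds
    return _json_request(
        f"https://{host}/api/2.0/token-management/on-behalf-of/tokens", admin_token, body, "POST"
    )


def extract_token_value(response_body: bytes) -> str:
    """Pull token_value out of the create-token response; store it securely."""
    return json.loads(response_body)["token_value"]
```

Send each request with `urllib.request.urlopen` (or replay the same method, URL, and JSON body in Postman or curl), then pass the body of the second response to `extract_token_value`.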
It is very important to safeguard your access token. Do not check it into version control.
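One way to keep the token out of version control is to read it at the point of use from an environment variable (populated by your secrets tooling) rather than writing it into scripts. A minimal Python sketch, assuming a hypothetical variable named `MONAD_DATABRICKS_TOKEN`:

```python
import os

# Hypothetical variable name; use whatever your secrets tooling provides.
TOKEN_ENV_VAR = "MONAD_DATABRICKS_TOKEN"


def load_databricks_token(var_name: str = TOKEN_ENV_VAR) -> str:
    """Read the service user's access token from the environment instead of source code."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"{var_name} is not set; export the access token before running")
    return token
```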
Option 2: Use a regular user
- Follow the Databricks documentation to create a regular user.
- Follow the Databricks documentation to add the user to your workspace.
- Follow the Databricks documentation to create an access token, and take note of it.
Once you create the service user, you are ready to set up the Databricks environment.
Set up the Databricks environment
In this section, you set up a new database and grant the necessary privileges to the service user so that Monad can connect to your workspace.
Create a database
- Log in to Databricks SQL as an admin by following the Databricks documentation.
- Run the following commands to create a database called monad and make it the current database. Note that Monad only uses hive_metastore as its catalog. Custom catalog names will be offered in the near future.
CREATE DATABASE hive_metastore.monad;
USE hive_metastore.monad;
Grant permissions
Run the following commands to grant the necessary privileges to the service user. You need the account identifier of the service user to proceed.
- If you are using a service principal, account_identifier is the Application ID shown next to the service principal in your account console.
- If you are using a regular user, account_identifier is the email address you assigned to the user. You can find this information in your account console.
Surrounding the account identifier with backticks (`) is mandatory.
GRANT ALL PRIVILEGES ON DATABASE hive_metastore.monad TO `<account_identifier>`;
GRANT SELECT ON ANY file TO `<account_identifier>`;
GRANT MODIFY ON ANY file TO `<account_identifier>`;
For example:
GRANT ALL PRIVILEGES ON DATABASE hive_metastore.monad TO `service_user@yourcompany.com`;
GRANT SELECT ON ANY file TO `service_user@yourcompany.com`;
GRANT MODIFY ON ANY file TO `service_user@yourcompany.com`;
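If you script these grants, quoting the identifier in one place keeps characters such as `@` in an email address from breaking the statement. The Python sketch below generates the three statements shown above. The helper names are hypothetical, and the sketch assumes the Databricks SQL convention of escaping a backtick inside a quoted identifier by doubling it.

```python
def quote_identifier(name: str) -> str:
    """Wrap a principal name in backticks, doubling any backtick inside it."""
    return "`" + name.replace("`", "``") + "`"


def grant_statements(account_identifier: str, database: str = "hive_metastore.monad") -> list[str]:
    """Return the three GRANT statements this page asks you to run for the service user."""
    principal = quote_identifier(account_identifier)
    return [
        f"GRANT ALL PRIVILEGES ON DATABASE {database} TO {principal};",
        f"GRANT SELECT ON ANY FILE TO {principal};",
        f"GRANT MODIFY ON ANY FILE TO {principal};",
    ]


if __name__ == "__main__":
    for statement in grant_statements("service_user@yourcompany.com"):
        print(statement)
```

Running it prints the same statements as the example above, with the keyword FILE in upper case; SQL keywords are case-insensitive, so both forms are equivalent.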
Create a Databricks output connector
To set up a Databricks connector, you need the following information:
- Cluster domain: The domain of your workspace URL, in the format <workspace_id>.cloud.databricks.com. For example, abc-123456abc-xyzt.cloud.databricks.com.
- SQL Warehouse HTTP Path: Sign in to the SQL editor in Databricks and click SQL Warehouses in the left panel. Note the warehouse ID. The format for this field is /sql/1.0/warehouses/<warehouse_id>. For example, /sql/1.0/warehouses/123456abcx789yz.
- API Token: The access token you obtained for your service user in the Create a service user section.
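Before you enter these values, you can check them against the formats described above. This Python sketch uses hypothetical helper names and only checks the `<workspace_id>.cloud.databricks.com` and `/sql/1.0/warehouses/<warehouse_id>` patterns shown on this page. It does not contact Databricks, and a workspace hosted under a different domain would need a different pattern.

```python
import re

# Patterns taken from the formats described on this page.
CLUSTER_DOMAIN_RE = re.compile(r"[A-Za-z0-9-]+\.cloud\.databricks\.com")
HTTP_PATH_RE = re.compile(r"/sql/1\.0/warehouses/[A-Za-z0-9]+")


def check_connector_fields(cluster_domain: str, http_path: str, api_token: str) -> list[str]:
    """Return a list of problems; an empty list means the values match the expected formats."""
    problems = []
    # fullmatch rejects prefixes such as https:// and suffixes such as a trailing slash.
    if not CLUSTER_DOMAIN_RE.fullmatch(cluster_domain):
        problems.append(
            "Cluster domain should look like <workspace_id>.cloud.databricks.com "
            "(no https:// prefix and no trailing slash)"
        )
    if not HTTP_PATH_RE.fullmatch(http_path):
        problems.append("SQL Warehouse HTTP Path should look like /sql/1.0/warehouses/<warehouse_id>")
    if not api_token.strip():
        problems.append("API Token is empty")
    return problems
```

For example, passing `https://abc-123456abc-xyzt.cloud.databricks.com/` as the cluster domain reports a problem, because the field expects the bare domain.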
Once you have this information, set up the connector:
- Log in to your Monad account and click Add connector.
- Select the Databricks connector.
- Optionally, change the default name for the connector. This name serves as a label for the connector in the Monad app, and you can change it later.
- Enter the information for your Databricks environment.
- Enter monad in the Database field.
- (Optional) Select the models to export.
- Click Connect.
Sample completed setup form:
Monad then tests the connection to Databricks, and if successful, begins syncing data from your Monad account into your Databricks warehouse.
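If the connection test fails, you can check the service user's access outside Monad by running a query as that user through the Databricks SQL Statement Execution API. The Python sketch below builds such a request, deriving the warehouse ID from the SQL Warehouse HTTP Path. The helper names, the `SHOW TABLES` statement, and the 30-second wait are assumptions.

```python
import json
import urllib.request


def warehouse_id_from_http_path(http_path: str) -> str:
    """'/sql/1.0/warehouses/123456abcx789yz' -> '123456abcx789yz'."""
    return http_path.rstrip("/").rsplit("/", 1)[-1]


def build_access_check_request(cluster_domain: str, http_path: str, api_token: str) -> urllib.request.Request:
    """Build a Statement Execution API request that runs a query as the service user."""
    body = {
        "warehouse_id": warehouse_id_from_http_path(http_path),
        "statement": "SHOW TABLES IN hive_metastore.monad",  # assumed check query
        "wait_timeout": "30s",  # wait up to 30 seconds for a synchronous result
    }
    return urllib.request.Request(
        f"https://{cluster_domain}/api/2.0/sql/statements/",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_token}", "Content-Type": "application/json"},
        method="POST",
    )


def statement_state(response_body: bytes) -> str:
    """Return status.state from the response, e.g. SUCCEEDED, FAILED, PENDING, or RUNNING."""
    return json.loads(response_body).get("status", {}).get("state", "UNKNOWN")
```

Send the request with `urllib.request.urlopen` and pass the response body to `statement_state`. A `FAILED` state comes with a `status.error` object describing the problem, while `PENDING` or `RUNNING` means the statement had not finished within the wait timeout.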
This page was last modified: 9 Oct 2023