Tag: Databricks

Streamlining Your Journey: Automating SCIM Configuration for Azure Databricks with Terraform

Recently, I embarked on a particularly challenging task: automating SCIM (System for Cross-domain Identity Management) provisioning within the Azure Databricks environment. This journey, spurred by the necessity to efficiently manage user access and identities, highlighted the importa...
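The full post covers the Terraform side; as a taste of what SCIM provisioning automates, here is a minimal Python sketch that creates a workspace user through the Databricks SCIM REST API directly. The host and token environment variables are assumptions, and the user name is illustrative.

```python
import os
import requests

# Assumed to be set in the environment; replace with your own workspace values.
HOST = os.environ["DATABRICKS_HOST"]    # e.g. https://adb-1234567890.0.azuredatabricks.net
TOKEN = os.environ["DATABRICKS_TOKEN"]  # a personal access token

def create_scim_user(user_name: str) -> dict:
    """Create a workspace user via the Databricks SCIM API."""
    resp = requests.post(
        f"{HOST}/api/2.0/preview/scim/v2/Users",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/scim+json",
        },
        json={
            "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
            "userName": user_name,  # illustrative value below
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

print(create_scim_user("jane.doe@example.com")["id"])
```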

The Best Approach to Safeguarding Your Databricks Environment: A Comprehensive Guide to Backing Up and Restoring Databricks

Backing up major Platform-as-a-Service (PaaS) systems can be a daunting task, but the importance of safeguarding these platforms cannot be overstated. Disaster can strike at any moment, regardless of how diligently you’ve followed best practices. In this blog, I will guide you through the s...
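As one example of the kind of building block a backup needs, here is a minimal sketch that exports a single notebook through the Databricks Workspace API. It assumes a workspace URL and personal access token in environment variables; the notebook and file paths are illustrative.

```python
import base64
import os
import requests

HOST = os.environ["DATABRICKS_HOST"]    # assumed workspace URL
TOKEN = os.environ["DATABRICKS_TOKEN"]  # assumed personal access token

def export_notebook(workspace_path: str, local_path: str) -> None:
    """Download one notebook in SOURCE format as part of a workspace backup."""
    resp = requests.get(
        f"{HOST}/api/2.0/workspace/export",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"path": workspace_path, "format": "SOURCE"},
        timeout=30,
    )
    resp.raise_for_status()
    # The API returns the notebook body base64-encoded.
    with open(local_path, "wb") as f:
        f.write(base64.b64decode(resp.json()["content"]))

export_notebook("/Shared/etl_job", "etl_job_backup.py")
```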

Configuring DNS resolution for Private Databricks Workspaces (AWS)

For customers on the E2 Platform, Databricks has a feature that allows them to use AWS PrivateLink to provision secure private workspaces by creating VPC endpoints to both the front-end and back-end interfaces of the Databricks infrastructure. The front-end VPC endpoint ensures that users connect to...
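A quick way to sanity-check such a setup is to confirm that the workspace hostname resolves to private addresses, i.e. to the front-end VPC endpoint rather than Databricks' public IPs. A minimal sketch, with an illustrative hostname:

```python
import ipaddress
import socket

# Hypothetical private workspace hostname; replace with your deployment's URL.
WORKSPACE_HOST = "dbc-example-1234.cloud.databricks.com"

# Resolve the hostname and report whether each returned address is private.
addresses = {info[4][0] for info in socket.getaddrinfo(WORKSPACE_HOST, 443)}
for addr in sorted(addresses):
    private = ipaddress.ip_address(addr).is_private
    print(f"{addr}  private={private}")
```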

How to read a target table's column data types and cast the same columns of the source table in Azure Databricks using PySpark

In this blog post, I will show you how to copy a Delta table while dynamically casting all of its columns to the data types of the target Delta table's columns in Azure Databricks using PySpark ...
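The core idea can be sketched in a few lines of PySpark: read the target table's schema and cast each source column to the matching field's type before writing. The table names below are illustrative, and the sketch assumes the source and target share column names.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Illustrative table names; substitute your own source and target Delta tables.
source = spark.read.table("bronze.events")
target_schema = spark.read.table("silver.events").schema

# Cast each source column to the data type of the same-named target column.
casted = source.select(
    [col(f.name).cast(f.dataType) for f in target_schema.fields]
)

casted.write.format("delta").mode("append").saveAsTable("silver.events")
```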

Designing a Multi-Cloud Data Platform with Databricks

Multi-cloud deployments have become increasingly popular in recent years due to the benefits they provide, such as increased resiliency and availability of applications and services. By utilising multiple cloud providers, organisations can avoid service disruptions caused by a single provider’s ...

Cleaning up Cluster Logs in Databricks

In any data engineering or analytics environment, managing logs is a crucial task. Logs provide valuable insights into the health and performance of your clusters, but they can also consume valuable storage space if not managed properly. In this blog post, we will guide you through the process of cl...
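Here is a minimal sketch of one possible cleanup, meant to run in a Databricks notebook where dbutils is predefined. The log path and retention window are assumptions, and FileInfo.modificationTime requires a recent Databricks Runtime.

```python
import time

# Hypothetical log location; point this at your cluster log delivery path.
LOG_ROOT = "dbfs:/cluster-logs"
RETENTION_DAYS = 30
cutoff_ms = (time.time() - RETENTION_DAYS * 86400) * 1000

for cluster_dir in dbutils.fs.ls(LOG_ROOT):
    for entry in dbutils.fs.ls(cluster_dir.path):
        # modificationTime is reported in milliseconds since the epoch.
        if entry.modificationTime < cutoff_ms:
            dbutils.fs.rm(entry.path, True)  # recursively delete stale logs
```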

How to unit test PySpark programs in a Databricks notebook?

Unit testing is a software development process in which the smallest testable parts of an application, called units, are individually and independently scrutinized for proper operation. The Nutter framework from Microsoft makes it e...
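A minimal Nutter fixture looks like the sketch below, runnable in a Databricks notebook with the nutter package installed (%pip install nutter); the test logic itself is illustrative.

```python
from runtime.nutterfixture import NutterFixture

class SampleTests(NutterFixture):
    def run_row_count(self):
        # "Act" step: build the DataFrame under test
        # (spark is predefined in a Databricks notebook).
        self.df = spark.range(10)

    def assertion_row_count(self):
        # "Assert" step: Nutter pairs run_<name> with assertion_<name>.
        assert self.df.count() == 10

result = SampleTests().execute_tests()
print(result.to_string())
```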

Getting started with Databricks in Azure

In the modern world of data-driven decision-making, developers and data scientists play a crucial role in harnessing the potential of data. Databricks is a unified analytics platform designed to help developers, data scientists, and analysts collaborate seamlessly on big data projects. Leveraging th...

Win The Title “Databricks Solutions Architect Champion.”

Recently, I was acknowledged as a “Databricks Solutions Architect Champion” for my recurring contributions to customer success and meaningful value creation through data engineering solutions leveraging Databricks. Winning this title is often seen as arduous, but with the proper insig...

Finding the Path to a Managed Table in Databricks

This article shows how to find the path of a managed Databricks table. In Databricks, you might have been creating, writing to, and reading from managed tables using the database.tablename (catalog.database.tablename, if you have upgraded to Unity Catalog) pattern. And...
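One common approach, shown here as a minimal sketch with an illustrative table name, is to read the location column that DESCRIBE DETAIL returns for a Delta table, inside a notebook where spark is predefined.

```python
# DESCRIBE DETAIL returns a one-row DataFrame whose `location` column
# holds the table's storage path.
path = (
    spark.sql("DESCRIBE DETAIL main.sales.orders")
         .select("location")
         .first()[0]
)
print(path)
```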