Ganesh Chandrasekaran

Summary

The article provides a method for downloading files from Databricks Filestore to a local machine using the displayHTML function in a Databricks notebook.

Abstract

Databricks FileStore is a file system that allows users to upload files to dbfs:/FileStore, but it lacks a direct method for downloading files. The article addresses this limitation by presenting a workaround that uses the displayHTML function within a Databricks notebook. This function renders an HTML link that, when clicked, downloads the desired file to the user's local machine. The process requires the file to be stored under /FileStore/ and uses a specific URL format: the /files prefix followed by the file's path. The article concludes by acknowledging Chris Grant for the tip and provides additional resources for further engagement with the author's content and expertise.

Opinions

  • The author suggests that the conventional method of downloading files from Databricks Filestore is not straightforward, implying a need for a more user-friendly solution.
  • The use of displayHTML to create a downloadable link is presented as a "cool trick," indicating the author's appreciation for the ingenuity of the solution.
  • The article implies that the described method is effective for files too large to download directly, such as those exceeding 1 million rows.
  • By thanking Chris Grant, the author acknowledges the collaborative nature of problem-solving within the tech community and the value of shared knowledge.
  • The inclusion of referral links and invitations to subscribe suggests the author's interest in building a community and providing additional value to readers through membership and expert sessions.

How to download a file from Databricks filestore to a local machine?

Databricks provides an interface for uploading a file from the local machine to the dbfs:/FileStore file system, but there is no direct method for downloading a file from dbfs:/FileStore. A small workaround makes it possible.

Photo by Miguel Á. Padriñán: https://www.pexels.com/photo/close-up-shot-of-keyboard-buttons-2882550/

Let's take a quick look at FileStore

%fs
ls /FileStore/tables/

Some of the CSV files have more than 1 million rows, so it's not possible to download them directly.

But here is a cool trick to download any file from the Databricks FileStore using displayHTML, which renders the HTML you pass it as the cell's output.

It is assumed the file is stored inside /FileStore/.

In our case, the file is at /FileStore/tables/Electricity_GRE.csv, so the path to use in the link is:

/files/tables/Electricity_GRE.csv

The /files prefix is mandatory. It takes the place of /FileStore and is followed by the rest of the file's path (folder and file name).
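If you do this for several files, the path mapping can be wrapped in a small helper. This is only a sketch: the function name filestore_to_download_url is made up for illustration, and it assumes the only rule is "swap the /FileStore/ prefix for /files/", with special characters such as spaces percent-encoded.

```python
from urllib.parse import quote

FILESTORE_PREFIX = "/FileStore/"


def filestore_to_download_url(dbfs_path: str) -> str:
    """Map a /FileStore path to the /files path used in the download link.

    Hypothetical helper, not part of any Databricks API.
    """
    path = dbfs_path
    # Accept both "dbfs:/FileStore/..." and "/FileStore/..." spellings.
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]
    if not path.startswith(FILESTORE_PREFIX):
        raise ValueError(f"Not under /FileStore/: {dbfs_path!r}")
    relative = path[len(FILESTORE_PREFIX):]
    # Percent-encode characters such as spaces; "/" separators are kept.
    return "/files/" + quote(relative)


print(filestore_to_download_url("/FileStore/tables/Electricity_GRE.csv"))
```

For the file used in this article, the helper prints /files/tables/Electricity_GRE.csv.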

%python
displayHTML("""<a href="/files/tables/Electricity_GRE.csv" download>Download CSV </a>""")

Clicking the Download CSV link (yes, just like a web link) downloads the file to your local machine.
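The same link can be generated from a variable instead of being hard-coded. The sketch below is an illustration rather than an official pattern: download_link is a hypothetical helper, and the try/except exists only because displayHTML is defined inside a Databricks notebook and nowhere else.

```python
import html


def download_link(href: str, label: str = "Download CSV") -> str:
    """Build an <a> tag whose download attribute asks the browser to save the file.

    Hypothetical helper; href is expected to be a /files/... path.
    """
    safe_href = html.escape(href, quote=True)  # escape quotes and ampersands
    safe_label = html.escape(label)
    return f'<a href="{safe_href}" download>{safe_label}</a>'


link = download_link("/files/tables/Electricity_GRE.csv")

# displayHTML is only available in a Databricks notebook; print elsewhere.
try:
    displayHTML(link)
except NameError:
    print(link)
```

Escaping the path and label keeps the markup valid even when a file name contains characters such as & or quotes.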

Thank you Chris Grant for this handy tip.
