import kagglehub
import shutil
import os

def download_kaggle_dataset(dataset_slug, destination_dir):
    # Download dataset to the default location
    path = kagglehub.dataset_download(dataset_slug)
    # Ensure the destination directory exists
    os.makedirs(destination_dir, exist_ok=True)
    # Move files to the preferred directory
    if os.path.exists(path):
        shutil.move(path, destination_dir)
        print(f"Dataset '{dataset_slug}' moved to:", destination_dir)
    else:
        print("Download path does not exist.")
    return destination_dir

# Example usage
dataset_path = download_kaggle_dataset("timchant/supstore-dataset-2019-2022", "/Users/teslim/OneDrive/mlzoomcamp")
print("Dataset is available at:", dataset_path)
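One behavior of the function above is worth checking before relying on it: when the destination directory already exists, `shutil.move` places the source folder inside it rather than merging the contents. The sketch below reproduces this with temporary folders only, so no Kaggle download is involved. The helper `move_into_existing_dir`, the folder name `versions_1`, and the file `data.csv` are made up for illustration.

```python
import os
import shutil
import tempfile


def move_into_existing_dir(src_dir, destination_dir):
    """Mirror the move step of download_kaggle_dataset and return the final path."""
    os.makedirs(destination_dir, exist_ok=True)
    # shutil.move returns the path the source actually ended up at
    return shutil.move(src_dir, destination_dir)


# Build a fake "downloaded" folder containing one file (hypothetical names)
workspace = tempfile.mkdtemp()
src = os.path.join(workspace, "versions_1")
os.makedirs(src)
with open(os.path.join(src, "data.csv"), "w") as f:
    f.write("a,b\n1,2\n")

dest = os.path.join(workspace, "mlzoomcamp")
final_path = move_into_existing_dir(src, dest)

# The files sit one level below dest, in a folder named after the source
print(final_path)
print(os.listdir(final_path))
```

In other words, the dataset files may end up in a subfolder of `destination_dir` named after the last component of the download path. The value returned by `shutil.move` tells you exactly where they are.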
Each time you use this function to download a new Kaggle dataset to a specific location, you only need to change two parameters in the function call:
- dataset_slug: The unique identifier for the Kaggle dataset you want to download. It follows the format "owner/dataset-name", for example "timchant/supstore-dataset-2019-2022" in your original code.
- destination_dir: The directory path on your local machine where you want the dataset saved. Change this path to organize datasets by project or location.
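To catch a mistyped slug before any download starts, a loose shape check along these lines may help. It is only a sketch: `looks_like_slug` is a hypothetical helper, it is not Kaggle's own validation, and it covers only the two-part "owner/dataset-name" form described here.

```python
def looks_like_slug(dataset_slug):
    """Loose check for the "owner/dataset-name" shape (not Kaggle's own rules)."""
    parts = dataset_slug.split("/")
    return (
        len(parts) == 2                                   # exactly one slash
        and all(parts)                                    # both sides non-empty
        and not any(ch.isspace() for ch in dataset_slug)  # no whitespace
    )


print(looks_like_slug("timchant/supstore-dataset-2019-2022"))
print(looks_like_slug("supstore-dataset-2019-2022"))
```

The first call passes the check and the second fails it, because the owner part is missing.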
Here’s a quick step-by-step guide for using the function with these changes:
Step-by-Step Guide
- Identify the Dataset Slug:
  - Go to the Kaggle dataset you want to download. The slug is part of the URL:
    https://www.kaggle.com/dataset-owner/dataset-name
  - For example, for the dataset located at https://www.kaggle.com/timchant/supstore-dataset-2019-2022, the slug is:
    dataset_slug = "timchant/supstore-dataset-2019-2022"
- Choose Your Destination Directory:
  - Decide where you want the dataset to be saved. You might organize it within a project folder or a general data folder.
  - Specify the full path or a relative path based on your project setup:
    destination_dir = "/path/to/your/project/data"
  - Replace "/path/to/your/project/data" with the actual path you want.
- Run the Function with Updated Parameters:
  - Call the function with the new dataset slug and destination directory:
    dataset_path = download_kaggle_dataset("timchant/supstore-dataset-2019-2022", "/Users/teslim/OneDrive/mlzoomcamp")
    print("Dataset is available at:", dataset_path)
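Since the slug is just the owner and dataset name taken from the URL, the first step can be sketched as a small helper. `slug_from_url` is a hypothetical name, and the handling of a leading `datasets` path segment is an assumption about how some Kaggle URLs are laid out.

```python
from urllib.parse import urlparse


def slug_from_url(url):
    """Return "owner/dataset-name" from a Kaggle dataset URL (hypothetical helper)."""
    # Split the URL path into its non-empty segments
    segments = [s for s in urlparse(url).path.split("/") if s]
    # Assumption: some URLs carry a leading "datasets" segment before the owner
    if segments and segments[0] == "datasets":
        segments = segments[1:]
    if len(segments) < 2:
        raise ValueError(f"Cannot find owner/dataset-name in: {url}")
    return "/".join(segments[:2])


print(slug_from_url("https://www.kaggle.com/timchant/supstore-dataset-2019-2022"))
```

The returned string can be passed directly as `dataset_slug`.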
Example Usage
Suppose you want to download a different dataset, such as "zynicide/wine-reviews", and save it to a folder named "wine_analysis" inside your documents directory. Here’s how you’d adjust the function call:
# Define new dataset slug and destination directory
dataset_slug = "zynicide/wine-reviews"
destination_dir = "/Users/teslim/Documents/wine_analysis"
# Download the dataset
dataset_path = download_kaggle_dataset(dataset_slug, destination_dir)
print("Dataset is available at:", dataset_path)
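If several datasets are needed, the same two parameters can be supplied in a loop. The sketch below assumes a hypothetical helper, `download_many`, that takes the download function as an argument. A stand-in is passed here so the example runs without a network connection or Kaggle credentials.

```python
import os


def download_many(jobs, base_dir, download):
    """Call download(slug, destination) for each (slug, folder_name) pair.

    `download` is expected to behave like download_kaggle_dataset; passing it
    in keeps this sketch runnable without a network connection.
    """
    results = {}
    for slug, folder_name in jobs:
        destination = os.path.join(base_dir, folder_name)
        results[slug] = download(slug, destination)
    return results


# Dry run with a stand-in that only echoes the destination it was given
jobs = [
    ("zynicide/wine-reviews", "wine_analysis"),
    ("timchant/supstore-dataset-2019-2022", "mlzoomcamp"),
]
planned = download_many(jobs, "/Users/teslim/Documents", lambda slug, dest: dest)
for slug, dest in planned.items():
    print(slug, "->", dest)
```

To perform real downloads, pass `download_kaggle_dataset` in place of the lambda.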
Summary of What You’ll Change Each Time
- dataset_slug – Update to the slug of the dataset you want to download.
- destination_dir – Update to the directory where you want the dataset saved.
With these two simple changes, you can easily use this function to download any Kaggle dataset to your preferred directory.