Most data science enthusiasts have heard of two tools: Google Colaboratory (better known as Colab) and Kaggle. Newcomers to the field may not know what each platform offers or how the two differ. This post explains what Google Colab and Kaggle are, and how you can access Kaggle datasets from Google Colab to work on data science challenges.
- Google Colab is a free Jupyter notebook environment that runs in the cloud and stores its notebooks on Google Drive. For data science and machine learning enthusiasts, it is a convenient place to analyze datasets and to develop and test models. All you need to get started is a Google account. You can do everything you would normally do in a regular Jupyter notebook (running code, analyzing data, generating figures) while also benefiting from the power of Google's infrastructure.
- Kaggle is an online platform where users can find and publish datasets. You can also analyze these datasets and build models in its web-based data science environment, work alongside other data science and machine learning practitioners, and take part in data science competitions.
In this post, we will mainly cover the steps required to access Kaggle Datasets via Google Colab.
Google Colab Notebook
Before you can start using Google Colab, you need a Google account. If you don't already have one, create one first.
Once your Google account is set up, open the Google Colab website at colab.research.google.com/notebooks
In the upper right corner, click New Notebook to open your first Google Colab notebook. This is the notebook where we will write all the code to analyze our data and build models.
Before going any further, we need to import a dataset into Colab. There are several ways to do this; in this post, we focus on importing datasets from Kaggle.com.
Importing dataset from Kaggle
We will go through each step of importing a dataset from Kaggle in detail.
If you are new to Kaggle, create an account first. You will then be able to explore and download files with different extensions such as .csv, .ipynb, .docx, and more.
Once your Kaggle account is set up, follow the steps below:
Step 1: API Token
- In the top right corner, click on your profile picture and select Account.
- Scroll down and select API > Create New API Token.
- A file named kaggle.json will be downloaded to your system. It contains your username and API key.
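Before uploading the file, it can be worth confirming that it has the expected shape. The sketch below assumes kaggle.json is a JSON object with `username` and `key` fields; the helper name `load_kaggle_credentials` and the example path are hypothetical.

```python
import json
from pathlib import Path


def load_kaggle_credentials(path):
    """Read kaggle.json and return its contents as a dict.

    Raises ValueError if the 'username' or 'key' field is missing or empty.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [field for field in ("username", "key") if not data.get(field)]
    if missing:
        raise ValueError(f"kaggle.json is missing fields: {', '.join(missing)}")
    return data


# Example usage (hypothetical path; use wherever your browser saved the file):
# creds = load_kaggle_credentials(Path.home() / "Downloads" / "kaggle.json")
# print("Token found for user:", creds["username"])
```

Treat the key like a password: avoid printing it or committing the file to a public repository.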
Step 2: Google Drive
Next, we upload this file (kaggle.json) to Google Drive:
- In Google Drive, create a new folder where you will save your Kaggle datasets. Let's name this folder Kaggle.
- Once the folder is created, upload the kaggle.json file into it.
Step 3: Google Colab
- Open a new Colab notebook using File > New Notebook.
- Use the following lines of code to mount your Google Drive in Colab. To execute the cell, either click the Play (▶) button to the left of the cell or press Ctrl + Enter.
from google.colab import drive
drive.mount('/content/gdrive')
- Click on the URL that appears to get an authorization code, then copy and paste that code into the box.
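Once the cell finishes, a quick check can confirm that the mount worked. This is a minimal sketch, assuming your Drive contents appear in a MyDrive folder inside the mount point; the helper name `drive_is_mounted` is hypothetical.

```python
import os


def drive_is_mounted(mount_point="/content/gdrive"):
    """Return True if a MyDrive folder exists under the mount point."""
    return os.path.isdir(os.path.join(mount_point, "MyDrive"))


# In Colab, after running drive.mount('/content/gdrive'):
# print(drive_is_mounted())
```

If this returns False, re-run the mount cell and complete the authorization prompt before continuing.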
Step 4: Kaggle.json
We will run the following code to provide the configuration path to kaggle.json:
import os
os.environ['KAGGLE_CONFIG_DIR'] = "/content/gdrive/MyDrive/Kaggle"
Here, "/content/gdrive/MyDrive/Kaggle" is the path to the Kaggle folder in our Google Drive.
Step 5: Working Directory
Next, change the present working directory to the Kaggle folder using the following code:
%cd /content/gdrive/MyDrive/Kaggle
You can also check the present working directory using the command !pwd.
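Steps 4 and 5 can also be combined in plain Python. The sketch below is one way to do it, assuming the Kaggle folder from Step 2 holds kaggle.json; the helper name `prepare_kaggle_workspace` is hypothetical.

```python
import os


def prepare_kaggle_workspace(config_dir):
    """Verify the token, set KAGGLE_CONFIG_DIR, and change into config_dir."""
    token_path = os.path.join(config_dir, "kaggle.json")
    if not os.path.isfile(token_path):
        raise FileNotFoundError(f"Expected the API token at {token_path}")
    # Tell the Kaggle client which folder holds kaggle.json.
    os.environ["KAGGLE_CONFIG_DIR"] = config_dir
    # Downloads in later steps will land in this folder.
    os.chdir(config_dir)
    return os.getcwd()


# In Colab, after mounting Drive (path assumed from Step 4):
# print(prepare_kaggle_workspace("/content/gdrive/MyDrive/Kaggle"))
```

Failing early when the token is missing tends to give a clearer error than the one raised later by the download command.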
Step 6: Kaggle Dataset
It's time to download the Kaggle dataset. Do the following:
- Go to Kaggle.com.
- Use the search bar at the top to find the dataset you wish to import into Colab.
- Open the dataset and select Copy API Command from the options menu (⋮) available below the dataset.
- Paste the copied API command into a Colab cell after a '!' sign.
!kaggle datasets download -d brijbhushannanda1979/bigmart-sales-data
# "kaggle datasets download -d brijbhushannanda1979/bigmart-sales-data" is the API command we copied
- You can also list the contents of your directory using the !ls command.
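If you download several datasets, the command can be assembled from the dataset slug instead of being pasted each time. The sketch below builds the same `kaggle datasets download -d <owner>/<dataset>` command used above and runs it with `subprocess`. The helper names are hypothetical, the slug check is a loose assumption about the `owner/dataset` form, and the code assumes the kaggle command-line tool is installed and configured as in Step 4.

```python
import re
import subprocess

# Loose sanity check (an assumption): a slug looks like "<owner>/<dataset-name>".
SLUG_PATTERN = re.compile(r"[\w.-]+/[\w.-]+")


def build_download_command(slug):
    """Return the argument list for downloading the dataset named by slug."""
    if not SLUG_PATTERN.fullmatch(slug):
        raise ValueError(f"Expected '<owner>/<dataset>', got {slug!r}")
    return ["kaggle", "datasets", "download", "-d", slug]


def download_dataset(slug):
    """Run the download; raises CalledProcessError if the command fails."""
    subprocess.run(build_download_command(slug), check=True)


# In Colab:
# download_dataset("brijbhushannanda1979/bigmart-sales-data")
```

Passing the command as a list rather than a single string avoids shell quoting issues.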
Step 7: Unzip the files
Now we need to unzip the downloaded files and then remove the zip archives. Use the following code:
!unzip \*.zip && rm *.zip
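The same step can be sketched in Python with the standard `zipfile` module, which may be handy if you want to know which files were extracted. The helper name `extract_and_remove_zips` is hypothetical. Unlike the shell command, this version deletes each archive right after it has been extracted rather than after all of them.

```python
import zipfile
from pathlib import Path


def extract_and_remove_zips(directory="."):
    """Extract every .zip file in directory, delete it, and return the extracted names."""
    extracted = []
    for archive in sorted(Path(directory).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(directory)
            extracted.extend(zf.namelist())
        # Reached only if extraction did not raise an error.
        archive.unlink()
    return extracted


# In Colab, from the folder that holds the download:
# print(extract_and_remove_zips())
```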
We have successfully imported the Kaggle dataset into Google Colab and can now use the extracted files.
Want to learn how to read, manipulate and analyze files using pandas? Check out our pandas series here.