Web scraping with Python and Beautiful Soup

Today I am going to do some web scraping. The task is to:

  1. Scrape the top 100 songs from Billboard for a date of our choice
  2. Use Spotipy to create a playlist with the 100 songs scraped from Billboard

So here we go. We will need the requests module, the BeautifulSoup library and the spotipy module:

import requests
import spotipy
from bs4 import BeautifulSoup
from spotipy.oauth2 import SpotifyOAuth

First we need to create an application in the Spotify Developer dashboard. Then we save the credentials into constants (you can find them in the Spotify application you just created):

YOUR_APP_CLIENT_ID = "here_goes_your_clientID_for_your_spotify_application_you_just_created"
YOUR_APP_CLIENT_SECRET = "here_goes_your_clientSecret_for_your_spotify_application_you_just_created"
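If you would rather not keep the secret in the source file, one option is to read both values from environment variables. This is only a sketch: the helper name `load_spotify_credentials` is made up for illustration, and the variable names `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` are the ones spotipy itself looks for, which I am assuming you have exported in your shell beforehand.

```python
import os


def load_spotify_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    `env` is any mapping; it defaults to the real process environment.
    """
    client_id = env.get("SPOTIPY_CLIENT_ID")
    client_secret = env.get("SPOTIPY_CLIENT_SECRET")
    # Fail early with a readable message instead of a confusing auth error later
    if not client_id or not client_secret:
        raise RuntimeError("Set SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET first.")
    return client_id, client_secret
```

With that in place, the two constants could be filled with `YOUR_APP_CLIENT_ID, YOUR_APP_CLIENT_SECRET = load_spotify_credentials()`.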

After that we connect to the Billboard server to scrape all 100 songs for the date of our choice. The date comes from the user's input:

date = input("Which year would you like to travel to? Write the date in this format YYYY-MM-DD: ")
response = requests.get(f"https://www.billboard.com/charts/hot-100/{date}/")
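Since the date goes straight from the keyboard into the URL, a typo gives you a page you did not expect. Here is a small sketch of how the input could be checked first. The function name `build_chart_url` and the constant are my own for illustration; the URL pattern is the one used above.

```python
from datetime import datetime

BILLBOARD_HOT_100_URL = "https://www.billboard.com/charts/hot-100"


def build_chart_url(date_text):
    """Validate a YYYY-MM-DD string and return the chart URL for that date."""
    # strptime raises ValueError for anything that is not a real calendar date
    parsed = datetime.strptime(date_text.strip(), "%Y-%m-%d")
    # Re-format the parsed date so the URL always gets zero-padded month and day
    return f"{BILLBOARD_HOT_100_URL}/{parsed:%Y-%m-%d}/"
```

You could then call `requests.get(build_chart_url(date))` and wrap it in a try/except for `ValueError` to ask the user again.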

We also need the contents of the HTML page:

music_page = response.text

After that we create a soup object to scrape the titles of the 100 songs from the Billboard website:

soup = BeautifulSoup(music_page, "html.parser")

From what I saw on the page, the list of songs is structured like a bunch of lists inside a list, so I needed a list slice: without it I was getting not only the titles but also the songwriters and producers of each song. The slice definitely helped:

songs_list = [song.getText().strip() for song in soup.find_all(name="h3", id="title-of-a-story")][2::4]
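To see what the `[2::4]` slice does on its own, here is a tiny sketch on a plain list. The entries are made up and only mimic the shape I described (a couple of unrelated headings first, then four entries per song); the real page may look different, so check the offsets against what you actually get.

```python
# Hypothetical flattened headings: two unrelated entries first,
# then four entries per song (only the first of the four is the title).
headings = [
    "Page heading A", "Page heading B",
    "Song 1", "Songwriters", "Producers", "Extra entry",
    "Song 2", "Songwriters", "Producers", "Extra entry",
    "Song 3", "Songwriters", "Producers", "Extra entry",
]

# [2::4] means: start at index 2, then take every 4th element
titles = headings[2::4]
print(titles)  # ['Song 1', 'Song 2', 'Song 3']
```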

After that I connect to Spotify with the help of [spotipy](https://spotipy.readthedocs.io/en/2.13.0/#). Remember, one of the parameters is cache_path, where we put the name of the file that will hold our auth token. For the connection to the server we use the "playlist-modify-private" scope. When we first run the code, it opens the redirect URL we set up earlier; we copy the link of the opened page and paste it into the prompt in the PyCharm console. Then we close the program and run it again. After that, the file with our token inside appears.

sp = spotipy.Spotify(auth_manager=SpotifyOAuth(client_id=YOUR_APP_CLIENT_ID,
                                               client_secret=YOUR_APP_CLIENT_SECRET,
                                               redirect_uri="http://example.com",
                                               scope="playlist-modify-private",
                                               cache_path=".cache"))

This is the data your token file will contain:

{"access_token": "here_will_be_your_token", "token_type": "Bearer", "expires_in": 3600, "refresh_token": "here_will_be_the_refresh_token", "scope": "playlist-modify-private", "expires_at": 1659777304}
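The `expires_at` field looks like a Unix timestamp in seconds, so you can check by hand whether a cached token is still valid. This is just a sketch for inspecting the file; the function name `token_is_expired` is hypothetical, and as far as I can tell spotipy takes care of refreshing the token on its own, so you should not need this in the actual script.

```python
import json
import time


def token_is_expired(token_info, now=None):
    """Return True if the token's expires_at timestamp is not in the future."""
    # Allow passing a fixed "now" so the check is easy to try out
    now = time.time() if now is None else now
    return token_info["expires_at"] <= now


# Same shape as the cache file above, shortened to the fields used here
sample = json.loads(
    '{"access_token": "here_will_be_your_token", "expires_in": 3600, "expires_at": 1659777304}'
)
```

For the real file you would load it with `json.load(open(".cache"))` and pass the result to `token_is_expired`.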

Then we fetch the user ID and username and save them into variables. For later we also need the year, which we can extract from our date variable (aka the input). Lastly, we need an empty list to record the track [URIs](https://spotipy.readthedocs.io/en/2.13.0/#ids-uris-and-urls):

curr_user_id = sp.current_user()["id"]
curr_username = sp.current_user()["display_name"]
year = date.split("-")[0]
songs_uris = []

Finally, we can search for each song using the title we scraped from Billboard and the year:

for song in songs_list:
    result = sp.search(q=f"track:{song} year:{year}", type="track")

To avoid errors when a particular song is not found, we use try/except (still inside the loop):

    try:
        uri = result["tracks"]["items"][0]["uri"]
        songs_uris.append(uri)
    except IndexError:
        print(f"{song} doesn't exist in Spotify. Skipped.")
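The same idea can be written without try/except by checking whether the list of items is empty. Below is a sketch with a hypothetical helper, `first_track_uri`, tried on two hand-made dictionaries that only copy the part of the search result used above (`tracks` → `items` → `uri`); the URI value is a placeholder, not a real track.

```python
def first_track_uri(result):
    """Return the URI of the first track in a search result, or None if empty."""
    items = result.get("tracks", {}).get("items", [])
    return items[0]["uri"] if items else None


# Hand-made stand-ins for what sp.search(...) returns
found = {"tracks": {"items": [{"uri": "spotify:track:hypothetical_id"}]}}
not_found = {"tracks": {"items": []}}

print(first_track_uri(found))
print(first_track_uri(not_found))
```

Inside the loop you would then append the URI only when the helper returns something other than `None`.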

Then we can create a new private playlist:

playlist = sp.user_playlist_create(user=curr_user_id, name=f"{date} Billboard 100", public=False)

Now that we have an empty playlist, we can finally save all the found songs into it:

sp.playlist_add_items(playlist_id=playlist["id"], items=songs_uris)
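One thing to keep in mind if you ever reuse this for a longer list: as far as I know, the Spotify Web API accepts at most 100 items per add request. With a Hot 100 chart we stay within that, but here is a sketch of how the URIs could be sent in batches. The helper name `chunked` and the default size are my assumptions, so check the current limit in the API docs.

```python
def chunked(items, size=100):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Intended use with the variables from this post (not executed here):
# for batch in chunked(songs_uris):
#     sp.playlist_add_items(playlist_id=playlist["id"], items=batch)
```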