In this 'Web Crawling with Python' repo, we have covered the following scenario:
Unique links from the LambdaTest E-commerce Playground are crawled using Beautiful Soup. Content (i.e., product meta-data) from the crawled pages is then scraped with Beautiful Soup. I have a detailed blog & repo on Web Scraping with Python; details below:
Step 1
Create a virtual environment by triggering the virtualenv venv command on the terminal
virtualenv venv
Step 2
Activate the newly created virtual environment by triggering the source venv/bin/activate command on the terminal
source venv/bin/activate
Follow steps (3) and (4) for performing web scraping on LambdaTest Cloud Grid:
Step 3
Run the make install command on the terminal to install the desired packages (or dependencies) - Beautiful Soup, urllib3, etc.
make install
With this, all the dependencies and environment variables are set. We are all set for web crawling with Beautiful Soup (bs4).
Follow the steps below to crawl the LambdaTest E-commerce Playground:
Step 1
Trigger the make clean command to remove the __pycache__ folder(s) and .pyc files
Step 2
Trigger the make crawl-ecommerce-playground command on the terminal to crawl the LambdaTest E-Commerce Playground
As seen above, the content from the LambdaTest E-commerce Playground was crawled successfully! Fifty-five unique product links are now available to be scraped in the exported JSON file (i.e., ecommerce_crawled_urls.json).
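The crawling step above can be sketched in a few lines of Beautiful Soup. The snippet below runs against an inline HTML sample instead of a live fetch, and the `product_id` URL filter is an assumption for illustration, not necessarily the repo's exact selector logic:

```python
import json
from urllib.parse import urljoin

from bs4 import BeautifulSoup

def extract_unique_links(html, base_url):
    """Return a sorted list of unique product links found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    links = set()
    for anchor in soup.find_all("a", href=True):
        href = urljoin(base_url, anchor["href"])
        # Keep only product pages; the 'product_id' filter is an assumption
        if "product_id" in href:
            links.add(href.split("#")[0])  # drop fragments to avoid duplicates
    return sorted(links)

# Sample markup standing in for a fetched page
sample_html = """
<a href="/index.php?route=product/product&amp;product_id=28">iPhone</a>
<a href="/index.php?route=product/product&amp;product_id=28#reviews">Reviews</a>
<a href="/about-us">About</a>
"""
links = extract_unique_links(sample_html, "https://ecommerce-playground.lambdatest.io/")

# Export the unique links, mirroring ecommerce_crawled_urls.json
with open("ecommerce_crawled_urls.json", "w") as fp:
    json.dump(links, fp, indent=2)
```

Deduplicating via a set and stripping `#` fragments is what keeps the exported link count to genuinely unique product pages.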
Step 3
Now that we have the crawled information, trigger the make scrap-ecommerce-playground command on the terminal to scrape the product information (i.e., product name, product price, product availability, etc.) from the links in the exported JSON file.
All 55 links are scraped without any issues!
Feel free to fork the repo and contribute to make it better! Email himanshu[dot]sheth[at]gmail[dot]com for any queries, or ping me on the following social media sites:
LinkedIn: @hjsblogger
Twitter: @hjsblogger