In this 'Web Crawling with Python' repo, we have covered the following scenario:
Unique links from the LambdaTest E-commerce Playground are crawled using Beautiful Soup. Content (i.e., product meta-data) from the crawled pages is then scraped with Beautiful Soup. I have a detailed blog & repo on Web Scraping with Python; details below:
Step 1
Create a virtual environment by triggering the virtualenv venv command on the terminal
virtualenv venv
Step 2
Activate the newly created virtual environment by triggering the source venv/bin/activate command on the terminal
source venv/bin/activate
Follow steps (3) and (4) for performing web scraping on LambdaTest Cloud Grid:
Step 3
Run the make install command on the terminal to install the desired packages (or dependencies) - Beautiful Soup, urllib3, etc.
make install
With this, all the dependencies and environment variables are set. We are all set for web crawling with Beautiful Soup (bs4).
Follow the steps below to crawl the LambdaTest E-commerce Playground:
Step 1
Trigger the make clean command to remove the __pycache__ folder(s) and .pyc files
Step 2
Trigger the make crawl-ecommerce-playground command on the terminal to crawl the LambdaTest E-Commerce Playground
As seen above, the content from the LambdaTest E-commerce Playground was crawled successfully! Fifty-five unique product links are now available to be scraped in the exported JSON file (i.e., ecommerce_crawled_urls.json).
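The crawling step above can be sketched in a few lines of Beautiful Soup. The snippet below runs against an inline HTML sample instead of a live fetch, and the `product_id` URL filter is an assumption for illustration, not necessarily the repo's exact selector logic:

```python
import json
from urllib.parse import urljoin

from bs4 import BeautifulSoup

def extract_unique_links(html, base_url):
    """Return a sorted list of unique product links found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    links = set()
    for anchor in soup.find_all("a", href=True):
        href = urljoin(base_url, anchor["href"])
        # Keep only product pages; the 'product_id' filter is an assumption
        if "product_id" in href:
            links.add(href.split("#")[0])  # drop fragments to avoid duplicates
    return sorted(links)

# Sample markup standing in for a fetched page
sample_html = """
<a href="/index.php?route=product/product&amp;product_id=28">iPhone</a>
<a href="/index.php?route=product/product&amp;product_id=28#reviews">Reviews</a>
<a href="/about-us">About</a>
"""
links = extract_unique_links(sample_html, "https://ecommerce-playground.lambdatest.io/")

# Export the unique links, mirroring ecommerce_crawled_urls.json
with open("ecommerce_crawled_urls.json", "w") as fp:
    json.dump(links, fp, indent=2)
```

Deduplicating via a set and stripping `#` fragments is what keeps the exported link count to genuinely unique product pages.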
Step 3
Now that we have the crawled information, trigger the make scrap-ecommerce-playground command on the terminal to scrape the product information (i.e., product name, product price, product availability, etc.) from the links in the exported JSON file.
All 55 links are scraped without any issues!
Feel free to fork the repo and contribute to make it better! Email himanshu[dot]sheth[at]gmail[dot]com for any queries, or ping me on the following social media sites:
LinkedIn: @hjsblogger
Twitter: @hjsblogger