## Overview
This playbook scrapes a website and returns the results to the user, along with the client-side (JavaScript) and server-side (Python) scraping scripts used to generate them.
## What's Needed From User
1. Provide a link to the website to be scraped
## Procedure
1. Navigate to the specified webpage
2. Install dependencies
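- Install the Python package `playwright` (for example, with `pip install playwright`) and run `playwright install`
- Install the CSV writer using the command `npm i objects-to-csv`
- Install the HTML parser `cheerio` using the command `npm i cheerio`
- Install `playwright` for JavaScript (for example, with `npm i playwright`) and run `npx playwright install`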
3. Do client-side web scraping in JavaScript
- Navigate to the url using `playwright`
- Parse the raw HTML data using `cheerio` to make it easily accessible
- Inspect the DOM to find the pattern that shows where the information needed by the user is located
- Isolate the information needed by the user using `querySelector`
- Name the script as `{abbreviated_site_name}_scraper.js`
- Run the script and save the results in CSV format
- Name the CSV file as `{abbreviated_site_name}_output.csv`
4. Do web scraping in Python
- Use `playwright` to navigate to the given URL
- Name the script as `{abbreviated_site_name}_playwright_scraper.py`
5. Copy the client-side scraping JavaScript code written in step 3 into the Python script
- Execute the JavaScript code in the Python script using `page.evaluate()`
- Run the script and save the results in CSV format
- Name the CSV file as `{abbreviated_site_name}_playwright_output.csv`
6. Send the deliverables as attachments to the user
- Send both the Python script and the JavaScript script to the user
- Send both output CSV files (one from the Python script, one from the JavaScript script) to the user
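
Steps 4 and 5 can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the selectors `.item`, `.title`, and `a`, the column names `title` and `link`, and the helper names `write_csv` and `scrape` are all hypothetical placeholders that must be replaced with whatever the DOM analysis in step 3 finds. It assumes Playwright's synchronous Python API and a Chromium browser installed via `playwright install`.

```python
import csv

# JavaScript that runs inside the page. The same extraction logic would also
# live in the standalone JavaScript scraper. Selectors are placeholders.
EXTRACT_JS = """
() => Array.from(document.querySelectorAll('.item')).map(el => ({
    title: el.querySelector('.title')?.textContent.trim() ?? '',
    link: el.querySelector('a')?.href ?? '',
}))
""".strip()


def write_csv(rows, path):
    """Write a list of dicts to a CSV file, using the first row's keys as the header."""
    if not rows:
        return  # nothing scraped, so no file is written
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def scrape(url, out_path):
    """Open the URL, run the in-page JavaScript, and save the rows as CSV."""
    # Imported here so write_csv can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        rows = page.evaluate(EXTRACT_JS)  # returns a list of dicts
        browser.close()
    write_csv(rows, out_path)
    return rows
```

A call such as `scrape(url, "{abbreviated_site_name}_playwright_output.csv")`, with the placeholder replaced by the real abbreviated site name, would produce the Python-side CSV file.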
## Specification
Post-Conditions:
1. The JavaScript file has been run and has generated the CSV file `{abbreviated_site_name}_output.csv`
2. The Python script has been run and has generated the CSV file `{abbreviated_site_name}_playwright_output.csv`
Delivery Format:
1. Two separate scripts
- Client-side scraping script in JavaScript
- Web scraping script in Python
2. Two separate CSV files
- Output of the client-side scraping script as CSV
- Output of the Python script as CSV
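
Before sending, it may help to confirm that all four deliverables exist under the names given in the Procedure. The sketch below is one possible check; the helper names `deliverable_names` and `missing_deliverables` are hypothetical, and it assumes all files sit in a single directory.

```python
from pathlib import Path


def deliverable_names(abbreviated_site_name):
    """Return the four deliverable file names for the given abbreviated site name."""
    name = abbreviated_site_name
    return [
        f"{name}_scraper.js",
        f"{name}_output.csv",
        f"{name}_playwright_scraper.py",
        f"{name}_playwright_output.csv",
    ]


def missing_deliverables(abbreviated_site_name, directory="."):
    """List the deliverable file names that are not present in the directory."""
    base = Path(directory)
    return [
        n for n in deliverable_names(abbreviated_site_name)
        if not (base / n).is_file()
    ]
```

An empty list from `missing_deliverables` suggests that both scripts and both CSV files are ready to attach.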
## Advice and Pointers
1. Make sure that the data in both CSV files (generated by the JavaScript and Python scripts) is the same
2. Remove only duplicate entries from the CSV files, if any are present
- There is no need to clean the CSV file or change its formatting
3. Analyze the HTML code carefully to find the relevant elements
4. Run the JavaScript file first, followed by the Python script
5. Ensure that the web scraping code in the Python script is exactly the same as the client-side web scraping code in the JavaScript script
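
Pointers 1 and 2 can be checked with a small script along these lines. This is a sketch under stated assumptions: the helper names `read_rows`, `dedupe_rows`, `dedupe_csv`, and `same_data` are hypothetical, "duplicate" is taken to mean an exact repeat of an entire row, and the comparison is order-sensitive. Rewriting a file through Python's `csv` module can also change quoting or line endings, so the rewrite step is only worth running when duplicates are actually present.

```python
import csv


def read_rows(path):
    """Read a CSV file into a list of rows (each row is a list of strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


def dedupe_rows(rows):
    """Drop exact duplicate rows, keeping first occurrences in original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def dedupe_csv(path):
    """Rewrite the CSV file in place without duplicate rows and return the rows."""
    rows = dedupe_rows(read_rows(path))
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return rows


def same_data(path_a, path_b):
    """Return True if both CSV files contain the same rows in the same order."""
    return read_rows(path_a) == read_rows(path_b)
```

For example, calling `dedupe_csv` on each output file and then `same_data` on the pair gives a quick check that the JavaScript and Python runs agree.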