Web Scraping

Devin is your tireless web scraping assistant. It can build web scrapers or even do repetitive web research and information gathering tasks itself!

Use Devin for…

Developing web scrapers for data collection
Performing repetitive web research and data collection
Using existing web scraping services when a website's terms of service or other restrictions require one

Example Prompts


## Overview
This playbook scrapes a website and returns the results to the user, along with the two scripts used to produce them: a client-side scraping script in JavaScript and a scraping script in Python.

## What's Needed From User
1. Provide a link to the website to be scraped

## Procedure
1. Navigate to the specified webpage
2. Install dependencies
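   - Install the Python package `playwright` (`pip install playwright`) and run `playwright install`
   - Install the CSV writer package with `npm i objects-to-csv`
   - Install `cheerio`, which is used in step 3, with `npm i cheerio`
   - Install the Node.js `playwright` package (`npm i playwright`) and run `npx playwright install`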
3. Write the client-side scraper in JavaScript
   - Navigate to the URL using `playwright`
   - Parse the raw HTML using `cheerio` so it is easy to query
   - Identify the DOM pattern that shows where the information the user needs is located
   - Extract that information using `querySelector`
   - Name the script `{abbreviated_site_name}_scraper.js`
   - Run the script and save the results in CSV format
   - Name the CSV file `{abbreviated_site_name}_output.csv`
4. Write the scraper in Python
   - Use `playwright` to navigate to the given URL
   - Name the script `{abbreviated_site_name}_playwright_scraper.py`
5. Copy the client-side scraping JavaScript code written in step 3 into the Python script
   - Execute that JavaScript from the Python script using `page.evaluate()`
   - Run the script and save the results in CSV format
   - Name the CSV file `{abbreviated_site_name}_playwright_output.csv`
6. Send the deliverables to the user as attachments
   - Send both the Python script and the JavaScript script
   - Send both output CSV files: the one produced by the Python script and the one produced by the JavaScript script
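
Steps 4 and 5 could look roughly like the Python sketch below, which uses Playwright's synchronous API. It is a sketch under assumptions, not the playbook's reference implementation: `EXTRACT_JS` is a hypothetical placeholder (it only collects link text and URLs) that should be replaced by the exact JavaScript written in step 3, and `write_rows_to_csv` and `scrape` are illustrative helper names. Playwright is imported inside `scrape` so the CSV helper can be reused without a browser installed.

```python
import csv
from pathlib import Path

# Hypothetical placeholder: in the playbook this string should be the exact
# client-side JavaScript from {abbreviated_site_name}_scraper.js (step 3).
# The "a" selector is illustrative and not taken from any real site.
EXTRACT_JS = """
() => Array.from(document.querySelectorAll("a")).map(a => ({
    text: a.textContent.trim(),
    href: a.href,
}))
"""


def write_rows_to_csv(rows, path):
    """Write a list of dicts (as returned by page.evaluate) to a CSV file."""
    # Collect column names in first-seen order so every key gets a column.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(path, "w", newline="", encoding="utf-8") as f:
        # Rows missing a key get an empty cell (DictWriter's default restval).
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def scrape(url, site):
    """Load `url` in Chromium, run EXTRACT_JS in the page, and save the CSV."""
    # Imported here so write_rows_to_csv works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        # page.evaluate runs the JavaScript inside the page and returns the
        # serializable result as Python lists and dicts.
        rows = page.evaluate(EXTRACT_JS)
        browser.close()

    out_path = Path(f"{site}_playwright_output.csv")
    write_rows_to_csv(rows, out_path)
    return out_path
```

For example, `scrape("https://example.com", "example")` would write `example_playwright_output.csv`. Having the JavaScript return plain objects keeps the `page.evaluate()` result serializable, so it arrives in Python as ordinary dicts that `csv.DictWriter` can write directly.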

## Specification
Post-Conditions:
1. Running the JavaScript script generates the CSV file `{abbreviated_site_name}_output.csv`
2. Running the Python script generates the CSV file `{abbreviated_site_name}_playwright_output.csv`

Delivery Format:
1. Two separate scripts
   - Client-side scraping script in JavaScript
   - Web scraping script in Python
2. Two separate CSV files
   - Output of the client-side scraping script, as CSV
   - Output of the Python script, as CSV
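
Before sending anything, a quick check that all four deliverables exist and are non-empty covers both the post-conditions and the delivery format. The Python sketch below assumes every file sits in one directory and that `{abbreviated_site_name}` is passed in as `site`; the function names are hypothetical.

```python
from pathlib import Path


def expected_deliverables(site):
    """Deliverable file names from the playbook for one abbreviated site name."""
    return [
        f"{site}_scraper.js",
        f"{site}_output.csv",
        f"{site}_playwright_scraper.py",
        f"{site}_playwright_output.csv",
    ]


def missing_deliverables(site, directory="."):
    """Return the deliverables that are absent or empty in `directory`."""
    missing = []
    for name in expected_deliverables(site):
        path = Path(directory) / name
        # An empty file counts as missing: it cannot satisfy a post-condition.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

An empty list from `missing_deliverables("ebay")` means every expected file is in place.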

## Advice and Pointers
1. Make sure both CSV files (generated by the JavaScript and Python scripts) contain the same data
2. Remove duplicate entries from the CSV files, if any are present
   - There is no need to otherwise clean the CSV files or change their formatting
3. Analyze the HTML carefully to find the relevant elements
4. Run the JavaScript script first, followed by the Python script
5. Ensure that the scraping code in the Python script is exactly the same as the client-side scraping code in the JavaScript script
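
Advice items 1 and 2 can be checked mechanically. The sketch below (hypothetical helper names, Python standard library only) parses both CSV files instead of comparing raw bytes, so differences in quoting or line endings between the JavaScript CSV writer and Python's `csv` module are not reported as mismatches. It assumes rows should appear in the same order in both files. `dedupe_csv` only rewrites a file when it actually finds duplicates; when it does rewrite, the `csv` module may normalize quoting, so it is not a byte-for-byte preserving operation.

```python
import csv


def read_csv(path):
    """Parse a CSV file into (header, rows), with each row as a tuple of strings."""
    # utf-8-sig also strips a byte-order mark, in case either writer adds one.
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        rows = [tuple(row) for row in reader]
    return header, rows


def dedupe_csv(path):
    """Remove exact duplicate rows in place; return how many were removed."""
    header, rows = read_csv(path)
    unique_rows = list(dict.fromkeys(rows))  # keeps first occurrences, in order
    removed = len(rows) - len(unique_rows)
    if removed:  # leave the file untouched when there is nothing to remove
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(unique_rows)
    return removed


def outputs_match(js_csv, py_csv):
    """True if both files have the same header and the same rows in the same order."""
    return read_csv(js_csv) == read_csv(py_csv)
```

Running `dedupe_csv` on both outputs before calling `outputs_match` mirrors the order of the advice: drop duplicates first, then confirm the two scripts agree.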


Use this (https://github.com/muan/unicode-emoji-json) to write a function that converts a string like https://www.gstatic.com/android/keyboard/emojikitchen/20201001/u1f600/u1f600_u2615.png to "grinning_face_warm_beverage" by extracting the 2 emojis (u1f600, u2615) and converting them to actual emojis (use python for this) and then looking them up in the dictionary provided by the unicode emoji json library I linked. 
Throw an error if the emoji isn't found, including the codepoint in the error message. 
Then run the function on https://gist.githubusercontent.com/ryanseddon/0925ba915d4f865228ee3e6e0ddbe52c/raw/aa5cc2dbab3a9f3eaa1dc5d22dc1dc88d184dc4f/urls.txt and output a csv of the results.
Remember to use a csv writer library like pandas.
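
The function this prompt describes is small and well defined, so here is one way it might look in Python. It is a sketch, not the code from the example session: it assumes a local copy of the library's `data-by-emoji.json`, which maps each emoji character to an object with a `slug` field (for example `grinning_face`); it treats a hyphenated part of the file name as a single multi-codepoint emoji; and it retries lookups with and without the U+FE0F variation selector. All function names are illustrative. `url_to_name` raises `LookupError` naming the missing codepoint, as the prompt asks, while `write_results` records that message in an `error` column instead of aborting the whole run, a design choice you may not want. It uses the standard `csv` module to stay dependency-free; building a list of rows and calling `pandas.DataFrame(rows).to_csv(path, index=False)` would work equally well.

```python
import csv
import json
import posixpath
from urllib.parse import urlparse


def codepoints_from_url(url):
    """Split an Emoji Kitchen image URL into one list of hex codepoints per emoji.

    ".../u1f600/u1f600_u2615.png" -> [["1f600"], ["2615"]]
    Assumption: a hyphenated part such as "u2764-ufe0f" is one emoji sequence.
    """
    filename = posixpath.basename(urlparse(url).path)  # "u1f600_u2615.png"
    stem, _ext = posixpath.splitext(filename)  # "u1f600_u2615"
    return [[cp.lstrip("u") for cp in part.split("-")] for part in stem.split("_")]


def lookup_slug(codepoints, emoji_data):
    """Return the slug for one emoji or raise LookupError naming its codepoint(s)."""
    char = "".join(chr(int(cp, 16)) for cp in codepoints)
    # Assumption: the URL and the JSON keys may disagree on the U+FE0F
    # variation selector, so try the sequence as-is, with it, and without it.
    for candidate in (char, char + "\ufe0f", char.replace("\ufe0f", "")):
        if candidate in emoji_data:
            return emoji_data[candidate]["slug"]  # "slug" field is assumed
    label = "-".join("U+" + cp.upper() for cp in codepoints)
    raise LookupError(f"emoji not found for codepoint {label}")


def url_to_name(url, emoji_data):
    """Join the slugs of every emoji in the URL with underscores."""
    return "_".join(lookup_slug(cps, emoji_data) for cps in codepoints_from_url(url))


def load_emoji_data(path):
    """Load a local copy of data-by-emoji.json from muan/unicode-emoji-json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def write_results(urls, emoji_data, out_path):
    """Write one url,name,error row per URL."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "name", "error"])
        for url in urls:
            try:
                writer.writerow([url, url_to_name(url, emoji_data), ""])
            except (LookupError, ValueError) as exc:
                # Record the failure (including the codepoint) and keep going.
                writer.writerow([url, "", str(exc)])
```

To run it on the gist, download `urls.txt`, collect its non-empty lines with `[line.strip() for line in open("urls.txt") if line.strip()]`, and pass them to `write_results` together with `load_emoji_data("data-by-emoji.json")`.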

Example Sessions

Scrape emojis https://preview.devin.ai/sessions/4f8a7b129820493b9c0ca140cddede50

Scrape a YouTube playlist https://preview.devin.ai/sessions/8c6edbbb0bce4b70acd09255e1994c0b

Scrape an eBay page https://preview.devin.ai/sessions/dc70fe0649cb4041852da384e65d42be
