How to Scrape Emails with Scrapebox Email Scraper Premium Plugin

March 06, 2019 7 min read

A GUIDE ON HOW TO SCRAPE AND VERIFY NICHE-TARGETED EMAIL LISTS USING SCRAPEBOX

How to Scrape Emails with Scrapebox

In this tutorial, I am going to show you how to scrape a B2B email list of target/prospective clients.

Scrapebox is an excellent tool with a lot of powerful features that can be combined to take email scraping to another level.

Step 1: Download, Install and Configure HMA! Pro VPN Or Add your Proxies

There are a number of ways to handle proxies. One option is to install and configure HMA! Pro VPN and run it with a timed IP change. You will need to download the "previous version" from https://www.hidemyass.com/en-gb/installation-files because it has the timed IP change feature, which the new version does not have. For scraping, we need to change our IP address as frequently as possible because the search engines will start to ban an IP address after a few hundred or a couple of thousand scrapes.

A better option is to use your own private proxies because they are more stable. I like to use backconnect rotating proxies from Storm Proxies, but they can be very expensive: you will most likely be running Scrapebox at a hundred or more threads, which requires a very large custom proxy package with many threads. That is a considerable cost, given that you need the proxies just for scraping.

Step 2: Compile your List of Root Keywords for Your Prospects

The next step is to put together a list of root keywords for the prospects you are targeting. Root keywords will most often comprise category keywords, brands, popular keywords and so on. The idea is to then run these keywords through a keyword research tool to scrape more keywords for each root keyword.

So, for example, let's say that we are targeting all the CBD websites. I searched Google for a few CBD sites and copied their categories, brands and even popular search terms. For the sake of this example, I will keep the list short, as I do not want it to occupy the best part of this article. As a general rule of thumb, you should have at least 200 - 500 root keywords, so that there are enough of them to use as a basis for scraping more niche-related keywords.

CBD Oil
Cannabinoid
CBD Gummies
CBD Oil buy
CBD Cream
CBD Pain
CBD Local
CBD dogs
CBD Wholesale
Pure CBD
Hemp oil
CBD Coupon
CBD for sale
CBD Isolate
CBD Capsules
CBD back pain
CBD anxiety
CBD joint muscle
CBD fibromyalgia
CBD bulk
CBD green roads
CBD Full spectrum
CBD broad spectrum
CBD Terpenes
CBD reseller
Green Roads
CBD edibles
CBD coffee
CBD medical
CBD plant
CBD treatment
CBD for disabilities
CBD with thc
CBD near me
CBD for honey
CBD for skin
Best CBD
CBD powder
CBD food
CBD drops
CBD vape
CBD water
CBD candy
CBD Chocolate
CBD mints
CBD Face cream
CBD skin care
CBD lotion
CBD patches
CBD Skin patch
CBD Eliquids
CBD Reviews
Hemp Oil
CBD Products
CBD Dog Treats
CBD Isolate
CBD Edibles
CBD Pills Capsules and Tinctures
CBD for Pets
Marijuana Derived CBD
Hemp Derived CBD
4 Corners Cannabis
Aceso
Alaska Cannabis Exchange
Alternate Vape CBD
Ambary Gardens
American Shaman
Amma Life
Amrita
AON Mother Nature
Arisitol
Bio Hemp CBD
BioCBD Plus
Blue Moon Hemp
Blue Sky Biologicals
Bluebird Botanicals
Brand Rating
Browns Botanicals CBD
Buddha CBD Teas
Buddha Teas
CanChew Gum
Canna Companion
Cannabidiol Life
Cannadiol
Canna Pet
CannazAll
Cannimal
Casa Luna Chocolate
CBD BioCare
CBD Drip
CBD for Life
CBD Fusion Water
CBD Global Extracts
CBD Infusions
CBD Living Water
CBD RxFunctional Remedies
CBD Unlimited
CBDfx
CBDistillery
CBDRx
Cibaderm
Cibdex
Cibdol
Cloud 9 Hemp
Crystal Pure CBD
CTFO
CW Botanicals
Delta botanicals
Diamond CBD
Dixie Botanicals
Dose of Nature
Dreem Nutrition
Elite Botanicals
Elixinol CBD Australia
Elixinol
Endoca CBD
Endoca
Enecta
Entourage Hemp
Fab CBD
Fairwinds CBD
Farmacy Bliss
Folium Biosciences
Foria
Gevitta
Golden Leaf CBD
Green Garden Gold
Green Gorilla
Green Roads
Happy Buddha
Harmony
HealthSmart CBD
Hemp Bombs
Hemp Forte
Hemp Fusion
Hemp Health Technologies
HempFusionReview
Hempgenix
Hempland USA
HempleBox
Hemplogica
Hemplucid
HempMeds
Hempotion
Hempower
hempworx
Hi CBD
Highland Pharms
HMP2GO
Holland Hemp Company
Holy Grail CBD
Hygia Nutrients
iHemp CBD
Illuminent CBD
Imbue Botanicals
iPuff CBD
iPuff
Irie CBD
Isodiol
Jeffs Best Hemp
Joy Organics
Juju Royal
KanaVape
Kannaway
Kats Naturals
Kiva Confections
Koi CBD
Kure CBD
Lidtke
List of CBD Oil Companies Weveed
LolaHemp
Love CBD
Mana Artisan Botanics
Marys Medicinals
MEDIHEMP
MedJoy
MedTerra
MedUSA CBD
Michigan Hemp Company
Milagro CBD
Miracle Smokes
Muscle MX
Myaderm
NanoCraft CBD
Natures Way Botanicals
Natures Script
Nectar Leaf
Noontide Herbal Elixirs
NuLeaf Naturals
Nulief
NutraCanna
Ojai Energetics
Organabus
Palmetto Harmony
PH Secrets
PharmaHemp
phivida
Plus CBD Oil
Populum
Prime My Body
Prime Sunshine
Procana
Progressive Pet Products Inc
Pura Vida
PURE CBD VAPORS
Pure Ratios
Pure Science Lab
PureKana
Purity Petibles
PyoorCBD
RE Botanicals
Real Scientific Hemp Oil
Receptra Naturals
Receptra
Restorative Botanicals
Sadica
Sagely Naturals
SAUC
Shanti Wellness
Smart Organics
SmartHealth CBD
Sol CBD
SoS Pain
Sunday Scaries
Tasty Hemp Oil
The Fay Farm
The Medics Inc
The Wee Hemp Company
Therabis
Therapy Pure Essentials
Thoughtcloud
Top Verified CBD Brands
Treatibles
TreatWell
Trompetol
Try the CBD
Vape Bright
Veritas Farms
Vina Bell
ViPova
VitaMia Hemp
Wildflower
Zakah Life Essentials
Zamnesia
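
The list above contains a few entries that repeat with only a difference in letter case or spacing (for example "Hemp oil" and "Hemp Oil", or "CBD Isolate" twice). If you want to tidy the seed list before pasting it into the keyword tool, here is a minimal Python sketch. It assumes one keyword per line and treats two entries as duplicates when they match case-insensitively; the file name in the commented-out usage is hypothetical.

```python
def dedupe_keywords(lines):
    """Drop blank lines and case-insensitive duplicates, keeping the first spelling seen."""
    seen = set()
    unique = []
    for line in lines:
        keyword = " ".join(line.split())  # trim and collapse repeated spaces
        if not keyword:
            continue  # skip blank lines
        key = keyword.casefold()  # "Hemp oil" and "Hemp Oil" share one key
        if key not in seen:
            seen.add(key)
            unique.append(keyword)
    return unique


# Usage with a file of root keywords, one per line ("cbd-root-keywords.txt" is hypothetical):
# with open("cbd-root-keywords.txt", encoding="utf-8") as f:
#     print("\n".join(dedupe_keywords(f)))

sample = ["CBD Oil", "Hemp oil", "CBD Isolate", "Hemp Oil", "  Green Roads ", "Green Roads"]
print(dedupe_keywords(sample))  # ['CBD Oil', 'Hemp oil', 'CBD Isolate', 'Green Roads']
```

Keeping the first spelling means the order of your original list is preserved, which makes it easy to compare the cleaned list with what you started with.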

Awesome! Now, I am going to use the Domination Robot (https://dominationrobot.com) keyword research tool to expand this list of root keywords into an extensive list of niche-related keywords to use with Scrapebox. To do this, just enter these root or "seed" keywords into the "Seed keywords" pane and hit Start. The keyword scraper will then collect related keywords for each of the seed keywords.

Step 3: Enter the Entire Keyword List into the Scrapebox Keyword List

Select "Custom Footprint". We are not going to scrape for a particular website platform, so it is fine to use no footprint at all.

Step 4: Go to Settings - Connections, Timeouts and Other Settings and Configure the Threads

I recommend using the default value of 50 threads for each function, as Scrapebox tends to crash when run with a high thread count.

Step 5: Auto Remove Duplicate URLS and Domains from Results

I recommend removing all duplicate URLs and domains because scraping emails from the same URLs or sites more than once will leave a lot of duplicates in your list.
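
To see what the two de-duplication options do, here is a rough Python sketch of the same idea (it is not Scrapebox's code). It assumes that URLs differing only in the letter case of the host or in a "#fragment" count as the same URL, and that "www.example.com" and "example.com" count as the same domain. The URLs in the example are placeholders.

```python
from urllib.parse import urlsplit


def normalise_url(url):
    """Lower-case scheme and host and drop any #fragment so trivial variants compare equal."""
    parts = urlsplit(url.strip())
    return parts._replace(
        scheme=parts.scheme.lower(), netloc=parts.netloc.lower(), fragment=""
    ).geturl()


def domain_of(url):
    """Host name without a leading "www." (urlsplit already lower-cases .hostname)."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def remove_duplicates(urls, one_per_domain=False):
    """Keep the first occurrence of each URL, or of each domain if one_per_domain is True."""
    seen = set()
    kept = []
    for url in urls:
        norm = normalise_url(url)
        key = domain_of(norm) if one_per_domain else norm
        if not key or key in seen:
            continue  # blank line, missing host, or a duplicate
        seen.add(key)
        kept.append(norm)
    return kept


harvested = [
    "https://www.example.com/shop",
    "https://www.example.com/shop#reviews",
    "https://www.example.com/contact",
    "https://example.org/",
]
print(remove_duplicates(harvested))
# ['https://www.example.com/shop', 'https://www.example.com/contact', 'https://example.org/']
print(remove_duplicates(harvested, one_per_domain=True))
# ['https://www.example.com/shop', 'https://example.org/']
```

Keeping one URL per domain is the more aggressive of the two settings and shrinks the list the most.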

Step 6: Start Harvesting URLs from the Search Engines

Select however many search engines you need and start harvesting. If you are using proxies, make sure they are enabled. If you are running HMA! Pro VPN in the background, make sure the IP changes every minute, because for scraping you need as many fresh IPs as possible.

Step 6 (continued): After the Harvesting is Complete, select Exit to Main

Step 7: Export URLs as Text

Now you need to export all the scraped website URLs to a text file. Create a folder named after the niche, and give the file a name that includes the niche and indicates that these URLs are target/prospective clients.

Step 8: Export URLs as Text - Name your File

Now name your text file accordingly. This is how I named mine:

"CBD - Scrapebox target client urls"

Step 9: Email Scraper Premium Plugin - Open

Now you need to open the Scrapebox Email Scraper Premium Plugin.

Step 10: Email Scraper Premium Plugin - Crawl Load List

Now we are going to load the website urls that we have just scraped and extract the email addresses from those urls.

Step 11: Scrapebox Email Scraper Premium Plugin - Crawl Load List - Load Urls

Now we are going to select the file with all the scraped website URLs and load it.

Step 12: Scrapebox Email Scraper Premium Plugin - Crawl Load List - Urls are now loaded

Step 13: Scrapebox Email Scraper Premium Plugin - Options - Filters and Configuration

This is one of the most important steps. In order to get emails that are clean and related to our niche, we are going to configure all the filters.

Filter 1: "Search only in pages with" - here, we are going to add all the possible keywords related to our niche. DO NOT use commas; just use a space to separate each keyword. The software will scan the HTML code, content, meta titles and meta descriptions of each website for any of these keywords. If a website from our list contains any one of these keywords, the software will extract emails from it; if it contains none of them, Scrapebox will skip it. The idea behind this filter is to scrape only topically relevant websites. In our case, a website that contains the word "CBD" is most likely to be relevant to our business niche.

You could also copy and paste the same keywords into "Search only in pages with" -> "any of the words in the url". Here, instead of checking the page itself for the keywords, the software scans the URLs. I WOULD NOT use this filter because it narrows the results down too much: in a lot of niches, a website's URL will not necessarily contain the keywords.
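
The logic of Filter 1 can be approximated in a few lines. The Python sketch below is an assumption about how such a filter behaves (case-insensitive substring matching on the raw page source), not the plugin's actual implementation; the function names and the sample page are hypothetical.

```python
def parse_keywords(field):
    """Split a space-separated filter field (no commas) into lower-case keywords."""
    return [word.casefold() for word in field.split()]


def page_matches(html, keywords):
    """True if any keyword occurs anywhere in the raw page source.

    Scanning the raw HTML covers the <title>, meta description and body text in one pass.
    """
    source = html.casefold()
    return any(word in source for word in keywords)


def url_matches(url, keywords):
    """The stricter 'any of the words in the url' variant: only the URL is checked."""
    lowered = url.casefold()
    return any(word in lowered for word in keywords)


keywords = parse_keywords("cbd cannabidiol hemp")
shop_page = "<title>Pure CBD Tinctures</title><p>Email us for wholesale prices.</p>"
print(page_matches(shop_page, keywords))  # True
print(url_matches("https://www.example.com/contact", keywords))  # False
```

If the plugin splits the field on spaces the way this sketch does, a phrase such as "hemp oil" becomes two separate words, and a broad word like "oil" would let many off-topic pages through, so distinctive single words are the safer choice. Substring matching also means "hemp" matches brand names such as "Hempworx".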

Filter 2: "Ignore Emails when the"

"email username contains" - here we enter the keywords that we do not want to appear in the email username (the part before the @).

"email domain contains" - here we enter the keywords that we do not want to appear in the email domain (the part after the @).

"site is" - here, we add the sites whose emails we do not want. These would usually include the websites on the Majestic Million list.

The idea behind this filter is to get rid of poor quality and spammy emails.
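
As a sketch of how the three "Ignore Emails when the" rules might combine, here is a hypothetical Python function. It assumes case-insensitive substring matching for the username and domain rules, treats "site is" as matching the listed domain or any of its subdomains, and ignores an email if any one rule matches. The blocklists are illustrative placeholders, not recommended values.

```python
from urllib.parse import urlsplit


def site_of(url):
    """Host of the page the email was found on, without a leading "www."."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def should_ignore(email, source_url, username_words, domain_words, ignored_sites):
    """Return True if the email hits any 'Ignore Emails when the' rule."""
    username, _, domain = email.strip().casefold().rpartition("@")
    if any(word.casefold() in username for word in username_words):
        return True  # "email username contains"
    if any(word.casefold() in domain for word in domain_words):
        return True  # "email domain contains"
    site = site_of(source_url)
    # "site is": match the listed domain itself or any of its subdomains
    return any(site == s.casefold() or site.endswith("." + s.casefold()) for s in ignored_sites)


# Hypothetical blocklists, for illustration only.
rules = (["noreply", "abuse"], ["sentry"], ["wikipedia.org"])

print(should_ignore("noreply@example.com", "https://www.example.com/", *rules))  # True
print(should_ignore("info@example.com", "https://www.example.com/contact", *rules))  # False
```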

Filter 3: "Grab Emails only when"

"email username contains"

"email domain contains"

Here, we would enter the keywords that we want to appear inside the email. Again, I WOULD NOT use this filter, simply because not every email will contain our keywords. Some niches are more keyword-driven than others. Compare, for example, the fashion industry with the CBD niche: the CBD niche has many websites that use the actual keyword "CBD", whereas fashion websites are more brand-driven, so we are unlikely to find many emails with our keywords in them.
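
The Python sketch below illustrates why this filter is so restrictive. It assumes an email is kept when either its username or its domain contains one of the wanted keywords (how the plugin combines the two fields is an assumption here); the addresses use the reserved ".example" domain and are made up.

```python
def grab_only_when(emails, keywords):
    """Keep emails whose username OR domain contains at least one wanted keyword."""
    wanted = [k.casefold() for k in keywords]
    kept = []
    for email in emails:
        username, _, domain = email.strip().casefold().rpartition("@")
        if any(word in username or word in domain for word in wanted):
            kept.append(email)
    return kept


scraped = [
    "sales@cbdshop.example",
    "info@leafbrand.example",  # brand-named domain with no "cbd" in it
    "hello@fashionlabel.example",
    "cbdorders@example.com",
]
print(grab_only_when(scraped, ["cbd"]))
# ['sales@cbdshop.example', 'cbdorders@example.com']
```

The brand-named address is discarded even though it could well belong to a CBD business, which is exactly the problem described above for brand-driven niches.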

"Settings" - "worker threads" - this is the thread count for this plugin. I like to run it at 100 threads, as I have a fairly powerful dedicated server.

I do not check "only scrape emails matching the domain" because a lot of emails will come from business directories and social media sites, and those emails will not match the domains of the pages they were found on. Once again, I prefer to be over-inclusive.

Use proxies if you have them. If not, just run HMA! Pro VPN in the background.
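
For illustration, here is what a domain-matching check might look like, assuming the option compares the email's domain with the host of the page it was found on (with "www." stripped). This is a guess at the behaviour, not the plugin's documented logic; the directory URL is hypothetical.

```python
from urllib.parse import urlsplit


def matches_source_domain(email, source_url):
    """True if the email's domain equals the host it was found on (ignoring "www.")."""
    host = urlsplit(source_url).hostname or ""
    site = host[4:] if host.startswith("www.") else host
    domain = email.strip().casefold().rpartition("@")[2]
    return domain == site


print(matches_source_domain("info@example.com", "https://www.example.com/contact"))  # True
# A brand's email listed on a (hypothetical) directory page fails the check:
print(matches_source_domain("info@cbdbrand.example", "https://www.directory.example/listing/42"))  # False
```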

Step 14: Scrapebox Email Scraper Premium Plugin - Start

Now run the plugin.

Step 15: Scrapebox Email Scraper Premium Plugin - Send to Email tester

Now send all the scraped emails to the email tester.

Step 16: Scrapebox Email Scraper Premium Plugin - Verify Emails - Options

In the "Senders email address" (sic) just enter your email address.

Step 17: Scrapebox Email Scraper Premium Plugin - Verify Emails - Start

Hit Start.

Step 18: Scrapebox Email Scraper Premium Plugin - Verify Emails - Filter - Remove Invalid, Untested and Errored Emails

Here, you will remove all untested and invalid emails, as well as those that returned errors.
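
If you prefer to do this filtering outside Scrapebox, for example on an exported results file, the Python sketch below keeps only the valid addresses. It assumes a CSV with "email" and "status" columns and the status labels shown; these are placeholders, not the tester's actual export format, so adjust them to match your file.

```python
import csv
import io


def valid_emails(rows):
    """Keep only rows whose status is "valid"; invalid, untested and error rows are dropped."""
    return [row["email"].strip() for row in rows if row["status"].strip().casefold() == "valid"]


# Hypothetical export: the column names and status labels are placeholders.
export = """email,status
info@example.com,Valid
sales@example.org,Invalid
hello@example.net,Untested
admin@example.com,Error
"""
print(valid_emails(csv.DictReader(io.StringIO(export))))  # ['info@example.com']
```

To read a real file instead, open it with `open(path, newline="", encoding="utf-8")` and pass `csv.DictReader(f)` to `valid_emails`.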

Step 19: Scrapebox Email Scraper Premium Plugin - Verify Emails - Export All Valid Emails

Step 20: Scrapebox Email Scraper Premium Plugin - Verify Emails - Export All Valid Emails - Name your File Properly

Step 21: Use Atomic Email Verifier to test your emails at Syntax, Domain and Email level

Please read my guide on how to verify your scraped email list using the Atomic Email Verifier here.
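
For a quick first pass at the syntax level before running a dedicated verifier, a regular expression can catch obviously malformed addresses. The Python sketch below uses a deliberately simplified, hypothetical pattern (not Atomic Email Verifier's rules, and not a full RFC 5322 validator); domain and mailbox checks require network lookups and are not shown.

```python
import re

# Deliberately simplified: letters, digits and . _ % + - before the "@", then one or
# more dot-separated labels and a 2+ letter top-level domain. Not a full RFC 5322 check.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")


def syntax_ok(email):
    """True if the whole (trimmed) string matches the simplified pattern."""
    return EMAIL_PATTERN.fullmatch(email.strip()) is not None


for candidate in ["info@example.com", "info@example", "in fo@example.com", "logo@2x.png"]:
    print(candidate, syntax_ok(candidate))
```

Note that "logo@2x.png", an image file name that email scrapers sometimes pick up, passes this syntax check, which is why the domain-level check still matters.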

Practical Video Demonstration

This is a basic, working video of the entire process described above.

 Click here to download the video.

