GSA Email Spider Tutorial - The Best Email Scraper and Email Sender

March 01, 2019 11 min read

GSA Email Spider is one of the best email scrapers and senders. The software can scrape emails from the search engines on the basis of your keywords and then send your email message to the scraped emails.

As you can see, on the main GUI, you have the option to "Use Search Engine". This means that you would be scraping emails on the basis of your keywords.

To add your list of keywords, click on "Load List" and select your notepad .txt file with your keywords. You should have one keyword per line, just like this:

keyword 1

keyword 2

keyword 3
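If your keyword list comes from several sources, a short script can tidy it into the one-keyword-per-line shape shown above. This is only a sketch in Python and is not part of GSA Email Spider: the function name normalize_keywords is made up for illustration, and treating "Keyword 1" and "keyword 1" as the same keyword is my own assumption.

```python
from typing import Iterable, List

def normalize_keywords(lines: Iterable[str]) -> List[str]:
    """Strip whitespace, drop blank lines and remove repeats, keeping order."""
    seen = set()
    cleaned = []
    for line in lines:
        keyword = line.strip()
        # Assumption: keywords that differ only by letter case are duplicates.
        if keyword and keyword.lower() not in seen:
            seen.add(keyword.lower())
            cleaned.append(keyword)
    return cleaned

raw = ["keyword 1", "", "  keyword 2  ", "Keyword 1", "keyword 3"]
keywords = normalize_keywords(raw)
# One keyword per line, ready to be saved as a .txt file.
print("\n".join(keywords))
```

Save the printed lines to a .txt file and load it with "Load List" as described above.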

Select the boxes "Parse results for new sublinks (else only for mails)", "Extra check for Keyword(s)" and "Check also on sublinks".

2 GSA Email Spider Tutorial - Project options

Click on the "Project" button. You can select the following options:

"Load" - this means that you can open up a saved project and continue from where you left off. So for example, if you had to restart your VPS, then you would want to load your saved project and continue from where you left off.

"Save" - once you have configured your project, you should save it. Equally, if you have made some changes to the project, be sure to save it!

 3 GSA Email Spider Tutorial - Start Multiple Projects

"Multiple Projects" - this is a very handy feature that will allow you to run more than one project at the same time. So for example, if you have saved multiple projects, then you can run them all at the same time via this option.

4 GSA Email Spider Tutorial - Import your own URLs and E-Mails

"Import URLs and E-Mails" - this is a very helpful feature that will enable you to import your own URL or e-mail list. The idea behind this option is that it will save you some time from having to scrape e-mails/URLs. I sometimes like to scrape my URL list using scrapebox as it is fairly fast. However, I do not normally import my own e-mails and URLs simply because I like to be getting the freshest results possible from the search engines.

5 GSA Email Spider Tutorial - Options - Program Behaviour

In this section, I usually just select "Parse for E-Mails" because my intention is to scrape emails and then contact them via this software. You could "Parse for Phone Numbers" if you intend to use an SMS sending software. However, I am not using this feature at the moment. Maybe later it would be a good idea.

The option "Not more then [10] E-Mails from a website" just means that you do not want the software to scrape more than 10 emails from the same website. I DO NOT tick this box because some websites, such as Instagram, contain a lot of relevant and useful emails for my niche, so selecting this option would significantly restrict my results.

The "Analyse JavaScript for protected E-Mails" option should be ticked if you would like the software to look for emails inside the source code when they are not readily displayed. I usually leave this option unchecked.

The following options should be ticked:

"Analyse Head"

"Analyse Body"

"Accept Cookies"

These options essentially mean that the software will check these places for e-mails.

"Concurrent Connection" - this is the thread number that you want to run. I usually keep this at 50 because I am running my copy of the software on one of the most powerful dedicated servers. In theory, I could ramp up the thread number to 10,000 as the dedicated server could handle it. However, it is not a good idea because the software will become sluggish, crash or stop responding. It is therefore recommended to keep the thread numbers under 100.

"Identify as" option contains all the user agents that basically emulate different devices, operating system, browsers, screen resolutions and so on. I like to "Randomize" these just to keep things looking natural and safe.

I DO NOT select the "Stop work after [ ] minutes" option because I do not want the software to stop working. I want it to work 24/7.

I usually select "Skip whole domain when no item was found for a long time" and "Detect and remove fake emails (e.g. email produce scripts)". I do not want the software to waste time on a website: if it has not found anything for a long time, then the odds are that it will not find anything at all. There is no point in wasting time when we could just move on to the next website, extract emails and then send our message to them!

I like to select the options "Backup Results every 20 Minutes to file results.txt" and "AutoSave project every 5 minutes". I want my project to be auto saved and backed up just in case my dedicated server crashes or has to restart unexpectedly. It is always better to be safe than sorry!

6 GSA Email Spider Tutorial - Options - Filter

This is a very important part that will control the quality of the emails and data that the software scrapes! So please pay attention here.

The first box titled "Don’t Include E-Mails with:" just means that you do not want the software to collect spam, fake or trash emails that will add no value to your work. In fact, such emails will only damage the reputation of your SMTP server and land your domain and IP on a blacklist. I have a list of blacklisted keywords that should not be included in an email. I have built up this list over many years by analysing a lot of junk emails and collating a list of words, phrases, symbols and characters that they contain. You can download my e-mail blacklist here.

7 GSA Email Spider Tutorial - Options - Filter - How to Add my E-Mail Blacklist to the Don’t Include E-Mails with field

To upload it, simply right click inside the "Don’t Include E-Mails with:" box with your mouse and select "Import from Text File". Then select and upload my file. Right click on the same box again and select "Check All" to select all the boxes!

"Don’t parse URLs with:" option is very similar to the previous filter. This box contains a list of all the poor quality, spammy and irrelevant websites that you DO NOT want to scrape. Again, I have compiled a very comprehensive URL blacklist. You can download it here. To upload it, again right click on the "Don’t parse URLs with:" box, select "Import from text file" and then "Check All" to select all the words, symbols and characters.

8 GSA Email Spider Tutorial - Options - Filter - E-Mail and URL Relevancy Filters

In my opinion, these are the most important filters, as they determine the topical relevance of your scraped emails as well as their quality!

The "E-Mail must have:" filter is a list of all the keywords that should be present inside an e-mail. First, determine the most important keywords for your niche. Most of the niches will have the root keywords. For example, the vaping niche would have the keyword vape whilst the cryptocurrency niche would have cryptocurrency as its keyword. However, each niche will have many important keywords. Usually, when performing keyword research, I like to start off with the root keywords, run a Google search, browse through some results and open some websites and take the category keywords and scan them for other important keywords. It is important that you spend some time in creating a quality list of keywords related to your niche.

So here is how you would enter your keywords:

Example - vaping niche

*eliquid*|*e-liquid*|*vap*|*ecig*|*e-cig*|*mod*|*coil*|*cbd*|*hemp*|*shortfill*|*nicotine*|*flavour*|*ohm*|*clouds*|*clop*|*vaporizer*|*ejuice*|*e-juice*|*salt*

You have to enter your keywords in this format, with an asterisk * before and after each keyword and a pipe | between keywords. In the above example, the filter means that the software will only scrape emails that contain ANY of the above keywords inside the actual email address. The asterisks act as wildcards so that each keyword captures as many variations as possible. For example, *vap* would match vape, vaping, vaper, vaporizer, vapes, vaporz, myvapes, coolvapes and so on.

So just remember to use this format:

*keyword*|*keyword*|*keyword*|*keyword*|*keyword*|*keyword*
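To get a feel for what a filter in this format will and will not catch, you can try it outside the software first. The sketch below uses Python's standard fnmatch module as a stand-in for the wildcard matching. It is an assumption that GSA Email Spider matches in the same way (for example, ignoring letter case), and the helper names build_filter and matches_any are made up for this example.

```python
from fnmatch import fnmatchcase
from typing import Iterable

def build_filter(keywords: Iterable[str]) -> str:
    """Wrap each keyword in asterisks and join the patterns with a pipe."""
    return "|".join(f"*{kw.strip()}*" for kw in keywords if kw.strip())

def matches_any(text: str, filter_string: str) -> bool:
    """Return True if text matches at least one *keyword* pattern."""
    patterns = [p for p in filter_string.split("|") if p]
    lowered = text.lower()
    # Assumption: matching is case-insensitive, so both sides are lowercased.
    return any(fnmatchcase(lowered, p.lower()) for p in patterns)

vape_filter = build_filter(["eliquid", "e-liquid", "vap", "ecig"])
print(vape_filter)
print(matches_any("sales@coolvapes.example", vape_filter))
print(matches_any("info@bakery.example", vape_filter))
```

The first address contains "vap" and passes the filter, while the second contains none of the keywords and is rejected. Short keywords such as *mod* or *salt* match a lot of unrelated words, so it is worth testing a few sample addresses like this before relying on them.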

Now, the same applies to the "Parse for items only if URL has:". Again, here we are telling the software to parse only the websites that contain any of our keywords.

So we would just copy and enter the same keywords as for the previous filter:

*eliquid*|*e-liquid*|*vap*|*ecig*|*e-cig*|*mod*|*coil*|*cbd*|*hemp*|*shortfill*|*nicotine*|*flavour*|*ohm*|*clouds*|*clop*|*vaporizer*|*ejuice*|*e-juice*|*salt*

9 GSA Email Spider Tutorial - Options - Filter - E-Mail and URL Relevancy Filters - Add or Import from File

You can add your filter by right clicking inside the "E-Mail must have:" and "Parse for items only if URL has:" boxes and selecting "Add" or "Import from text file".

The "E-Mail must have:" filter has a check box at the bottom called "Only if same domain as URL". This filter means that the email domain must match the domain of the website. This is meant to enable you to scrape only company emails. I do not use this option because some websites such as Instagram and Facebook will contain a lot of relevant emails and so, the emails from these sites will have non-matching domains.

10 GSA Email Spider Tutorial - Options - Search Engines

Here, you can select what search engines to scrape. Right click inside the box and just "check all". I like to use all the search engines as they seem to produce more results.

11 GSA Email Spider Tutorial - Options - Keywords

I leave this section at its default settings.

12 GSA Email Spider Tutorial - Options - ExtraData

This section allows you to select what extra data the software should collect. The extra data is not really that important for scraping and sending of e-mails. It is collected more for your personal reference just in case you want to sort or analyse the scraped results.

You can:

  • "Take page title as extra data"
  • "Take page keywords as extra data"
  • "Take page description as extra data"
  • "Extract data around the found item"
  • "Discover country/city for email (taken from domain)"
  • "Take search keyword(s) as extra data"
  • "Take domain name from found URL"

So, as you can see, this is all very useful data that could accompany your search results.

13 GSA Email Spider Tutorial - Options - Auto Mailer - E-Mail Options

This section is very important as you will need to configure everything correctly in order for the software to be able to send e-mails to scraped e-mail addresses.

So here are the fields and explanations of what they mean:

"Your own E-Mail" - this is the email that you are using, e.g. yoggy@honeybarrel.co.uk

"Reply-To E-Mail" - this is the email at which you would like to receive replies.

"Send e-mails over your default e-mail client (MAPI)" - I do not use this option as I use my own SMTP server.

"SMTP Server" - enter the name of your SMTP server, e.g. cryptonews.info

"Port" - enter the port, e.g. 25251

"Login" is your email address and the "Password" is your password.

I check the "Need pop3-login before SMTP" option and in the "Pop3 Server" just enter the Pop3 Server. In my example, it would be cryptonews.info with port 110.

I usually select the "Send delayed" option and enter 5 seconds because I like to warm up my SMTP server. Do not forget that if you are using a new SMTP server, you should go very slowly with your email sending to avoid being blacklisted.

I also select "DirectSend if possible" which means that the software will try to send the email directly in case the SMTP server is not working.

I also select the "Use SSL encryption" option.

14 GSA Email Spider Tutorial - Options - Auto Mailer - E-Mail Message(s)

This is the section where you can enter the message that the software should send to all the scraped emails.

The software supports spintax and the following variables:

%email% - wherever you enter this in the text field, the software will insert the email address of the recipient. This helps you to create more targeted emails and achieve a much better response rate as the recipients will think that you are writing specifically to them!

%extra% - this will include the extra scraped data that you selected under the "ExtraData" tab.

%url% - this will add the url from which the email was scraped.

%domain% - this will add the domain name of the email. Be careful: mentioning a company domain name is fine, but mentioning a free mail provider's domain such as Gmail, Yahoo or Outlook will just look random!

{word1#word2#word3} - ok, so this is the spintax format. You should craft your message manually and enter many variations to make each message as unique as possible. The reason why you should manually spin your message is to improve your delivery rate and avoid blacklists. Most email providers and services have very sophisticated security and spam systems that can detect if the same message is being sent to different recipients.

The subject line also accepts spintax format.

Quick tip! I like to use The Spin Rewriter to manually spin my content. The content will be in a slightly different format and will use a pipe | instead of a hash #. So what I do is spin the text manually. Then I should have something like this:

{word1|word2|word3}

Now we need to spin it for GSA Email Spider.

Copy your content into Notepad, press CTRL + H and replace | with # (replace all). Now all of your spun content is ready for use with GSA Email Spider.
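One thing to watch with a blanket "replace all" is that it also changes any pipe that sits outside the curly brackets in your message. If that matters for your text, the conversion can be limited to the spintax groups. The following is a minimal sketch in Python, not a feature of either tool; the function name pipes_to_hashes is made up, and it assumes that curly brackets are only used for spintax.

```python
def pipes_to_hashes(text: str) -> str:
    """Replace | with # only inside {...} groups, including nested ones."""
    depth = 0  # how many spintax groups are currently open
    out = []
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
        elif ch == "|" and depth > 0:
            ch = "#"  # a pipe inside a group becomes a hash
        out.append(ch)
    return "".join(out)

spun = "{word1|word2|word3} stays | here {a|{b|c}}"
print(pipes_to_hashes(spun))
```

Here the pipe between "stays" and "here" is left alone because it is not inside a spintax group, while every pipe inside the brackets becomes a hash.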

15 GSA Email Spider Tutorial - Options - Proxy

It is important to use proxies with this software when scraping and sending emails. Firstly, you do not want your IP to get banned by the search engines, which would reduce your success rate, and secondly, you want to remain anonymous when sending out many emails.

I like to use public and rotating proxies.

16 GSA Email Spider Tutorial - Options - Proxy - Configure - Proxy List

Click on "Configure" and then select "Add/Edit Proxy Sites".

17 GSA Email Spider Tutorial - Options - Proxy - Configure - Proxy List - Add Edit Proxy Sites - Public Proxies

Here, you will be able to select the sources of public proxies. The software already comes with a fully preloaded public proxy source list. You can right click on a proxy source and "select all" if you would like to use all of the proxy sources. It is entirely up to you! I leave all the other settings at their default values.

18 GSA Email Spider Tutorial - Options - Proxy - Configure - Proxy List - Options

In this section, you can configure all the proxy settings.

I like to select "Automatically search for new proxies every 200 minutes" because public proxies are not very stable and usually go down very quickly.

I select "Text proxies" "All (good/bad)" and "Public/Private". I perform a "Bing" test and select "Remove bad proxies when older than 100 minutes". There simply isn't any point in keeping bad public proxies because they are not going to come back to life!

I also check "Try using Proxy keep-alive (faster if proxy supports it)".

I select "Automatically disable public proxies when detected to be down" as these proxies are more or less gone!

I DO NOT select "Automatically disable private proxies when detected to be down" simply because private proxies can come back to life.

The thread number is how fast the proxy scraping and checking will take place. Usually, the thread number is dictated by the specification of your machine.

I also select the box to "Randomize proxies before testing to avoid false positive portscans".

19 GSA Email Spider Tutorial - Options - Proxy - Configure - Proxy List - Add Proxy

To add your proxies, click on the "Add Proxy" button and select "find online + test" to find and test public proxies from the selected public proxy sources, or "import from file (host:port:login:password)" to add your own private or rotating proxies.
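Before importing a proxy file, it can help to check that every line really follows the host:port:login:password layout. The snippet below is a rough sketch in Python of such a check. The ProxyEntry class and the parse_proxy_line function are hypothetical names, and accepting a plain host:port line without login details is my own assumption, not something confirmed for the software.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProxyEntry:
    host: str
    port: int
    login: Optional[str] = None
    password: Optional[str] = None

def parse_proxy_line(line: str) -> ProxyEntry:
    """Parse 'host:port' or 'host:port:login:password' into a ProxyEntry."""
    # Split at most three times so a password may itself contain colons.
    parts = line.strip().split(":", 3)
    if len(parts) not in (2, 4):
        raise ValueError(f"expected host:port[:login:password], got {line!r}")
    host = parts[0]
    port = int(parts[1])  # raises ValueError if the port is not a number
    if not host or not 1 <= port <= 65535:
        raise ValueError(f"invalid host or port in {line!r}")
    if len(parts) == 4:
        return ProxyEntry(host, port, parts[2], parts[3])
    return ProxyEntry(host, port)

entry = parse_proxy_line("proxy.example:8080:myuser:mypass")
print(entry)
```

Any line that raises a ValueError can be corrected or removed from the file before you import it.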

20 GSA Email Spider Tutorial - Main Screen - Found Items

Now that you have configured all the options, you can select the "Auto Mailer Enabled" option to automatically send emails as soon as they are scraped. You will see the status of each email. Just look at the key at the bottom: pale/sky blue means "E-Mail sent (over smtp)", green means "sent before" and the sand colour means "sent (direct)" - remember to enable the direct sending option in the settings.

 

 

 

 

 

 

