Tuesday, April 8, 2014

9:51 PM

Good day!

Recently I've been coding a web crawler in Python as an integration for my projects. I just wanted to share this little script, which I made as a blueprint. I will post it on GitHub for improvements; feel free to look at it and download it.

Web Crawlers

 

What is a web crawler? In the simplest terms, it is a program that starts from a single web page and moves from page to page using only the URLs found in each page, beginning with those in the original page. This is how search engines like Google obtain the content they index.
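To make the definition concrete, here is a minimal sketch of that page-to-page loop. It is not the code of my script: the names `LinkParser`, `crawl`, `fetch` and `max_pages` are made up for illustration, and `fetch` is assumed to be any function that takes a URL and returns the page HTML (or None on failure), so the sketch can run without touching the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl that follows only links found in fetched pages.

    fetch is a hypothetical callable: url -> HTML string, or None on failure.
    Returns the URLs that were fetched successfully, in visit order.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

For a real crawl, `fetch` could wrap something like `urllib.request.urlopen`; passing it in as a parameter simply keeps the loop easy to try out on a few in-memory pages first.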

This tool falls into the category of email harvesting, which is the process of obtaining lists of email addresses using various methods, usually for bulk email or other purposes commonly grouped as spam.
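As a rough idea of what the harvesting step can look like, the sketch below pulls addresses out of a piece of page text with a regular expression. This is an assumption about how such a step might be written, not the actual code of my script; `EMAIL_RE` and `extract_emails` are illustrative names, and the pattern is deliberately simplified.

```python
import re

# Simplified pattern: it does not cover every address the mail RFCs allow.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text, domain=None):
    """Return the unique addresses found in text, lowercased and sorted.

    If domain is given, keep only addresses at that domain or its subdomains.
    """
    found = {match.lower() for match in EMAIL_RE.findall(text)}
    if domain is not None:
        domain = domain.lower()
        found = {
            addr for addr in found
            if addr.rsplit("@", 1)[1] == domain
            or addr.endswith("." + domain)
        }
    return sorted(found)
```

Filtering by domain keeps the result limited to the organization being assessed, which is the point when the goal is to review its own exposure.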

The script I wrote is a very basic email web crawler in Python, and I hope it helps you in some way.

The logic here is very straightforward:
You enter three parameters, i.e. [domain], [dork] and [subdomain].
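A command-line front end for those three inputs could be sketched with `argparse` as below. The argument order, the help texts and the `build_parser` name are assumptions made for illustration; the real script's interface may differ.

```python
import argparse


def build_parser():
    """Hypothetical CLI taking the three inputs named above."""
    parser = argparse.ArgumentParser(prog="pymaildumper.py")
    parser.add_argument("domain", help="target domain, e.g. example.com")
    parser.add_argument("dork", help="search-engine query (dork) used to find pages")
    parser.add_argument("subdomain", help="subdomain to include in the search")
    return parser


# Parsing an explicit list here; a real run would read sys.argv instead.
example_args = build_parser().parse_args(
    ["example.com", "intext:@example.com", "mail"]
)
```

After parsing, the values are available as `example_args.domain`, `example_args.dork` and `example_args.subdomain`.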

See the screenshot for how to use it:

pymaildumper.py

And here's a sample result from a successful run of the script.


For further information about web crawlers and email harvesting, please see these wiki articles: Web Crawlers, Email Harvesting.

If you have suggestions or ideas, you can always drop me a message. You can get this tool on GitHub; here's the link:


I hope this script gives you some ideas about web crawlers and search engine scripting.
Thank you and have a good day!


2 comments:

  1. So you enter three parameters, i.e. [domain], [dork] and [subdomain], and you get what types of results? I've seen you get a lot of emails... what do those emails mean?

    Replies
    1. Hello Adrian,

      Thank you for your comments.
      Basically, it is a very simple tool intended to help penetration testers understand a customer's footprint on the Internet. It lets us quickly gather email addresses and see what an attacker can learn about the organization, so that we can mitigate information gathering.

      Thanks
