Website research tool that shows information about domains (such as WHOIS, IP, backlinks, etc.)

Looking for a tool that can do the following:

  • take a domain name as input (or a host name such as shop.example.com)
  • produce a report for the domain/host with information such as
    • WHOIS
    • IP, IP owner, sites on the same IP
    • check against spam blacklists
    • check against security/malware lists
    • link to Internet Archive records
    • list of backlinks to the host
    • common site popularity scores, such as Google PageRank, Alexa Traffic Rank, etc.

The program should fetch all of the above information from the various sources with little or no user action.

The program should be able to combine all the information into a single report. The report can be plain text, doc, or HTML.

The time needed to generate the report is not that important. However, if an estimate of the report-generation time is available for a tool, feel free to mention it.

Free or paid is fine.

A Windows GUI program is preferred, but a hosted service is also acceptable.

Many websites seem to provide this kind of information; please describe why some of the most common ones are not sufficient for you, so we can help you better.
Could you please clarify a few things: how do you want the report? What do you mean by PageRank? What do you mean by Alexa (its rank number, perhaps)? What do you mean by Internet Archive (how many snapshots it has, perhaps)? What do you mean by backlinks (the number of backlinks, perhaps)?
@NicolasRaoul, the sites I have seen mostly provide only one piece or a subset of the information. I have not found a website that can combine all of the information in one step and produce a report. If you know of such a site, please be kind enough to share it as an answer.
@Janekmuric, question updated based on your clarification.
@NicolasRaoul Just one question: what is the maximum time it could take to generate a report?
@Janekmuric: Me? Did you mean to mention the question's author? :-)
Yes, I meant to ask @kenchew
@Janekmuric updated the question to include the time.

Answers (1)

I haven't found any website or program that can do what you need, so instead of studying Geography, I wrote a Python 2.7 program to do it for you.

Many thanks to jgamblin on GitHub for his open-source blacklist checker script.
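The heart of that blacklist check is the standard DNSBL convention: reverse the IP's octets, prepend them to the blacklist zone, and look the result up as an A record. Here is a minimal standalone sketch of just that step (zen.spamhaus.org as the example zone; requires dnspython):

import dns.resolver

def is_listed(ip, zone="zen.spamhaus.org"):
    # 1.2.3.4 -> 4.3.2.1.zen.spamhaus.org; an A record answer means "listed"
    query = ".".join(reversed(ip.split("."))) + "." + zone
    try:
        dns.resolver.Resolver().query(query, "A")
        return True
    except dns.resolver.NXDOMAIN:
        return False

print(is_listed("127.0.0.2"))  # standard DNSBL test entry, should print True

The full program below runs the same check against ~60 such zones, plus a number of HTTP-published lists.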

from pprint import pprint, pformat
from sys import exit, argv
from socket import gethostbyname, gethostbyaddr
from os.path import exists, isdir
from json import dumps
from time import strftime, time
from re import findall, escape
from urllib2 import urlopen, Request, build_opener
from bs4 import BeautifulSoup as bs
import dns.resolver
from ipwhois import IPWhois

# Code below downloaded from GitHub page:
# https://github.com/jgamblin/isthisipbad/blob/master/isthisipbad.py
# Modified by Janek to fix errors.

def content_test(url, badip):
    """
    Download url and check whether its response contains badip.
        Args:
            url -- the URL to request data from
            badip -- the IP address in question
        Returns:
            True if badip is absent, False if it is present,
            None if the list could not be fetched.
    """

    try:
        request = Request(url)
        html_content = build_opener().open(request).read()

        # Escape the IP so its dots match literally instead of as regex wildcards
        matches = findall(escape(badip), html_content)

        return len(matches) == 0
    except Exception:
        # Could not download the list at all: report "unknown", not "listed"
        return None

bls = ["b.barracudacentral.org", "bl.spamcannibal.org", "bl.spamcop.net",
       "blacklist.woody.ch", "cbl.abuseat.org", "cdl.anti-spam.org.cn",
       "combined.abuse.ch", "combined.rbl.msrbl.net", "db.wpbl.info",
       "dnsbl-1.uceprotect.net", "dnsbl-2.uceprotect.net",
       "dnsbl-3.uceprotect.net", "dnsbl.cyberlogic.net",
       "dnsbl.sorbs.net", "drone.abuse.ch", "drone.abuse.ch",
       "duinv.aupads.org", "dul.dnsbl.sorbs.net", "dul.ru",
       "dyna.spamrats.com", "dynip.rothen.com",
       "http.dnsbl.sorbs.net", "images.rbl.msrbl.net",
       "ips.backscatterer.org", "ix.dnsbl.manitu.net",
       "korea.services.net", "misc.dnsbl.sorbs.net",
       "noptr.spamrats.com", "ohps.dnsbl.net.au", "omrs.dnsbl.net.au",
       "orvedb.aupads.org", "osps.dnsbl.net.au", "osrs.dnsbl.net.au",
       "owfs.dnsbl.net.au", "pbl.spamhaus.org", "phishing.rbl.msrbl.net",
       "probes.dnsbl.net.au", "proxy.bl.gweep.ca", "rbl.interserver.net",
       "rdts.dnsbl.net.au", "relays.bl.gweep.ca", "relays.nether.net",
       "residential.block.transip.nl", "ricn.dnsbl.net.au",
       "rmst.dnsbl.net.au", "smtp.dnsbl.sorbs.net",
       "socks.dnsbl.sorbs.net", "spam.abuse.ch", "spam.dnsbl.sorbs.net",
       "spam.rbl.msrbl.net", "spam.spamrats.com", "spamrbl.imp.ch",
       "t3direct.dnsbl.net.au", "tor.dnsbl.sectoor.de",
       "torserver.tor.dnsbl.sectoor.de", "ubl.lashback.com",
       "ubl.unsubscore.com", "virus.rbl.jp", "virus.rbl.msrbl.net",
       "web.dnsbl.sorbs.net", "wormrbl.imp.ch", "xbl.spamhaus.org",
       "zen.spamhaus.org", "zombie.dnsbl.sorbs.net"]

URLS = [
    #TOR
    ('http://torstatus.blutmagie.de/ip_list_exit.php/Tor_ip_list_EXIT.csv',
     'is not a TOR Exit Node',
     'is a TOR Exit Node',
     False),

    #EmergingThreats
    ('http://rules.emergingthreats.net/blockrules/compromised-ips.txt',
     'is not listed on EmergingThreats',
     'is listed on EmergingThreats',
     True),

    #AlienVault
    ('http://reputation.alienvault.com/reputation.data',
     'is not listed on AlienVault',
     'is listed on AlienVault',
     True),

    #BlocklistDE
    ('http://www.blocklist.de/lists/bruteforcelogin.txt',
     'is not listed on BlocklistDE',
     'is listed on BlocklistDE',
     True),

    #Dragon Research Group - SSH
    ('http://dragonresearchgroup.org/insight/sshpwauth.txt',
     'is not listed on Dragon Research Group - SSH',
     'is listed on Dragon Research Group - SSH',
     True),

    #Dragon Research Group - VNC
    ('http://dragonresearchgroup.org/insight/vncprobe.txt',
     'is not listed on Dragon Research Group - VNC',
     'is listed on Dragon Research Group - VNC',
     True),

    #OpenBLock
    ('http://www.openbl.org/lists/date_all.txt',
     'is not listed on OpenBlock',
     'is listed on OpenBlock',
     True),

    #NoThinkMalware
    ('http://www.nothink.org/blacklist/blacklist_malware_http.txt',
     'is not listed on NoThink Malware',
     'is listed on NoThink Malware',
     True),

    #NoThinkSSH
    ('http://www.nothink.org/blacklist/blacklist_ssh_all.txt',
     'is not listed on NoThink SSH',
     'is listed on NoThink SSH',
     True),

    #Feodo (note: the original script points this at the same
    #EmergingThreats URL as above)
    ('http://rules.emergingthreats.net/blockrules/compromised-ips.txt',
     'is not listed on Feodo',
     'is listed on Feodo',
     True),

    #antispam.imp.ch
    ('http://antispam.imp.ch/spamlist',
     'is not listed on antispam.imp.ch',
     'is listed on antispam.imp.ch',
     True),

    #dshield
    ('http://www.dshield.org/ipsascii.html?limit=10000',
     'is not listed on dshield',
     'is listed on dshield',
     True),

    #malc0de
    ('http://malc0de.com/bl/IP_Blacklist.txt',
     'is not listed on malc0de',
     'is listed on malc0de',
     True),

    #MalWareBytes
    ('http://hosts-file.net/rss.asp',
     'is not listed on MalWareBytes',
     'is listed on MalWareBytes',
     True)]

def blacklist(badip):
    BAD = 0
    GOOD = 0

    for url, succ, fail, mal in URLS:
        test = content_test(url, badip)
        if test is True:
            GOOD += 1
        elif test is False:
            BAD += 1
        # None (the list could not be downloaded) is skipped entirely

    for bl in bls:
        try:
            my_resolver = dns.resolver.Resolver()
            # DNSBL convention: reverse the IP's octets and prepend them to
            # the blacklist zone, e.g. 1.2.3.4 -> 4.3.2.1.zen.spamhaus.org
            query = '.'.join(reversed(str(badip).split("."))) + "." + bl
            my_resolver.timeout = 5
            my_resolver.lifetime = 5
            # An A record answer means the IP is listed on this blacklist
            my_resolver.query(query, "A")
            BAD = BAD + 1

        except dns.resolver.NXDOMAIN:
            GOOD = GOOD + 1

        except dns.resolver.Timeout:
            pass

        except dns.resolver.NoNameservers:
            pass

        except dns.resolver.NoAnswer:
            pass

    return str(BAD) + "/" + str(BAD + GOOD)

# Code ABOVE downloaded from GitHub page:
# https://github.com/jgamblin/isthisipbad/blob/master/isthisipbad.py
# Modified by Janek to fix errors.  

def get_rank(domain_to_query):
    result = {'Global':'', "Country":''}
    url = "http://www.alexa.com/siteinfo/" + domain_to_query
    page = urlopen(url).read()
    soup = bs(page, "html.parser")
    # Note: "globleRank" (sic) is the literal class name in Alexa's HTML;
    # this scraper will break if Alexa ever changes its markup.
    for span in soup.find_all('span'):
        if span.has_attr("class"):
            if "globleRank" in span["class"]:
                for strong in span.find_all("strong"):
                    if strong.has_attr("class"):
                        if "metrics-data" in strong["class"]:
                            result['Global'] = strong.text.replace("\n", "").replace(" ", "")
            if "countryRank" in span["class"]:
                image = span.find_all("img")
                for img in image:
                    if img.has_attr("title"):
                        country = img["title"]
                for strong in span.find_all("strong"):
                    if strong.has_attr("class"):
                        if "metrics-data" in strong["class"]:
                            result["Country"] = country + ": " + strong.text.replace("\n", "").replace(" ", "")
    return result

def parseData(ip, data, whdata, blk, rank, tim, iphost):
    whois = whdata["nets"]
    dnet = data["network"]
    ob = data["objects"]
    curtime = strftime("%d %B %Y %H:%M:%S")

    if curtime[0] == "0":
        curtime = curtime[1:]

    # Format elapsed seconds to two decimal places
    timee = "%.2f" % tim
    outStr = "WARNING: Data below may be inaccurate\nTarget: " + ip + "\nGenerated: " + curtime + "\nTime taken to generate: " + timee + " seconds" + "\n\n"

    outStr += "IP host: " + iphost + "\n\n"

    outStr += "Blacklist: " + blk + "\n\n"
    outStr += "Archive: http://web.archive.org/web/*/" + iphost + "\n"
    outStr += "Global Alexa rank: " + rank["Global"] + "\n"
    outStr += "Country Alexa rank: "+ rank["Country"] + "\n\n"


    outStr += "Legacy Whois:\n"
    net = 1
    for i in whois:
        outStr += "  Network " + str(net) + ":\n"
        try:
            outStr += "    IP: " + str(whdata["query"]) + "\n"
        except:
            outStr += "    IP: Not found\n"
        try:
            outStr += "    Name: " + str(i["name"]) + "\n"
        except:
            outStr += "    Name: Not found\n"
        try:
            outStr += "    Abuse E-mails: " + str(i["abuse_emails"]) + "\n"
        except:
            outStr += "    Abuse E-mails: Not found\n"
        try:
            # ipwhois uses the key "address" (the original misspelled it,
            # so this field always fell through to "Not found")
            outStr += "    Address: " + str(i["address"]) + "\n"
        except:
            outStr += "    Address: Not found\n"
        try:
            outStr += "    Country: " + str(i["country"]) + "\n"
        except:
            outStr += "    Country: Not found\n"
        try:
            outStr += "    City: " + str(i["city"]) + "\n"
        except:
            outStr += "    City: Not found\n"
        try:
            outStr += "    Postal code: " + str(i["postal_code"]) + "\n"
        except:
            outStr += "    Postal code: Not found\n"
        try:
            outStr += "    Created: " + str(i["created"]) + "\n"
        except:
            outStr += "    Created: Not found\n"
        try:
            outStr += "    Description: " + str(i["description"]) + "\n"
        except:
            outStr += "    Description: Not found\n"
        try:
            outStr += "    Handle: " + str(i["handle"]) + "\n"
        except:
            outStr += "    Handle: Not found\n"
        try:
            outStr += "    Misc E-mails: " + str(i["misc_emails"]) + "\n"
        except:
            outStr += "    Misc E-mails: Not found\n"
        try:
            outStr += "    IP range: " + str(i["range"]) + "\n"
        except:
            outStr += "    IP range: Not found\n"
        try:
            outStr += "    State: " + str(i["state"]) + "\n"
        except:
            outStr += "    State: Not found\n"
        try:
            outStr += "    Tech E-mails: " + str(i["tech_emails"]) + "\n"
        except:
            outStr += "    Tech E-mails: Not found\n"
        try:
            outStr += "    Updated: " + str(i["updated"]) + "\n\n"
        except:
            outStr += "    Updated: Not found\n\n"

        net += 1

    outStr += "\nRDAP (HTTP) Whois:\n"

    try:
        outStr += "  Name: " + str(dnet["name"]) + "\n"
    except:
        outStr += "  Name: Not found\n"
    try:
        # ipwhois uses "start_address"/"end_address"/"ip_version" (the
        # original misspelled the keys, so these always read "Not found")
        outStr += "  Start address: " + str(dnet["start_address"]) + "\n"
    except:
        outStr += "  Start address: Not found\n"
    try:
        outStr += "  End address: " + str(dnet["end_address"]) + "\n"
    except:
        outStr += "  End address: Not found\n"
    try:
        outStr += "  IP version: " + str(dnet["ip_version"]) + "\n"
    except:
        outStr += "  IP version: Not found\n"

    outStr += "  Events:\n"

    e = 1
    for i in dnet["events"]:
        outStr += "    Event " + str(e) + ":\n"
        try:
            outStr += "      Action: " + str(i["action"]) + "\n"
        except:
            outStr += "      Action: Not found\n"
        try:
            outStr += "      Actor: " + str(i["actor"]) + "\n"
        except:
            outStr += "      Actor: Not found\n"
        try:
            outStr += "      Timestamp: " + str(i["timestamp"]) + "\n"
        except:
            outStr += "      Timestamp: Not found\n"

        e += 1

    outStr += "\n  Objects:\n"
    for i in ob:
        z = ob[i]["contact"]
        outStr += "    " + str(i) + ":\n"
        try:
            outStr += "      Name: " + str(z["name"]) + "\n"
        except:
            outStr += "      Name: Not found\n"
        try:
            outStr += "      E-mail: " + str(z["email"][0]["value"]) + "\n"
        except:
            outStr += "      E-mail: Not found\n"
        try:
            outStr += "      Kind: " + str(z["kind"]) + "\n"
        except:
            outStr += "      Kind: Not found\n"
        try:
            outStr += "      Phone: " + str(z["phone"][0]["value"]) + "\n"
        except:
            outStr += "      Phone: Not found\n"
        try:
            outStr += "      Title: " + str(z["title"]) + "\n"

        except:
            outStr += "      Title: Not found\n"
        # A plain string concatenation cannot fail, so no try/except here
        outStr += "      Links: "
        if ob[i]["links"]:
            for j in ob[i]["links"]:
                outStr += str(j).replace("\n", " ")
                outStr += "  "
        else:
            outStr += "Not found"

        outStr += "\n      Contact: "
        if z["address"]:
            for j in z["address"]:
                outStr += str(j["value"].replace("\n", ", "))

        else:
            outStr += "Not found"
        outStr += "\n\n"
    return outStr

def getData(ip):
    obj = IPWhois(ip)
    results = obj.lookup_rdap(depth=1)
    return results

def whgetData(ip):
    obj = IPWhois(ip)
    results = obj.lookup()
    return results

def showHelp():
    name = argv[0].replace("\\","/").split("/")[-1]
    print("""
Simple website report (Whois) generator
Usage:
    %s website [-json]
            [-file <file>]
            [-raw]

Arguments:
    website       Website IPv4 or domain
    -json         Convert output to json 
    -file         Save output to a file
    -raw          Output full Whois lookup in raw format

""" % name)

if __name__ == "__main__":
    args = argv

    if len(args) < 2:
        print("Too few arguments!")
        showHelp()
        exit(1)

    if len(args) > 5:
        print("Too many arguments!")
        showHelp()
        exit(1)

    allowedArgs = ["-file", "-json", "-raw"]
    noLast = ["-file"]

    b = 2
    for i in args[2:]:
        if i not in allowedArgs and args[b-1] not in noLast:
            print("Invalid option (%s)" % i)
            showHelp()
            exit(1)
        b += 1

    for i in args[2:]:
        if args.count(i) > 1 and i in allowedArgs:
            print("Option appearing more then once (%s)" % i)
            showHelp()
            exit(1)

    if args[-1] in noLast:
        print("Option has no arguments (%s)" % args[-1])
        showHelp()
        exit(1)

    if "-json" in args and "-raw" in args:
        print("Can't use -json and -raw in the same time")
        showHelp()
        exit(1)

    rawIP = args[1]
    if rawIP.replace(".","").isdigit() == True:
        ip = rawIP
    else:
        ip = gethostbyname(rawIP)

    if "-file" in args:
        filename = args[args.index("-file") + 1]
        if exists(filename) == True:
            print("File already exists (%s)" % filename) 
            showHelp()
            exit(1)

        directory = "/".join(filename.replace("\\","/").split("/")[0:-1])

        if len(directory) > 0 and isdir(directory) == False:
            print("Directory does not exist (%s)" % directory) 
            showHelp()
            exit(1)

        fileOn = True
    else:
        fileOn = False

    t1 = time()
    data = getData(ip)
    whdata = whgetData(ip)
    blk = blacklist(ip)

    if rawIP.replace(".","").isdigit() == True:
        try:
            # gethostbyaddr() returns (hostname, aliases, addresses);
            # keep only the primary hostname
            iphost = gethostbyaddr(ip)[0]
        except:
            iphost = "Not found"
    else:
        iphost = rawIP

    rank = get_rank(iphost)
    t2 = time()

    t = t2 - t1
    report = {"http_whois": data, "legacy_whois": whdata,
              "blacklist": blk, "rank": rank, "time": t}

    # Build the output once, in whichever format was requested
    if "-json" in args:
        output = dumps(report, ensure_ascii=False) + "\n"
    elif "-raw" in args:
        output = pformat(report)
    else:
        output = parseData(ip, data, whdata, blk, rank, t, iphost)

    if fileOn:
        f = open(filename, "w")
        f.write(output)
        f.close()
        print("File created!")
    else:
        print(output)

    exit(0)

What to do:

  1. Download Python 2.7.x
  2. Install it (tick the "Add python.exe to Path" checkbox during installation)
  3. Copy and paste the code above into a file called yournamehere.py
  4. You can now run the program from cmd with the following query:

python yournamehere.py www.google.com

NOTE: You must install the following libraries for the code to work: dnspython, bs4, ipwhois

To install them, open cmd and type, line by line:

pip install dnspython
pip install ipwhois
pip install bs4
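To check that all three installed correctly, you can try importing them from cmd before running the script (just a sanity check, not part of the program):

python -c "import dns.resolver, bs4, ipwhois; print('libraries OK')"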

Help:

Simple website report (Whois) generator
Usage:
        netreport.py website [-json]
                        [-file <file>]
                        [-raw]

Arguments:
        website           Website IPv4 or domain
        -json             Convert output to json
        -file             Save output to a file
        -raw              Output full Whois lookup in raw format

If you want the full output report formatted as a Pythonic dictionary, use `-raw`.

If you want the output formatted as JSON, use `-json`.

`-raw` and `-json` cannot be used at the same time.

If you want to save the output to a file, use `-file`.

Examples:

python yournamehere.py www.google.com
python yournamehere.py www.google.com -json
python yournamehere.py www.google.com -file myreport.txt
python yournamehere.py www.google.com -raw -file myfile.txt
python yournamehere.py www.google.com -file C:\Users\Jan\Desktop\file.txt
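If you generate a JSON report (for example with -json -file report.json), you can load it back into Python for further processing. A small sketch, using the key names the script writes:

import json

with open("report.json") as f:
    report = json.load(f)

print(report["blacklist"])  # "listed/total" string from the blacklist check
print(report["rank"])       # Alexa global and country rank
print(report["time"])       # seconds the lookups took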

Speed:

It's quite slow. For www.google.com it takes ~30 seconds to generate the report. My Internet connection is also slow: 4 Mbps down, 0.5 Mbps up (yes, very slow).

It's not exactly what I need, but you definitely deserve an upvote for the (huge!) effort of putting the code and the answer together. Thanks!
Well, that's all I've got. Anyway, I'm not sure what you're looking for, because the program does everything on your list except the backlinks. I have to study Geography now. (And this is the best website I could find for what you need, but it's missing a lot of information: centralops.net/co/DomainDossier.aspx)
Thanks for taking the time to put the answer together. Much appreciated.