Check a Website’s Reputation with Website Categorization API and Other Tools
In an era riddled with highly skilled threat actors and sophisticated attack methods, determining whether you can safely access a website or not is critical. After all, the only certain way of preventing a breach is to keep well away from every potential threat source online, including anywhere inappropriate you may land on the Web.
With that purpose in mind, this long-form article presents four case studies on how Website Categorization API, in conjunction with other domain and IP data feeds and APIs, can help organizations avoid the pitfalls that come with visiting harmful web pages.
Table of Contents
- Check Website Trust or Land in a Sea of Trouble
- Case Studies
- Bottom line: Check Website Reputation with the Right Intelligence Sources
Check Website Trust or Land in a Sea of Trouble
Many system infections result from the simple act of visiting a compromised or outright malicious website. While some sites have been specially crafted to host malware or exploits that automatically get dropped onto vulnerable systems, others are legitimate but have fallen under the control of hackers and/or been used in attacks. Whatever the case may be, these malware- or exploit-laden sites often employ the same tactic to reel in victims: a drive-by download.
So how does this work? What happens once a piece of malware makes your computer its new home? Cyber attacks that rely on drive-by downloads often use these elements:
- Entry point: Attackers create credible sites or hijack popular ones to act as malware hosts. These pages are designed to silently drop a piece of malware onto unsuspecting visitors’ systems.
- Distribution and exploration: The malware that initially ends up on a user’s computer is designed to pinpoint exploitable vulnerabilities on it, its apps, and the devices connected to it. You can think of it as an added reconnaissance tool to reach the attacker’s end goal.
- Exploitation and infection: After a comprehensive diagnosis of the infected system, the initial malware identifies the exploit that would work best on it. Attackers typically keep exploit kits, bought from underground marketplaces or the Deep Web, in their attack arsenals. From there, cybercriminals choose what would run on a victim’s computer and drop this onto it to continue the attack. In turn, the attackers gain control of one or more devices, which sets the data loss and resulting breach in motion.
- Infrastructure hacking: Every attack designed to siphon off confidential data from an infected system makes use of a command-and-control (C&C) server owned and controlled by the attackers. In a ransomware attack, for instance, the C&C server issues commands to the actual payload such as “look for files with the .doc extension, copy them, and send the copies back.”
What can companies do to prevent such malicious incidents from happening? While it may be impossible to tell if a website is a potential threat carrier at first glance, avoiding those that have been identified as unsafe to visit is highly recommended. Blacklisting sites so even the most reckless employees won’t end up visiting them is good practice too.
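As a minimal sketch of that kind of blacklisting, the Python snippet below checks whether a URL’s host, or any parent domain of it, appears on a blocklist. The function name and the sample entries (which use the reserved .example top-level domain) are hypothetical placeholders, not real threat data.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist entries; a real list would come from a threat feed.
BLOCKLIST = {"malicious-site.example", "bad-downloads.example"}


def is_blocked(url, blocklist=BLOCKLIST):
    """Return True if the URL's host or any parent domain is blocklisted."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    labels = host.split(".")
    # Check "a.b.c", then "b.c", then "c", so subdomains of a listed domain match too.
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


print(is_blocked("http://cdn.malicious-site.example/landing"))
print(is_blocked("https://intranet.company.example/"))
```

Matching on parent domains is a design choice made here so that a listed domain cannot be reached through one of its subdomains. A stricter or looser policy may suit some networks better.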
Additionally, the aid of a Website Categorization API and other tools configured to adhere to security policies can help organizations safeguard their digital assets against e-commerce formjacking, phishing, and ransomware attacks. Let’s see how with the following use cases.
Case Study #1: E-Commerce (Magecart)
Before digging into this first case study, let’s take a look at what website categorization entails. In short, website classification is an easy way for businesses to get to know their customers as well as to flag inconsistencies and potential cases of fraud.
In fact, in three simple steps, users would already know more about a specific customer or potential threat source. For instance:
- Log in to https://website-categorization-api.whoisxmlapi.com.
- Click Give API a Try. You should land on the desired section.
- Type the domain name into the “Get website categorization” input field and hit the Enter key. All the Interactive Advertising Bureau (IAB)-approved categories the site belongs to should appear, along with their identification numbers and confidence levels. Up to two tiers or subcategories also appear under each category.
The categories shown for instagram[.]com are selected from among the 500+ categories the IAB has identified and used to date.
A full list of the website categories can be obtained from the “Get all categories” section of our documentation page. Note that the higher the confidence level for an identified category, the more likely the website falls under it.
That said, for our example instagram[.]com, we could say that it is more of a Business and Finance and Shopping site (0.885 confidence level) than a Retail industry page (0.550 confidence level). Note the similarities among the categories identified for Instagram. Business and finance and shopping are broader categories compared with retail. Sales, after all, has two major types—wholesale and retail. If you observe the products sold on the site, you would see that not many are offered in bulk (wholesale), thus making Instagram more likely to fall under retail industry rather than wholesale.
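To make the role of the confidence level concrete, here is a small Python sketch that ranks categories from most to least likely. The record layout (a list of dictionaries with "name" and "confidence" keys) is an assumption made for illustration and may not match the API’s actual response. The two sample entries reuse confidence levels quoted above.

```python
# Hypothetical, simplified result records; the real API response layout may differ.
categories = [
    {"name": "Retail Industry", "confidence": 0.550},
    {"name": "Business and Finance", "confidence": 0.885},
]


def rank_categories(results, min_confidence=0.0):
    """Sort categories from most to least likely, dropping low-confidence ones."""
    kept = [c for c in results if c["confidence"] >= min_confidence]
    return sorted(kept, key=lambda c: c["confidence"], reverse=True)


for category in rank_categories(categories):
    print(f'{category["name"]}: {category["confidence"]:.3f}')
```

Passing a threshold, for example rank_categories(categories, 0.6), keeps only the categories you are willing to act on.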
So what do you do with this data? Pooling the customers’ domains into categories can help sales and marketing teams identify which industries to prioritize. They can then come up with informed strategies that would yield a more significant profit margin for their companies.
Not all site visitors are prospective customers, however. And neither should everyone be welcomed with open arms. If you’re a cybersecurity specialist, you already know that businesses must be wary of threat actors that want to gain access to their networks and prey on their customers. This is illustrated by one threat in particular — e-commerce formjacking.
E-commerce formjacking, which we explored by using Magecart attacks, involves implanting malicious code into the forms that online buyers fill in when placing their orders on online shops. This code allows attackers to steal users’ credit card information as they input it into the checkout page.
Importantly, Magecart refers to an attack category, that is, e-commerce formjacking, and not a specific organization or entity. Several cybercriminal groups have used Magecart in high-profile attacks.
The latest report on Magecart incidents revealed that more than 17,000 domains fell prey to the threat. Even worse, experts say they see no end to the attacks any time soon. The only course of action left for businesses then is to beef up the security of their e-commerce sites.
Organizations that want to safeguard their infrastructure and customers can use their website categorization findings with reports that give out indicators of compromise (IoCs) to identify unwanted site visitors. URL blacklisting, in the event of a potential e-commerce formjacking attempt, would help users safeguard their digital properties and customer data from malicious individuals.
Case Study #2: Brand Protection (PayPal)
Web categorization is a worthwhile endeavor when it comes to brand protection. For any business to succeed, its brand always has to be reputable. Succumbing to a cyber attack can leave a lasting negative impression on a company’s existing and potential customers.
Phishing, for example, is an age-old but ever-reliable threat that remains one of the most significant challenges even for today’s biggest brands. While less-known targets also struggle with it, it isn’t surprising that the most popular vendors comprise phishers’ list of go-to targets. There’s a straightforward reason for that—the bigger the brand, the wider the potential victim pool and profit margin is for attackers.
If you’re wondering about the prevalence of phishing in actual numbers, the 2019 year-end Anti-Phishing Working Group (APWG) Phishing Activity Trends Report showed that the year was a rollercoaster ride for phishing. The third quarter (July–September) proved the most productive for phishers in the past three years. Verizon’s 2019 Data Breach Investigations Report (DBIR) identified phishing as the leading data breach attack vector. These are alarming trends, but the danger can usually be avoided by employing tools that add an extra layer of defense against the threat.
Like in the e-commerce case study, users can rely on Website Categorization API to determine if a potential client, for instance, is worth trusting or should potentially be flagged as a “phisher.” A payment processor like PayPal can follow the same three steps to verify if a user’s claims are valid. It can search for the client’s domain to check if it corresponds to the email sender’s supposed company category. Calling the organization for confirmation, of course, enhances the verification process.
Let’s take a look at a hypothetical scenario. Say that John Smith wishes to sign up for a PayPal account to start his new business. He claims to have recently put up an online shopping site that sells sports apparel called “Sports Store.” (IMPORTANT NOTE: We used a randomly chosen domain name for this scenario. That domain name is not malicious.)
The PayPal representative in charge of John Smith’s account registration can look up sports-store[.]com (the domain Smith gave) on Website Categorization Lookup to verify its existence. To do that, type the domain name into the “Get website categorization” input field and hit the Enter key. A list of the categories the customer’s site belongs to should appear.
For our made-up scenario, the results do not necessarily confirm Smith’s claims, as the IAB category Apparel Industry (IAB-113) does not appear as a result. You may need to employ further Internet research to verify his claim by visiting the said site and seeing if it is indeed an online shopping site for sports apparel. Our search for the domain using Screenshot Service (a precautionary measure just in case the domain is malicious and accessing it could lead to malware infection) shows this:
A result like that makes Smith’s claims about putting up a sports goods shopping site less credible since the domain he gave as proof is actually parked and up for sale. If you do happen to land on an active site, you can check for visible signs of credibility. For instance:
- The URL should start with HTTPS instead of the usual HTTP. The additional S means the connection to the site is encrypted, so the data you exchange with it is harder to intercept than on a site that lacks it. A lock icon preceding the URL also indicates this protection.
- Every reputable company provides accurate contact information on its website. You can check these out by emailing or calling them.
- Some vendors even go the extra mile and have their sites verified by certified authorities to guarantee that they look out for their customers’ welfare.
- No reputable vendor site has third-party ads that offer things for free. Even an accidental hover or click on a malicious ad can lead to a drive-by download. This demonstrates the importance of paying attention to third-party risks.
In light of these approaches to checking a website’s reputation, financial service providers like PayPal can support or reject an applicant’s account registration. These are just some of the ways in which they can avoid being abused by a potential phisher.
Cybersecurity professionals can also opt for specially designed tools to check the validity of a site’s Secure Sockets Layer (SSL) certificate. All reputable vendors’ sites have valid SSL certificates which digitally bind a cryptographic key to their organizations.
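For readers who want to script the HTTPS and certificate checks, the following Python sketch relies only on the standard library. The function names are illustrative choices, and the check covers only the URL scheme and the certificate’s expiry date, not every property a dedicated SSL tool would inspect.

```python
import socket
import ssl
import time
from typing import Optional
from urllib.parse import urlsplit


def uses_https(url: str) -> bool:
    """Return True if the URL's scheme is HTTPS."""
    return urlsplit(url).scheme.lower() == "https"


def cert_is_unexpired(not_after: str, now: Optional[float] = None) -> bool:
    """Check a certificate 'notAfter' string such as 'Jan  5 09:34:43 2018 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)
    return expires > (time.time() if now is None else now)


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to the host and return its certificate's expiry string.

    The default context also verifies the certificate chain and hostname,
    so an invalid certificate raises ssl.SSLCertVerificationError here.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Calling cert_is_unexpired(fetch_not_after("example.com")) requires network access. The two helper functions above it can be used offline on values you already have.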
In addition to using Website Categorization API, companies can add monitoring for newly registered domains (NRDs) to their regular security checks. It has been found, after all, that 70% of NRDs are unsafe to work with and can be malicious in some cases.
To demonstrate, we downloaded the .com NRD data feeds for 22 January–4 February 2021. These contain bulk-registered .com domains that made their way into the Domain Name System (DNS) on the said dates. We sought to determine if new supposed PayPal domains were recently put up.
Our inquiry resulted in a list of 326 PayPal NRDs, only one of which (i.e., paypal-service-support[.]com) could be publicly attributed to PayPal since it shares the company’s official organization name (i.e., PayPal, Inc.) in its WHOIS record as per our bulk WHOIS lookup results. There could be others as we only used the string “paypal.” We did not include the brand name’s misspellings (e.g., paypǝl) and spelling variations (e.g., pay-pal). Given that, we can still assume that about two dozen PayPal look-alike domains are registered each day on average.
Finally, by complementing these checks with brand protection and monitoring tools, IT security teams and employees can avoid dealings with malicious individuals and unintended interactions that may have severe repercussions for their company’s reputation.
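A search like this can be sketched in a few lines of Python. Note that the snippet goes slightly beyond the plain “paypal” string search described above: it also strips hyphens, which is our own addition to catch variations such as pay-pal. Apart from paypal-service-support[.]com, the sample entries are hypothetical.

```python
def find_brand_domains(domains, brand):
    """Return domains whose name contains the brand string.

    Hyphens are stripped before matching so that 'pay-pal' style
    variations are caught too. Matches still need manual review.
    """
    brand = brand.lower()
    matches = []
    for domain in domains:
        # Undo the [.] defanging, then drop the top-level domain.
        name = domain.lower().replace("[.]", ".").rsplit(".", 1)[0]
        if brand in name.replace("-", ""):
            matches.append(domain)
    return matches


# Hypothetical feed sample; only the first entry comes from the lookup described above.
feed = [
    "paypal-service-support[.]com",
    "pay-pal-login[.]example",
    "weather-report[.]example",
]
print(find_brand_domains(feed, "paypal"))

# 22 January to 4 February inclusive is 14 days.
print(round(326 / 14, 1))
```

The last line works out the daily average behind the estimate: 326 domains over 14 days comes to a little over 23 per day.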
Case Study #3: Cybersecurity Against Ransomware (Businesses in Major U.S. Cities)
We’ve seen many organizations worldwide lose massive amounts of data and incur huge financial losses after suffering ransomware attacks. A Florida city likely holds the record for the biggest ransomware payout to date, amounting to US$600,000. It gave in to the hackers’ demand when it lost access to all its records and when its email system was disabled. What’s more, it had to resort to paying employees and vendors by check, and its 911 dispatchers were left unable to pass on calls to the responders.
Another example of a ransomware attack involved Maersk, a global container shipping giant and one of the most prominent casualties of the NotPetya outbreak in June 2017. The company reportedly lost an estimated US$300 million due to the attack that resulted in a severe business interruption across 600 of its sites located in 130 countries.
Ransomware isn’t just a problem for large enterprises, though. Small and medium-sized businesses (SMBs) are also prone to attack. Take the case of a local medical service provider in Michigan, for instance, whose owners preferred to close shop rather than deal with an attack’s aftereffects. Not paying the ransom is justifiable as those who opt to pay sometimes end up with nothing but a gaping hole in their bank accounts.
These aren’t isolated cases. Security experts believe that the ransomware damage could reach as much as US$11.5 billion this year. Reasons for this include an expected rise in attack frequency and code innovations.
In light of recently reported events in major cities across the U.S. alone, we’re bound to see the prediction come true. Local city halls, public libraries, and other government offices in Dallas and Laredo in Texas, Baltimore, Albany, and Lake City in Florida were just some of the recent victims.
As with any online threat, one way of countering the ill effects of a ransomware attack lies in identifying risky sites. To do that, organizations can determine where their site visitors originate from. Website categorization and IP geolocation can work hand in hand to enable that. For instance, a website categorization API can help assess if anyone under a specific domain (and its related categories) has a legitimate reason to be accessing the company website.
Let’s say for demonstration purposes that you work as an IT security specialist for the Dallas Public Library. Because of the recent spate of attacks against similar institutions, you decide to sift through your network’s traffic logs. While you are at it, you discover a suspicious domain, such as iiheartradio[.]com, that keeps trying to access your network. (IMPORTANT NOTE: The domain in this scenario is not related to the attacks discussed in this section. It points to a randomly selected suspicious website.)
A Website Categorization API lookup using the domain as a search term should give you this result:
Despite the presence of the string “radio” in the domain, it is surprising that the categories the site falls under do not include Entertainment Industry (IAB-93) or Media Industry (IAB-106). You should also probably wonder why it was classified under Real Estate. It may be a good idea to dig deeper into the site using a screenshot lookup tool to avoid possible malware infection since the site is dubbed “suspicious” on VirusTotal.
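This kind of mismatch check can be automated. The sketch below compares the categories you would expect for a site against those a lookup returns. The function name is illustrative, and the returned list is a simplified stand-in for a real lookup result, with only Real Estate taken from the scenario.

```python
def missing_expected_categories(returned, expected):
    """Return the expected categories that are absent from the lookup result."""
    returned_normalised = {name.strip().lower() for name in returned}
    return [name for name in expected if name.strip().lower() not in returned_normalised]


# Categories a genuine radio site would plausibly fall under (named in the text above).
expected = ["Entertainment Industry", "Media Industry"]
# Simplified stand-in for the lookup result; only "Real Estate" is taken from the text.
returned = ["Real Estate"]

missing = missing_expected_categories(returned, expected)
if missing:
    print("Review needed, missing categories:", ", ".join(missing))
```

A missing category is a prompt for further investigation, not proof that a domain is malicious.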
To avoid sites that host ransomware, you can monitor blacklists, such as the Abuse.ch Ransomware Tracker Blocklist. To more safely navigate the Web, including all sites in the said blocklist in your organization’s own URL blacklist may be a good practice. That way, you can avoid landing on a known ransomware-laden site.
In the case of the U.S. cities, it may be a good idea to spot inconsistencies between, say, what’s claimed in an email and the information contained in IP addresses. For that, you can use an IP geolocation tool.
For example, let’s say that your company received an email with a suspicious attachment that claims to originate from a partner in Canada. Keep in mind that a lot of ransomware variations could come in the guise of documents. You can enter the email sender’s domain name into the API’s search field to determine the message’s real source.
For demonstration purposes, let’s say the email address [email protected] was used. The IP Geolocation API should give you this result:
As you can see, the inbox isn’t registered in Canada. While this alone doesn’t justify distrusting the sender, you might want to dig further. You could, for example, check a website’s category before clicking its URL and potentially ending up on an unknown site containing malware.
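The comparison itself is simple to script once you have both values. In this sketch, the two-letter country codes are hypothetical inputs: one taken from what the email claims and one from a geolocation lookup, whose real response format is not shown here.

```python
def country_mismatch(claimed_country, geolocated_country):
    """Return True when the claimed and geolocated country codes differ."""
    return claimed_country.strip().upper() != geolocated_country.strip().upper()


# Hypothetical values: the sender claims Canada, the lookup reports another country.
claimed = "CA"
geolocated = "US"

if country_mismatch(claimed, geolocated):
    print("Mismatch: treat the attachment with caution and investigate further")
```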
Case Study #4: Content Filtering
We briefly mentioned URL blacklisting as a means to prevent an e-commerce formjacking attempt. An even broader approach than URL blacklisting, though, is content filtering. Content filtering is the practice of restricting a user’s Internet access to predetermined types of content only. A typical office scenario would likely allow employees to visit news and educational websites, while they would be prohibited from accessing social media, shopping, gambling, and adult websites. The protocol may vary depending on the company and industry, but this is usually the standard scenario.
Companies can effectively implement content filtering with the use of the Website Categorization API. However, a more secure and cybersecurity-focused approach would also require companies to use additional tools that allow them to glean more information about a particular website.
Consider this hypothetical scenario where an office staff member tries to access the following websites. Note that this employee may knowingly type these sites’ URLs into his/her browser or be taken to the pages after clicking a link embedded in an email, chat message, or an ad. In other cases, the employee may land on these websites via redirection when he/she opens an email attachment or a transferred file.
- 4011sagesave[.]info: Categorized under Events and Attractions, Personal Celebrations & Life Events, Business and Finance, Entertainment Industry, and Travel.
- whoisxmlapi[.]com: Categorized under Technology & Computing, Information and Network Security, Email, Business and Finance, Information Services Industry, Technology Industry, Telecommunications Industry, Internet, and Computer Networking.
- sportsbettingdime[.]com: Categorized under Sports and Poker and Professional Gambling.
Standard content filtering protocols would most likely immediately block the employee’s access to sportsbettingdime[.]com since it is considered a gambling site. The other two websites, on the other hand, could be permitted, since they do not belong to the blacklisted categories. As such, the office staff can access the first two domains. But are these websites safe? Let’s find out by adding one more step—a domain reputation check.
In our hypothetical example, the first domain name, 4011sagesave[.]info, is actually listed as a confirmed malicious site on VirusTotal. A quick domain reputation check using Domain Reputation API would reveal that the domain is also listed on PhishTank.
Among the three domains, therefore, only whoisxmlapi[.]com should be allowed access to your network. 4011sagesave[.]info is a known malicious domain listed on at least two malware databases (VirusTotal and PhishTank). As such, although it falls under whitelisted website categories, it is clearly not safe to visit. As an additional layer of protection to blacklisting website categories, it may also be a good idea to block access to websites that are listed on any threat feed aided by a domain reputation checker.
Some may argue that since the domain reputation check already reveals many things about a website, the categorization process could be skipped altogether. However, that is not the case. In our hypothetical scenario, for instance, the office staff member would immediately be prevented from accessing sportsbettingdime[.]com since the domain falls under Poker and Professional Gambling. If you ignore the categorization step but rely on a domain reputation check, access to the said website could be granted due to the following reasons:
- Sportsbettingdime[.]com is not listed on any malware data feed and so it would be deemed safe to visit.
- Although Domain Reputation API says the domain is registered in an offshore country, along with other warnings, it is not listed on a malware feed.
Based solely on a domain reputation check, the employee could access the gambling site if website categorization isn’t done. Website categorization automatically weeds out sites that, though safe to access, can hamper employee productivity and use up much-needed resources, which could negatively impact the company’s bottom line in the long term.
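The two-step logic of this case study can be summarized in a short Python sketch. The domains and category names come from the scenario above, while the dictionaries and sets are hypothetical stand-ins for real categorization and reputation lookups.

```python
# Category names and domains are taken from the scenario above; the data
# structures themselves are hypothetical stand-ins for real API results.
BLOCKED_CATEGORIES = {"Poker and Professional Gambling"}
THREAT_FEED = {"4011sagesave[.]info"}

SITE_CATEGORIES = {
    "4011sagesave[.]info": {"Events and Attractions", "Business and Finance", "Travel"},
    "whoisxmlapi[.]com": {"Technology & Computing", "Information and Network Security"},
    "sportsbettingdime[.]com": {"Sports", "Poker and Professional Gambling"},
}


def access_decision(domain):
    """Apply the category filter first, then the reputation check."""
    categories = SITE_CATEGORIES.get(domain, set())
    if categories & BLOCKED_CATEGORIES:
        return "block: restricted category"
    if domain in THREAT_FEED:
        return "block: listed on a threat feed"
    return "allow"


for site in SITE_CATEGORIES:
    print(site, "->", access_decision(site))
```

In this sketch, a domain with no known categories is allowed unless it appears on the threat feed. A stricter policy could block uncategorized domains by default.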
Bottom line: Check Website Reputation & Trust with the Right Intelligence Sources
Fortifying one’s network against e-commerce formjacking, phishing, ransomware, and other cyber attacks requires careful scrutiny of who website visitors are and where they originate. Organizations need to know which websites to trust and which ones should be avoided.
Security solutions are great at preventing malicious files from being executed on vulnerable systems, but not all can distinguish safe from damaging traffic. Bolstering their capability to distinguish between malicious and nonmalicious site visits is possible with Website Categorization API and other domain and IP data feeds and APIs, which provide for a more proactive defense.
Indeed, by preventing malicious individuals from interacting with your network and dropping unwanted files into gaping holes, organizations can effectively stop attacks before they even take root.