Network Security – Internet Content Filtering Primer

Many companies use some sort of Internet firewall, but schools have a unique obligation to provide more extensive Internet content filtering on their student-use workstations. Content filtering can be implemented using a variety of methodologies, and most content filtering technologies combine several of them. Content filtering may be used to block access to pornography, games, shopping, advertising, email/chat, or file transfers, or to Websites that provide information about hatred/intolerance, weapons, drugs, gambling, etc.

The simplest method of providing content filtering is to specify a blacklist. A blacklist is nothing more than a list of domains, URLs, filenames, or extensions that the content filter is to block. If the domain Playboy.com were blacklisted, for example, access to that entire domain would be blocked, including any subdomains or subfolders. In the case of a blacklisted URL, such as en.wikipedia.org/wiki/Recreational_drug_use, other pages of the domain might still be accessible, but that specific page would be blocked. Often wildcards can be employed to block large sets of domains and URLs with simple entries like *sex*. Blacklisting can also be used to prevent software installations by blocking access to files, such as */setup.exe, or to prevent changes to the computer by blocking potentially harmful file types, like *.dll or *.reg. Since content filters can’t reliably differentiate between art and porn, many content filters are also configured to block graphic file types, such as *.gif, *.jpg, *.png, etc.
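As a rough sketch of how such a blacklist might be evaluated, here is a small Python helper; the entries and the matching rules are illustrative, not taken from any particular product:

```python
from fnmatch import fnmatch

# Hypothetical blacklist mixing the entry types described above.
BLACKLIST = [
    "playboy.com",                                   # whole domain
    "en.wikipedia.org/wiki/Recreational_drug_use",   # a single URL
    "*sex*",                                         # wildcard pattern
    "*/setup.exe",                                   # block installers
    "*.reg",                                         # risky file type
]

def is_blacklisted(url: str) -> bool:
    """Return True if the URL matches any blacklist entry (case-insensitive)."""
    url = url.lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    host = url.split("/", 1)[0]
    for entry in BLACKLIST:
        entry = entry.lower()
        if "*" in entry:
            # Wildcard entries use shell-style matching.
            if fnmatch(url, entry):
                return True
        elif "/" in entry:
            # A specific URL blocks that page and anything beneath it.
            if url == entry or url.startswith(entry + "/"):
                return True
        else:
            # A domain entry blocks the domain and any subdomain.
            if host == entry or host.endswith("." + entry):
                return True
    return False
```

Note how the domain entry catches subdomains too: `images.playboy.com` is blocked by the `playboy.com` entry, while the Wikipedia entry blocks only that one article.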

A whitelist is the opposite of a blacklist; it’s a list of resources that the content filter should allow to pass. Like a bouncer at the velvet rope, the content filter blocks any resource not listed on the whitelist. Blacklists and whitelists may be used in conjunction with each other to provide more granular filtering; the blacklist could be used to block all graphic file types, for example, while the whitelist could be configured to override the blacklist for images coming from specified, moderated, age-appropriate image hosting sites. Blacklisting and whitelisting are quick and easy ways to determine whether or not a particular Website should be displayed. Checking a Website against a list isn’t processor-intensive, so it can be performed quickly, but it also isn’t robust: new Websites pop up constantly, and there’s no way anyone could ever stay on top of adding all of the bad ones to a blacklist.
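Combining the two lists might look like this; the image-host whitelist entry is hypothetical, and the key design point is simply that the whitelist is checked first, so it wins:

```python
from fnmatch import fnmatch

# Hypothetical policy: block all graphic file types, but let images
# from one approved, moderated host through.
BLACKLIST = ["*.gif", "*.jpg", "*.png"]
WHITELIST = ["images.schoolart.example/*"]

def is_allowed(url: str) -> bool:
    url = url.lower()
    if any(fnmatch(url, pattern) for pattern in WHITELIST):
        return True    # whitelist overrides the blacklist
    if any(fnmatch(url, pattern) for pattern in BLACKLIST):
        return False   # blacklisted and not whitelisted
    return True        # default: allow everything else
```

A pure whitelist-only ("bouncer") configuration would just change that final default from allow to deny.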

So what do we do about that constant stream of new Websites coming online? That’s where more advanced filtering methodologies come into play. Parsing can be used to search for particular words or phrases in a Webpage. Instead of relying solely on filtering by address, the content filter downloads the requested Website (unless it’s first blocked by a blacklist) and reads every line of it, scanning for bad words or phrases. A list of bad words or phrases is specified, conceptually like a blacklist, but this list is checked for matching patterns anywhere in the Webpage, requiring more processor time and slowing down the serving of Webpages. (In fact, I’m sure that at this very moment there are already a few content filters balking at displaying this very article simply because it includes the word “sex” a few paragraphs back, and if that doesn’t do it, check out what’s coming next…) A typical list of bad words and phrases might include “boobies,” but since Web authors are just as interested in getting their content past filters as administrators are in keeping it out, it may also be necessary to include strange-seeming variations, such as b00bies, boob!es, or boobie$. Filtering may be set to block any page that includes any of the bad phrases, or phrases may be assigned point values and the filter set to block any page that exceeds a certain point threshold.
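A minimal sketch of phrase scoring against a point threshold could look like this; the phrase list, point values, and threshold are all invented for illustration:

```python
# Hypothetical scored phrase list. Obfuscated variants such as "b00bies"
# need their own entries (or a character-substitution map).
PHRASE_SCORES = {
    "boobies": 5,
    "b00bies": 5,
    "boob!es": 5,
    "boobie$": 5,
    "sex": 2,
}
THRESHOLD = 8  # pages scoring at or above this value are blocked

def page_score(page_text: str) -> int:
    """Sum the points for every occurrence of every listed phrase."""
    text = page_text.lower()
    return sum(points * text.count(phrase)
               for phrase, points in PHRASE_SCORES.items())

def should_block(page_text: str) -> bool:
    return page_score(page_text) >= THRESHOLD
```

With these numbers, one mention of “sex” scores 2 and passes, while a page that racks up 8 or more points across all phrases is blocked.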

The next methodology of content filtering is called context filtering, and it picks up where word and phrase parsing leaves off. The problem with word and phrase parsing is that it’s not very smart. It simply acts upon everything that matches a predefined pattern, without attention to context. It might block pages that include the terms “the naked truth” or “chicken breasts,” while an administrator might not care about either “naked” or “breasts” in those contexts, but might want to block pages using the words “naked breasts” together. Even with point values and thresholds, it’s possible for legitimate Webpages to be blocked.

For example, a Webpage about breast cancer could easily mention breasts enough times to exceed a point threshold. Context filtering is performed by a variety of proprietary algorithms designed by the various makers of Internet content filters. The trick is that they need to balance speed and accuracy; they must download and carefully analyze all of the wording of the requested Webpages to determine whether they are permissible or taboo, and they need to do it quickly enough to remain as transparent as possible to the users. If they’re too quick to judge, they may let unacceptable content through (known as “misses”) or block permissible content (known as “false hits”), but if they’re too deliberate, users will complain about latency. Building a better algorithm requires more time and money, so frequently the faster and more accurate filters cost more.
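One toy way to capture the “‘naked breasts’ together, but not the words apart” distinction is to match words only when they are adjacent. Real context-filtering algorithms are proprietary and far more sophisticated; this is only a sketch of the idea:

```python
import re

def contains_adjacent(text: str, words: tuple) -> bool:
    """True only if the given words appear consecutively, in order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(words)
    return any(tuple(tokens[i:i + n]) == tuple(words)
               for i in range(len(tokens) - n + 1))

def blocked_by_context(text: str) -> bool:
    # Block "naked breasts" as a phrase, not the individual words.
    return contains_adjacent(text, ("naked", "breasts"))
```

Under this rule, “the naked truth about chicken breasts” passes, while the two words side by side trigger a block.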

Just for the sake of completeness in this treatise on Internet content filtering, I should also mention that there may be other methodologies employed or configurable in various Internet content filtering solutions. Virtually all Internet content filters work on port 80 (HTTP); most ignore other protocols, but some may be able to apply filtering to other ports, or may be capable of completely blocking specified ports, such as FTP or Telnet. (I wonder which port “World of Warcraft” uses…)
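A per-port policy of that sort could be sketched as a simple lookup; the port numbers are the standard ones for HTTP, FTP, and Telnet, but the policy itself is hypothetical:

```python
# Hypothetical per-port policy: content-filter HTTP, block FTP and
# Telnet outright, and pass everything else through untouched.
PORT_POLICY = {
    80: "filter",  # HTTP: run the content filter
    21: "block",   # FTP control channel
    23: "block",   # Telnet
}

def action_for_port(port: int) -> str:
    return PORT_POLICY.get(port, "allow")
```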

Similar to firewalls, Internet content filters come as hardware or software solutions. Hardware solutions are commonly known as “appliances,” and software solutions are commonly known as “applications” or “suites.” Hardware solutions provide centralized administration. They may cost more, but they perform all of the filter-related processing, relieving your servers and workstations of any such responsibilities. They frequently come with subscription services for updates to the blacklist, whitelist, phrase list, and context data, much like antivirus subscriptions provide updates to lists of virus signatures. They may be multi-homed pass-through gateways, or they may work by redirecting traffic to a specified port or destination IP address.

Higher-end models may also include caching to speed up the serving of frequently-accessed resources. Software-based solutions may be server-based or may be installed on each individual workstation. Most server installations offer the same centralized administration as hardware solutions, but of course, they use your own processor and RAM to perform the filtering, instead of a dedicated appliance. On the other hand, they may be less expensive. In the case of a workstation installation, besides installing the software on each individual workstation, you may also need to configure each workstation individually, and you may need to update each workstation individually on a regular basis.

Even Microsoft Internet Explorer has a free, simple, built-in Internet content filter – it’s called the “Content Advisor,” and you can configure it under Internet Options in the Windows Control Panel. It’s fine for your kid’s standalone computer or a small peer-to-peer network, but it is probably inadequate as an enterprise solution. Whether hardware- or software-based, best-in-class enterprise solutions are often Active Directory-integrated, simplifying administration and configuration, and permitting filtering settings to follow users anywhere in the network. Teachers, for example, could have less-restrictive settings regardless of where they log in, while students would still be restricted, even if they sneak into the faculty lounge during recess.



