Content policies can be organized around two dimensions: first, the services provided on the network, such as SMTP and POP3 email, HTTP, and FTP; second, the threats those services face, such as spam, viruses and other malware, and the leakage of private or confidential information.
There is clearly overlap; for example, spam is handled similarly in SMTP and POP3 email systems. At the same time, different protocols and services have different vulnerabilities and require different types of monitoring. For example, private or confidential information can be transmitted via either email or FTP, but FTP's long history of vulnerabilities warrants particular attention.
Policies are rules applied to groups of users. A policy can apply either to all users (a global policy) or to a specific group of users (a non-global policy). It is preferable to place as many content rules as possible in global policies and to reserve non-global policies for exceptions. For example, a global policy might limit the size of attachments to 10MB, while a non-global policy allows attorneys and others in the legal department, who tend to work with large volumes of documents, to accept attachments as large as 20MB.
Non-global policies are assigned to groups of users. In general, the groups should be logically related by their organizational function rather than by common characteristics of the policies applied to them. For example, both the legal and marketing departments might be allowed to receive attachments as large as 20MB. However, that shared requirement is basically a coincidence; tomorrow the requirement of one of the departments may change. Thus, an administrator would want to establish separate legal and marketing groups rather than a single 20MB Attachment group.
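To make the distinction concrete, the following is a minimal Python sketch of group-based limits, assuming the 10MB global default and 20MB departmental exceptions described above. The group names and lookup logic are illustrative, not an actual appliance configuration:

```python
# Hypothetical attachment-size limits: a 10MB global default with
# non-global exceptions for the legal and marketing groups.
GLOBAL_LIMIT_MB = 10
GROUP_LIMITS_MB = {"legal": 20, "marketing": 20}  # separate groups, even though
                                                  # their limits happen to match

def attachment_limit_mb(user_groups: set[str]) -> int:
    """Return the most permissive limit that applies to this user."""
    limits = [GROUP_LIMITS_MB[g] for g in user_groups if g in GROUP_LIMITS_MB]
    return max(limits, default=GLOBAL_LIMIT_MB)

print(attachment_limit_mb({"legal"}))        # 20
print(attachment_limit_mb({"engineering"}))  # 10 (global default applies)
```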
Content policies need to address a range of topics, including scanning, encryption and digital signatures, disclaimers, and content size. Scanning is a broad set of activities geared toward ensuring that unwanted content is not allowed into an organization and that protected content is not allowed out. Scanning policies should define rules for antivirus scanning, anti-spam scanning, and the filtering of banned or confidential content.
Antivirus scanning uses both signatures (binary patterns indicating a virus or other malware) and heuristic analysis (general patterns that indicate the presence of malware, such as an attempt to modify files without a prior user command) to detect malware. Antivirus policies should define how an infected file is handled: it can be deleted, quarantined, or cleaned and passed through. Often the best option is to clean and pass through; the recipient gets the message but not the malware. One drawback to this approach is that the message is changed, so any associated digital signature will no longer match when a new signature is calculated upon arrival. Antivirus policies should also mandate frequent updates of virus signature files as well as patches and upgrades to the virus detection engine.
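The signature drawback is easy to demonstrate. The following sketch uses a plain SHA-256 digest as a stand-in for a real cryptographic signature; the message bytes are invented for illustration:

```python
import hashlib

# The sender "signs" the exact bytes of the message as it was sent.
original = b"Quarterly report attached.\n<attachment: report.xls>"
signature_at_send = hashlib.sha256(original).hexdigest()

# The appliance cleans the message, removing the infected attachment.
cleaned = b"Quarterly report attached.\n<attachment removed: malware detected>"

# Recomputing the digest on arrival no longer matches the original signature.
digest_on_arrival = hashlib.sha256(cleaned).hexdigest()
print(signature_at_send == digest_on_arrival)  # False
```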
Anti-spam scanning policies need to balance comprehensive rules that capture most spam against the risk of blocking legitimate email with false positives. To ensure the best possible spam protection, keep spam rule files up to date. Also, use the white list feature, Permit Sender, to list senders allowed to bypass spam scanning; this eliminates the chance of false positives (legitimate email classified as spam) from trusted senders. Similarly, blacklists are used to prevent known spammers from sending messages. Blacklists can create problems of their own, however: spammers routinely hijack email accounts, so legitimate emailers may find their messages blocked.
Policies should define how spam should be handled once it is identified: it can be refused, deleted, or forwarded to a special recipient who is monitoring spam activity on the network. Another option is to add a message to the email indicating the message is potential spam and let the recipient decide how to deal with the message.
Both incoming and outgoing messages should be scanned for banned words and phrases. Content rules should include checks for words or phrases in email messages that might contribute to an offensive or hostile work environment. Outgoing messages should be checked for private, proprietary, or other confidential information. This can require careful crafting of rules.
Consider a healthcare provider that uses email to transmit patient information between doctors. In the United States, the Health Insurance Portability and Accountability Act (HIPAA) places restrictions on how protected health information is shared among doctors, insurance companies, and others involved in healthcare. Sending an email with an attached patient record might violate HIPAA unless the recipient is allowed by the patient to receive that information. Of course, the email system or content filter will not have access to the databases or paper files that identify legitimate recipients of protected healthcare information. In such cases, a provider may require that all patient records be sent from a single email account used only by patient records personnel. A set of content rules could then be defined to block the transmission of any message containing indicators of protected patient information and forward it to an email account for further review (see Figure 2.1).
Figure 2.1: Quarantine areas can be established to retain sensitive information so that it can be reviewed before sending it outside the organization.
Proprietary and trade secret information is protected in a similar way. Policies should include rules for blocking or quarantining confidential information before it leaves the email system. This requires defining a custom dictionary of confidential terms and phrases, such as project and process names.
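A minimal sketch of such a dictionary check (the terms themselves are hypothetical):

```python
import re

# Hypothetical dictionary of confidential project and process names;
# in practice this list is maintained as part of the content policy.
CONFIDENTIAL_TERMS = ["project aurora", "vulcan process", "patient record"]

def confidential_matches(message: str) -> list[str]:
    """Return the confidential terms found in an outgoing message."""
    return [term for term in CONFIDENTIAL_TERMS
            if re.search(r"\b" + re.escape(term) + r"\b", message, re.IGNORECASE)]

outgoing = "Attached are the latest Project Aurora test results."
matches = confidential_matches(outgoing)
if matches:
    print("Quarantine for review; matched terms:", matches)
```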
Another area that should be addressed in content policies is time-wasting Web sites. Although many organizations have no interest in blocking occasional visits to news sites, those same organizations have no need for gambling or adult sites. Policies should be defined that block access to known time-wasting, non-work–related sites. Content policies cover a broad range of topics but are essentially rules for filtering what you would want coming into or going out of your organization.
To monitor the effectiveness of a secure content appliance's settings, administrators must establish a baseline of activity and then regularly examine the volumes and types of events on the network.
A baseline should be established when the appliance is first installed. The objective is to understand what constitutes "normal" activity on the network, including the number of emails sent and received, volumes of HTTP and FTP traffic, the number of spam messages, and malware applications detected. The baseline should include both absolute measures on protocols, such as the volume of HTTP traffic per day, and percentages, such as the percent of emails classified as spam.
In addition to these measures, the administrator should assess the accuracy of the content filtering to determine the rate of false positives, the number of messages incorrectly categorized as spam, attachments incorrectly identified as malware, and the rate of false negatives (that is, banned content that was not detected by the appliance). This work requires manual review of quarantined messages and careful tracking of malware infections.
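As a sketch of the arithmetic involved, using invented counts:

```python
# Hypothetical counts collected over a one-week baseline period.
emails_received = 12_000
classified_spam = 4_800
false_positives = 24   # legitimate mail found in the spam quarantine
false_negatives = 96   # spam reported by users after reaching inboxes

spam_pct = 100 * classified_spam / emails_received
fp_rate = 100 * false_positives / classified_spam             # share of flagged mail
fn_rate = 100 * false_negatives / (emails_received - classified_spam)

print(f"Spam: {spam_pct:.1f}% of mail; "
      f"false positives: {fp_rate:.2f}%; false negatives: {fn_rate:.2f}%")
# Spam: 40.0% of mail; false positives: 0.50%; false negatives: 1.33%
```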
Once the baseline is established, administrators should monitor the same measures over time using the same reports and analysis used to establish the baseline.
The basic reporting tools at the administrator's disposal are browser-based appliance reports, third-party management tools, and standard network event mechanisms such as SNMP traps and syslog.
Each type of reporting has its advantages and requires varying levels of configuration and integration.
Browser-Based Appliance Reports
The secure content appliance includes several reports providing summary and detailed information about traffic volumes and significant events. Although the details of each of these reports are more thoroughly described in the appliance documentation, let's take a brief look at some examples.
For example, a secure content appliance can provide system status information that includes details about protocols, hardware, load sharing, and general status. The protocol status displays counts of the amount of traffic scanned, the number of viruses detected, emails deferred, spam messages blocked, and volumes of HTTP, FTP, and email traffic. You can also gather information about the status of each protocol and the workload processed. The hardware, load, and general status pages display low-level details, such as the RAID status of hard drives and the MAC addresses of the appliance's NICs (see Figure 2.2).
Figure 2.2: An example report provided by the secure content appliance.
Additional information provided by a secure content appliance includes Web pages that let administrators see the history of selected counters over a 24-hour period. For example, an administrator can track the number of email messages originating inside and outside the network (see Figure 2.3).
Figure 2.3: Examining a historical view of select counters.
In addition to storing performance detail on the appliance, administrators have the option of centralizing reporting with a third-party tool or with basic network event monitoring mechanisms such as SNMP traps and syslog. Events are filtered and sent to reporting tools based on criteria configured by the administrator.
A third-party tool, such as McAfee's ePolicy Orchestrator, centrally manages security policies and procedures across a network. These applications passively monitor activity on a network, prevent changes to system configurations, and ensure that workstations and servers remain in compliance with security policies. These tools provide consolidated reporting and a library of predefined reports.
For example, when an ePolicy agent is installed on a secure content appliance, events are sent to the ePolicy Orchestrator and included in that application's reports, several of which are designed specifically for the WebShield secure content appliance.
Other reports provide statistics on throughput and more detailed information on viruses and spam prevention. In addition to the predefined reports, administrators have the option of defining custom reports using Crystal Reports, an industry-leading reporting application.
Many organizations may already have existing event reporting systems or practices in place based on email alerts, SNMP messages, and syslog. Third-party tools usually support email notification of an administrator when an event occurs.
SNMP is a protocol designed for sending messages from a managed device to a network management system. An agent resides on the managed device—in this case, the secure content appliance—and sends messages to a management device that logs the message. Syslog is an application that allows distributed systems to centralize their logging information.
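For example, forwarding events to a central syslog collector can be sketched with Python's standard library; the collector address 192.0.2.10 is a placeholder for your own syslog server:

```python
import logging
import logging.handlers

# Send log records to a central syslog collector over UDP port 514.
handler = logging.handlers.SysLogHandler(address=("192.0.2.10", 514))
handler.setFormatter(logging.Formatter("content-appliance: %(message)s"))

logger = logging.getLogger("content-appliance")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.warning("virus detected in SMTP attachment; message quarantined")
```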
Regardless of how administrators choose to log events and report on performance, the overall monitoring process is essentially the same.
There are several steps to maintaining an effective monitoring procedure.
The first step is to establish a monitoring policy that defines which measures will be tracked, how often measures will be taken, who is responsible for collecting measures, who is responsible for reviewing measures, who is responsible for acting on particular information, and the conditions that warrant the creation of a new baseline set of measures.
The baseline should be documented along with secure content policies and current appliance configurations. The baseline document should include absolute measures, such as traffic volumes per protocol; relative measures, such as the percentage of email classified as spam; and the observed rates of false positives and false negatives.
The negative measures, such as missed viruses and mistaken URL blocks, will be relatively infrequent, so they should be tracked over long time periods to yield reasonably accurate rates.
Once the baseline is established, administrators should review reports, event logs, and other collected information to monitor variations from the baseline.
Checking the accuracy of content filtering is a time-consuming task and can rarely be performed as frequently as analyzing reports. As time permits, administrators should review message quarantines and reports of virus or other malware infections to determine the number of errors in the filtering process. Depending on the type of error, one of several actions may be warranted, such as adjusting spam thresholds, updating white lists and black lists, refining custom dictionaries, or submitting samples of missed spam to the vendor.
In addition to these regular tasks, administrators may need to redefine their baseline under some circumstances. Significant changes in network infrastructure or the number of users, the introduction of new enterprise applications, or major organizational changes, such as a merger, typically justify creating a new baseline for secure content monitoring.
The first step in tuning content filtering rules is to understand how they are organized and how they are applied. Content filtering depends on a wide range of rules that control what content can enter and leave a network. These rules are grouped into related sets known as policies. Policies are generally defined for a particular function or protocol and apply to either inbound or outbound traffic. Policies are typically created for each protocol or service in use, such as SMTP, POP3, HTTP, and FTP.
Policies include rules for different aspects of a protocol. An SMTP policy, for example, typically includes rules governing attachment sizes, virus scanning levels, banned words and phrases, and checks against known spammers.
There are two types of policies: global and non-global. Global rules apply to everyone. A typical global rule performs a medium-level virus scan on all incoming SMTP traffic. Often, groups of users within an organization will have slightly different requirements and require a non-global policy.
For example, the data warehouse group may download a large data file every Friday night. Due to processing constraints, the file must be downloaded in 1 hour or less; using a medium virus scan can lead to a download time of more than 1 hour. An exception is made for this download and only a low-level virus scan is performed on this file to ensure the data warehouse process meets its service level agreements (SLAs).
Global policies are designed to cover a wide range of threats. Ideally, all users would be covered by a single set of global policies, but that is not always possible. Still, from a management perspective, it is best to minimize non-global policies and keep global and non-global policies closely coupled.
In general, try to keep rules as general as possible. This method allows rules to be applied to the maximum number of cases, which in turn can help minimize the number of rules required. A corollary to this guideline is to not use more conditions than required in a rule. This allows the rule to be applied to the broadest number of events possible. When exceptions to the rule are discovered, non-global policies can be defined to address the exceptions.
Expect exceptions to at least some policies. For example, as a general rule, trade secret information is not allowed to leave the organization. A research and development group may have executed a joint development agreement with a business partner and now needs to share information about a narrow range of trade secret processes the company uses. In this case, a new, non-global policy is defined and applied to members of the research and development group working on this project. The policy allows transmission of email messages that contain phrases associated with the proprietary process only when the recipient is included in a list of registered researchers at the partner company. In the case of global policies, the rules should be as general as possible; in the case of non-global policies, as specific as possible.
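A minimal sketch of such an exception rule, with hypothetical phrases and partner addresses:

```python
# Phrases associated with the proprietary process (hypothetical).
PROPRIETARY_PHRASES = ["vulcan process", "catalyst formulation"]

# Registered researchers at the partner company (hypothetical addresses).
REGISTERED_RECIPIENTS = {"a.researcher@partner.example", "b.lee@partner.example"}

def allow_outgoing(recipient: str, body: str) -> bool:
    """Block proprietary content unless the recipient is registered."""
    mentions_secret = any(p in body.lower() for p in PROPRIETARY_PHRASES)
    return not mentions_secret or recipient.lower() in REGISTERED_RECIPIENTS

print(allow_outgoing("a.researcher@partner.example", "Vulcan process notes"))  # True
print(allow_outgoing("someone@elsewhere.example", "Vulcan process notes"))     # False
```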
Non-global policies inherit rules from global policies. This ensures that the default behavior of global policies is carried over to non-global policies; administrators change only the minimal number of rule attributes needed to implement the exception. For example, a POP3 policy may allow email attachments as large as 5MB, perform a medium-level virus scan, check for banned words and phrases, and check for known spammers in the sender address.
Suppose engineers in the product design department have to exchange large computer-aided design (CAD) drawings. The administrator could define a non-global policy that inherits the global policy and then overrides the 5MB limit on attachments with a more appropriate limit, such as 30MB. The administrator does not need to change any other part of the inherited policy for it to apply to the engineers. In addition, if the global policy changes in the future, those changes will be reflected in the non-global policy (except, of course, where the change applies to a rule that the non-global policy overrides).
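One way to picture inheritance with overrides is a sketch like the following; the field names are illustrative, not the appliance's actual configuration schema:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Pop3Policy:
    max_attachment_mb: int = 5
    virus_scan_level: str = "medium"
    check_banned_phrases: bool = True
    check_known_spammers: bool = True

global_policy = Pop3Policy()

# Non-global policy for the product design group: inherit every rule,
# overriding only the attachment limit to accommodate large CAD drawings.
cad_policy = replace(global_policy, max_attachment_mb=30)

print(cad_policy.virus_scan_level)   # "medium" -- inherited unchanged
print(cad_policy.max_attachment_mb)  # 30 -- the single overridden attribute
```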
Two principles of rule design are worth calling out for emphasis: keep global rules as general as possible, and keep non-global rules as specific as possible.
When there is a problem with a message or a document, how can it be addressed? There are several options: refuse or delete the content, clean it and pass it through, quarantine it for later review, or defer its delivery.
The first option works well with known spam or messages containing banned language. The second option is used to allow messages through after the threat has been eliminated and there is no concern for harm to network resources. The remaining options all require the ability to isolate and store content before acting on it further. A secure content appliance should do so by providing quarantine areas and deferred mail storage, as Figure 2.4 shows.
Figure 2.4: At any time, a systems administrator can find the number of quarantined and deferred messages stored in the appliance.
Quarantining content isolates a threat to a storage area on the appliance so that the threat cannot harm network resources. A message should be quarantined if one of two things occurs: it is found to contain a virus or other malware, or its content triggers a filtering rule, as with borderline spam or controlled information, and requires review.
Once quarantined, administrators should review the messages and take appropriate action (see Figure 2.5).
Figure 2.5: Quarantined viruses are kept in a secure area on the appliance until they are acted upon by the administrator.
Administrators have several options for dealing with these isolated messages. The messages can be viewed, deleted, or forwarded.
Unless the message is being sent to an antivirus vendor as a sample for analysis, use forwarding with care. The message will still be infected.
Quarantining a virus-infected message keeps the malware from reaching the desktop. Although the desktop is likely protected by local antivirus software, it is better to keep the virus from reaching its destination. The desktop, for example, may not have the latest antivirus signature files or scanning engine. Some malware is now designed to disable desktop antivirus programs. Another technique changes the local hosts file so that automatic update programs cannot reach vendors' sites to download updates. Quarantining at the appliance is another example of layered protection that compensates for vulnerabilities in and threats to individual components of the security system.
Identifying spam is not an exact science. Some messages that may appear to be spam are legitimate emails and vice versa.
Some spam is easily identified and systems administrators can be confident that it is actually spam. Borderline cases are more problematic. Administrators do not want to raise the threshold too much on what is considered spam or else spam that should be blocked will make it to recipients' inboxes. At the same time, email administrators do not want to delete a legitimate email that is mistakenly categorized as spam. Quarantining provides a middle ground. Messages that are considered spam can be stored in the email quarantine area until they are reviewed by the administrator and dispatched accordingly.
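A two-threshold scheme is one way to implement this middle ground. The following is a minimal sketch, with invented threshold values; the appliance's actual quarantine logic may differ:

```python
# Hypothetical score bands: confident decisions at the extremes,
# administrator review for everything in between.
DELIVER_BELOW = 2.5
DELETE_AT_OR_ABOVE = 5.0

def disposition(spam_score: float) -> str:
    if spam_score < DELIVER_BELOW:
        return "deliver"      # confidently legitimate
    if spam_score >= DELETE_AT_OR_ABOVE:
        return "delete"       # confidently spam
    return "quarantine"       # borderline: hold for review

print(disposition(1.2))  # deliver
print(disposition(3.8))  # quarantine
```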
Information security is often described as providing CIA: confidentiality, integrity, and availability. Deferred email management contributes to integrity and availability by providing the ability to defer the delivery of messages if there is a problem relaying a message. Ideally, a high-level dashboard display, as Figure 2.6 shows, will include the status of deferred messages.
Figure 2.6: A secure content appliance interface should provide summary information about the number of quarantined and deferred items.
Quarantining is also used with content filtering to ensure that controlled content—such as proprietary information, personal documents, and other confidential material—is not sent outside the organization inappropriately. Like spam, the suspect content may be stored on the appliance and held for review by the application administrator.
Quarantining and deferring are two common methods for creating middle grounds. In the case of quarantining, infected messages, borderline spam, and suspect content can be held and reviewed before letting it pass or deleting it completely. In the case of deferred email, messages can be stored and forwarded at a later time rather than discarding a message after initial attempts fail.
Spam filters depend upon rules that are designed to identify messages that are truly spam without mistakenly categorizing legitimate email as spam. These rules are created by examining large numbers of spam messages to identify characteristics common to spam. Typically, these rules take into account phrases commonly found in spam and their location within the structure of an email.
Spam messages will, of course, vary, but there are common characteristics that anti-spam designers can use to identify the most obvious spam, such as telltale marketing phrases, suspicious subject lines, and forged sender names.
Marketers have long known that short, well-phrased pitches can get a user's attention. Spammers use the same principle, and, fortunately for the rest of us, this is the Achilles' heel of spam. Table 2.1 lists several phrases that are good indicators of spam along with scores, or weights, indicating the relative confidence that the phrase indicates a piece of spam.
| Spam Indicator | Score |
|---|---|
| Get paid for your opinion | 2.0 |
| On sale | 1.0 |
| Limited time | 0.8 |
| Unbelievable prices | 0.8 |
| From: Antivirus Administrator | 1.2 |
| Dear Friend | 1.8 |
| Congratulations you are a winner | 2.0 |
Table 2.1: Example spam phrases and scores indicating the likelihood that a message containing them is spam.
Scores are essential to measuring the likelihood of spam. All of the phrases listed in Table 2.1 can be used in legitimate emails. However, if enough of them are used, even if they only weakly indicate spam (for example, through the use of "limited time"), there is a good chance the message is actually spam. Similarly, if only two or three phrases are used but are strongly correlated with spam (for example, "Get paid for your opinion"), chances are good that the message is spam.
The filtering rules add the score associated with each matching spam phrase to find the total score for a message. If the score exceeds a threshold, the message is considered spam.
To illustrate how this works, consider two example emails. The first is a legitimate message in response to a sales call.
Dear Frank,
Thanks for taking the time to meet with me yesterday about our new line of office furniture ***on sale*** through the end of the month. I'm sure you'll agree that some of our specials are at ***unbelievable prices*** but we are only offering these to select customers and for a ***limited time***.
I've attached a formal proposal for your review. Please feel free to contact me with any questions; otherwise I will call you Friday to follow up.
Regards,
Mary Jones
Acme Office Furniture
This message has three phrases commonly found in spam (indicated by bold italics). The total score is calculated as:
| Phrase | Score |
|---|---|
| On sale | 1.0 |
| Unbelievable prices | 0.8 |
| Limited time | 0.8 |
| **Total** | **2.6** |
Assuming a threshold of 4 (a relatively high threshold that favors letting borderline messages through rather than risking false positives), this message is not considered spam and would pass through the filter. The following example is a fictional but representative spam:
***Dear Friend***,
***Congratulations you are a winner***! For a ***limited time***, you can claim your prize from Grand Award Sweepstakes! Just click the link below, provide us with your name and address and the bank account number where you would like the funds deposited.
Again, congratulations,
Yours truly,
Grand Award Sweepstakes Prize Committee
This message also has three phrases commonly found in spam (indicated by bold italics). The total score for this message is:
| Phrase | Score |
|---|---|
| Dear Friend | 1.8 |
| Congratulations you are a winner | 2.0 |
| Limited time | 0.8 |
| **Total** | **4.6** |
As the total is greater than the 4.0 threshold, this message would be correctly categorized as spam.
This technique generally works well because it's fast (there is no complex analysis, just string matching and simple arithmetic), and with well-crafted rules, correctly categorizes most spam.
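The whole technique fits in a few lines. The following sketch applies the weights from Table 2.1 with simple substring matching; a production engine also considers where phrases appear in the message structure:

```python
# Phrase weights from Table 2.1.
SPAM_PHRASES = {
    "get paid for your opinion": 2.0,
    "on sale": 1.0,
    "limited time": 0.8,
    "unbelievable prices": 0.8,
    "from: antivirus administrator": 1.2,
    "dear friend": 1.8,
    "congratulations you are a winner": 2.0,
}
THRESHOLD = 4.0

def spam_score(message: str) -> float:
    """Sum the weights of every indicator phrase found in the message."""
    text = message.lower()
    return sum(w for phrase, w in SPAM_PHRASES.items() if phrase in text)

def is_spam(message: str) -> bool:
    return spam_score(message) > THRESHOLD

# The sales letter scores 1.0 + 0.8 + 0.8 = 2.6 (not spam);
# the sweepstakes message scores 1.8 + 2.0 + 0.8 = 4.6 (spam).
```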
However, occasionally, things do not work as planned.
Spam filter rules are not perfect. They apply rules derived from examining large samples of spam and non-spam messages. Using statistical methods, rule designers can generalize rules from large samples to find the best indicators of spam. As with the use of statistical methods in other applications, there is a margin of error. These errors come in two forms: false positives and false negatives.
A false positive mistakenly categorizes a legitimate email as spam. This occurs when phrases commonly found in spam are used in a legitimate message. If false positives are occurring at an unacceptable rate, the threshold for classifying a message as spam may be raised. Doing so causes fewer messages to fall into the spam category and reduces the chance that a message with some, but not many, characteristics of spam is misclassified. For example, the first sample message would have been categorized as spam if a lower threshold, such as 2.5, had been set. Although raising the threshold decreases the chance of false positives, it increases the chance of false negatives.
False negatives are mistakes that allow spam to pass through as legitimate email. In the ideal spammer world, spammers would maximize their use of marketing phrases that catch readers' attention while still "flying under the radar" of anti-spam filters. As spammers learn the phrases that cause their messages to be filtered, they vary the content of their messages to avoid triggering matches with spam rules. If they can avoid triggering enough rules, their message scores will fall below the threshold and the spam will make its way to the recipient. Such messages are false negatives.
Clearly, there is not a definitive set of rules that will correctly identify all spam while avoiding false positives. Even if you could compile an ideal set of rules for all known spam, it would not necessarily work as well for new spam created by spammers with those very rules in mind. Filtering spam is a cat-and-mouse game. Spammers are constantly trying to avoid detection and will continuously vary their content.
Besides filtering rules, spam can be controlled through the use of white lists and black lists. A white list contains email addresses and domains trusted not to send spam. Business partners, clients, patients, government agencies, public companies, and other organizations that do business with a company may be added to the white list. Any messages that are sent from those addresses are not subject to filtering by spam rules.
The white list is useful for two reasons. First, because white-listed messages are not scanned for spam phrases, the anti-spam application can operate more efficiently; this benefit is especially useful when a small number of domains send a large proportion of all email to a business or organization. Second, trusted senders can never generate false positives, so legitimate email from business partners and clients is not mistakenly blocked.
Black lists contain a list of addresses and domains of known spammers. Any message from an address on the black list is categorized as spam and not allowed through. Black lists complement filtering rules based on content phrases. Rather than crafting rules to cover all the spam that may come from known spammers, the black list effectively shuts down traffic from those addresses.
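Combining both lists with the scoring sketch above, the decision order looks roughly like this; the domains are placeholders:

```python
WHITE_LIST = {"partner.example.com", "client.example.org"}  # trusted senders
BLACK_LIST = {"spammer.example.net"}                        # known spammers

def classify(sender: str, spam_score: float, threshold: float = 4.0) -> str:
    """Check the lists before falling back to phrase-based scoring."""
    domain = sender.rsplit("@", 1)[-1].lower()
    if domain in WHITE_LIST:
        return "deliver"   # bypass spam scanning entirely; no false positives
    if domain in BLACK_LIST:
        return "block"     # categorized as spam without content scanning
    return "quarantine" if spam_score > threshold else "deliver"

print(classify("sales@partner.example.com", spam_score=4.6))  # deliver
print(classify("offer@spammer.example.net", spam_score=0.0))  # block
```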
It is also important to stay up to date on spam filtering rules and the anti-spam engine. Anti-spam designers build new rule sets to address the changing patterns of spam as they emerge. Appliance administrators can use the Update | Anti-Spam option in the appliance to download and install the latest rules and anti-spam engine (the application that executes the rules).
There may be times when the up-to-date spam filters, black and white lists, and threshold adjustments are not effectively blocking spam. When that occurs, contact your vendor and submit samples of the spam that is slipping through. Anti-spam designers will then be able to study the spam and develop appropriate countermeasures.
Finally, when there is a sudden outbreak of a particular type of spam, vendors may develop specialized rules, known as extra rules in the technical documentation, to combat the outbreak.
To summarize, the following steps should be followed to improve the spam detection rate of a secure content appliance:
1. Keep spam filtering rules up to date.
2. Keep the anti-spam engine up to date.
3. Adjust spam thresholds to balance false positives against false negatives.
4. Configure white lists and black lists.
5. Submit samples of missed spam to the vendor.
6. Download specialized (extra) rules when a spam outbreak occurs.
Following the first two steps will help to ensure the appliance is configured to filter the latest and broadest range of spam. Adjusting thresholds and configuring black and white lists can fine tune the appliance's performance. In special circumstances, submitting a sample of missed spam and downloading specialized rules may be the correct course of action.