Open Source Content Filter

When you’ve got lots of young internet users, a filter is the best way to allow access while keeping a lot of the questionable content out. Commercial systems are often expensive and difficult to set up and administer.

DansGuardian aims to change that. This open source content filter and web proxy is quite effective at filtering questionable content and even ads. It can also be set up to use external anti-virus programs to scan content as it's being accessed.

How does a content filter work?

The filter gets in the middle of the conversation between you and the web server.

Web proxy/filter diagram

Your browser asks the proxy/filter server for a website. The filter scans both the request and the response for questionable content and viruses. If everything is clean, the proxy returns the content to your browser. If there is a problem, the content is blocked and the browser gets an "access denied" page instead.
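To make the idea concrete, here is a deliberately simplified PHP sketch of the two checks described above: one on the request (is the site banned?) and one on the response (does the page score too high against a weighted phrase list?). It is loosely modeled on DansGuardian's weighted-phrase approach but is not DansGuardian's code. The banned site, phrases, weights, limit and function names are all hypothetical, and virus scanning (which DansGuardian hands off to a scanner such as ClamAV) is left out.

```php
<?php
// Simplified phrase-scoring filter, loosely modeled on DansGuardian's
// weighted phrase lists. All names and values below are hypothetical.

// Hypothetical banned domains: any request for these is blocked outright.
$bannedSites = array('badsite.example');

// Hypothetical weighted phrases: positive weights push a page toward being
// blocked, negative weights pull it back (e.g. for legitimate help pages).
$weightedPhrases = array(
    'casino'                  => 20,
    'poker'                   => 15,
    'gambling addiction help' => -30,
);

// Hypothetical score above which a page is blocked.
$limit = 40;

// Request check: is the requested host on the banned list?
function requestAllowed($url, array $bannedSites)
{
    $host = strtolower((string) parse_url($url, PHP_URL_HOST));
    return !in_array($host, $bannedSites, true);
}

// Response check: add up the weight of every phrase occurrence in the page text.
function pageScore($html, array $weightedPhrases)
{
    $text = strtolower(strip_tags($html));
    $score = 0;
    foreach ($weightedPhrases as $phrase => $weight) {
        $score += substr_count($text, $phrase) * $weight;
    }
    return $score;
}

// The proxy's decision: block on a banned site or an over-limit page.
function filterDecision($url, $html, array $bannedSites, array $weightedPhrases, $limit)
{
    if (!requestAllowed($url, $bannedSites)) {
        return 'blocked: banned site';
    }
    if (pageScore($html, $weightedPhrases) > $limit) {
        return 'blocked: content';
    }
    return 'allowed';
}

// Usage: "casino" twice and "poker" once scores 20 + 20 + 15 = 55, over the limit.
echo filterDecision('http://www.example.com/', '<p>casino casino poker</p>',
    $bannedSites, $weightedPhrases, $limit), "\n";
// prints: blocked: content
```

The negative weight is the interesting design choice: a page reading "Casino, poker, gambling addiction help" scores 20 + 15 - 30 = 5 and is allowed, so help resources are not caught by the same words that flag gambling sites.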

Zero to filter in 10 minutes flat:

Assumptions: You have access to an Ubuntu server, and that server has access to the internet.

  1. Open a command prompt and type:
    sudo apt-get install tinyproxy dansguardian
    This installs tinyproxy, a web proxy server, and DansGuardian, a content filtering system.
    Ubuntu will also recommend ClamAV. Accept the defaults and install.
  2. Configure DansGuardian by editing the /etc/dansguardian/dansguardian.conf file.
  3. Place a pound sign (#) in front of the line containing the word ‘UNCONFIGURED’.
  4. Remove the pound sign in front of the line that starts with:
    contentscanner = '/etc/dansguardian/contentscanners/clamav.conf'
    This enables ClamAV scanning of content.
  5. Next, edit the tinyproxy configuration file:
    /etc/tinyproxy/tinyproxy.conf
  6. Around line 15 you should see the line ‘Port 8888’. Change it to ‘Port 3128’, the port DansGuardian forwards requests to by default.
  7. Start it up. You’ll need to start the proxy first, then the filter.
    sudo /etc/init.d/tinyproxy start
    sudo /etc/init.d/dansguardian start
  8. Configure your client computers to use the proxy.
    In Firefox, for example, go to Tools->Options->Advanced->Network tab.
    Click the ‘Settings’ button.
    Select ‘Manual proxy configuration’.
    In the HTTP Proxy box, enter the address of your proxy server. In the Port box, enter 8080 (DansGuardian's port, not tinyproxy's).
  9. In your internet router, block access to the internet from all addresses except the proxy server.

Done!
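For reference, here is roughly what the lines touched in steps 2 to 6 end up as, assuming the stock Ubuntu packages (comments and layout vary between versions, so check against your own files). The filterport, proxyip and proxyport lines are DansGuardian's defaults and need no editing; they are shown only to make the port chain clear: browsers talk to DansGuardian on 8080, and DansGuardian hands requests to tinyproxy on 127.0.0.1:3128.

```
# /etc/dansguardian/dansguardian.conf (excerpt)
# Step 3: the UNCONFIGURED line is commented out
#UNCONFIGURED - Please remove this line after configuration

# Defaults, left unchanged: listen on 8080, forward to tinyproxy on 3128
filterport = 8080
proxyip = 127.0.0.1
proxyport = 3128

# Step 4: ClamAV scanning enabled
contentscanner = '/etc/dansguardian/contentscanners/clamav.conf'

# /etc/tinyproxy/tinyproxy.conf (excerpt)
# Step 6: moved from 8888 to the port DansGuardian forwards to
Port 3128
```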

Gotchas:

  • If the firewall on the proxy server is off or allows outside connections to port 3128, your filter can be bypassed by connecting straight to tinyproxy on that port. Make sure only localhost can connect to this port.
  • Anyone with SSH access can subvert your filter: by port-forwarding to port 3128 on the server, they can connect directly to tinyproxy and bypass DansGuardian.
  • If the firewall on the proxy server is not allowing connections to port 8080, then no one will be able to use your new content filter.
  • DansGuardian has a Perl GUI, but mod_perl is disabled on my server, so I wrote a quick PHP script to replace it. You’ll need to modify your dansguardian.conf file to enable it.
  • Webmin provides a GUI for this system. If you're not comfortable editing text files on a Linux system, Webmin is the way to go: it provides a web GUI for making changes to a Linux system.
  • While it is possible to install this on an Ubuntu desktop, it's best to install it on a computer or server with limited physical access. This makes bypassing the filter much more difficult.
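One way to handle the first three gotchas on the server itself is sketched below, assuming tinyproxy's Listen and Allow directives, OpenSSH, and Ubuntu's ufw firewall; adapt it to whatever firewall you actually run. Be aware that the OpenSSH documentation notes that disabling forwarding does little if users also have shell access, since they can run their own forwarders.

```
# /etc/tinyproxy/tinyproxy.conf: only accept connections from this machine
Listen 127.0.0.1
Allow 127.0.0.1

# /etc/ssh/sshd_config: refuse SSH port-forwarding (restart ssh afterwards)
AllowTcpForwarding no
```

```
# Let LAN clients reach DansGuardian; keep tinyproxy's port closed to everyone else.
# ufw passes loopback traffic by default, so DansGuardian can still reach 3128.
# If ufw is not enabled yet, run "sudo ufw allow ssh" before "sudo ufw enable".
sudo ufw allow 8080/tcp
sudo ufw deny 3128/tcp
```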

Quercus and App Engine – Reading from the data store.

Using Brian’s scripts as a base and Google’s JDO documentation, I’ve got a working example of reading from the App Engine datastore with PHP.

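<?php
// Quercus extension: import Java classes into PHP. phptest.test is the JDO
// entity class and phptest.PMF a PersistenceManagerFactory singleton
// (the PMF pattern from Google's JDO docs), both compiled into the project.
import phptest.test;
import phptest.PMF;

$p = PMF::get();
$pm = $p->getPersistenceManager();
// JDOQL query: select every stored phptest.test entity.
$q = $pm->newQuery("select from phptest.test");
$results = $q->execute();
// execute() returns a Java List rather than a native PHP array. foreach doesn't
// seem to iterate it under Quercus, so walk it by index instead.
$count = count($results);
for ($i = 0; $i < $count; $i++) {
	echo $results[$i]->getId() . " - - " . $results[$i]->getFirstName() . '<br/>';
}
// Always close the persistence manager when you are done with it.
$pm->close();
?>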

It's just that easy.

Deletes work the same way. The JDO persistence manager's deletePersistentAll() method deletes every object you pass it, so passing it the full result of the query above ($pm->deletePersistentAll($results)) empties the table. To delete a specific item, select it and call deletePersistent($obj), where $obj is your record object. See the comments below.

JDO saves updates automatically: select the item, change its data, and the changes are saved when you call the persistence manager's close() method.

Need more info? Read Google's docs.

PHP in Google App Engine

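Google recently expanded its App Engine service to support Java in addition to Python. There's no PHP or Ruby support yet, but choosing Java opens the door to both, since each language has an implementation that runs on Java (Quercus for PHP, JRuby for Ruby).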

Quercus is a pure Java implementation of PHP, and it allows us to run PHP code in App Engine. Some really smart people figured out how to make it work. Check out Brian and Webdigi.

Using their pre-configured environments, it's actually pretty easy. Brian even figured out how to start saving stuff into the App Engine datastore. Good stuff.

If you're interested in playing with this, here are some links you’ll need in addition to the ones provided by Brian and friends:

Google’s docs for the App Engine datastore: explains how JDO works. I’m a PHP guy, not a Java guy, but using Brian’s example of how to write to the datastore and the docs found here, you should be able to start reading from and writing to your account's datastore.

App Engine SDK: get the Eclipse plug-in. It includes a local development server so you can test your apps before uploading them to Google, and it has a push-button upload. The plug-in can also co-exist with Zend’s PDT plug-in.
NOTE: When starting the development server, you may get an error from Java the first couple of times the page loads. Just reload the page a couple of times and the error will go away.