FreeBSD Transparent Web Proxy with Squid


Over the past few days, I've been setting up a transparent proxy. I need to limit traffic to a small set of sites for a while, so I thought I'd look into setting one up with squid.

I run nothing but FreeBSD machines for my home infrastructure where possible. A web search quickly turned up a number of sites that talked about using Cisco routers with FreeBSD backends to create enterprise-level web filters for corporations, or variations on this theme. These articles served only to whet my appetite; they didn't fulfill my needs. I don't have a Cisco router, just a dinky Soekris box running NanoBSD built from FreeBSD 6.1-RELEASE with a 6.1-STABLE kernel as of a few days ago. Many of the other articles discussed using pf and squid together, but these required that pf and squid run on the same machine. While the router could easily run pf, running squid on it was likely to prove impossible.

A friend on IRC gave me a solution, which I'll share here so that others wishing to implement this in the future can find it via a web search.

I used ipfw redirection + squid to implement my transparent cache/access control mechanism. The router and the beefy squid box were on the same L2 network (same ethernet network, no routers) so I was able to use the forward function of ipfw to forward the web packets to the squid machine. The squid machine then redirected this traffic to the squid cache port, which then served up the pages that were requested.

Wimpy SOEKRIS box

On my Soekris box, I added the following lines to my kernel:
# Firewall stuff

Once I had the firewall stuff in my kernel, I had to write the firewall rules. Thinking that this would be easy after I rebooted, I learned again that ipfw defaults to deny, so I had to get out my serial console to correct this problem. I'd recommend getting the system completely debugged using a kernel with "options IPFIREWALL_DEFAULT_TO_ACCEPT" and then removing it after you are sure you don't need it.
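
As a rough sketch of the kind of options involved (an assumption based on what ipfw forwarding needs on a 6.x kernel, not a verbatim copy of any particular config), the block might look like the one written out below. The helper name add_ipfw_options and the config file name in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch: append ipfw-related options to a custom kernel config file.
# add_ipfw_options is a hypothetical helper, not part of FreeBSD.
add_ipfw_options() {
    kernconf=$1
    cat >> "$kernconf" <<'EOF'
# Firewall stuff
options         IPFIREWALL                      # build ipfw into the kernel
options         IPFIREWALL_FORWARD              # enable the "fwd" action
options         IPFIREWALL_DEFAULT_TO_ACCEPT    # debugging only: default rule is "allow"
EOF
}
```

Usage would be something like `add_ipfw_options /usr/src/sys/i386/conf/SOEKRIS` (a made-up config name), followed by the usual kernel rebuild. Drop the DEFAULT_TO_ACCEPT line once the rule set is debugged.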

After arranging for a serial console, I was able to write the firewall rules I needed. Since I have multiple firewalls protecting my network, the rules I used were relatively simple. The Soekris box was acting as a router with a little bit of filtering as a backstop to my Linux-based DSL modem, which has some rudimentary forwarding built in, but it never hurts to have belts and suspenders. Here are the relevant rules that I used:
/sbin/ipfw add 1000 pass tcp from to any
/sbin/ipfw add 1100 fwd tcp from to any 80
/sbin/ipfw add 65000 pass all from any to any
The first rule allows my beefy squid host to get to the outside world for squid's network requests. It is very important that this rule be listed first so that the second rule doesn't cause an infinite loop: without it, squid's own requests to port 80 would be forwarded right back to the squid host.

The second rule redirects all web traffic (well, all traffic to port 80) to the beefy squid host.

The last rule is there to make sure that all other traffic is allowed. These rules will likely be part of a more complex rule set. The first two should be near the top, after all the sanity checks for obviously spoofed packets have been done. The last one, if you choose to have it, should be near the end of the list. If you've taken the time to implement a complete list of what's allowed, then change 'pass' to 'deny'.
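
To make the ordering concrete, here is a minimal sketch of how the three rules could be wrapped in a script. The addresses are placeholders from the documentation range (192.0.2.0/24), not the ones on my network, and the SQUID_HOST, LAN_NET and IPFW variables are names invented for this example. By default it only prints the commands, so it can be checked before anything touches the live rule set.

```shell
#!/bin/sh
# Sketch of a router firewall script. Addresses are placeholders.
SQUID_HOST="${SQUID_HOST:-192.0.2.10}"   # hypothetical address of the squid box
LAN_NET="${LAN_NET:-192.0.2.0/24}"       # hypothetical client network
IPFW="${IPFW:-echo /sbin/ipfw}"          # dry run by default; set IPFW=/sbin/ipfw to apply

router_rules() {
    # 1. Let the squid box reach the outside world directly, so its own
    #    port-80 fetches are not forwarded back to itself.
    $IPFW add 1000 pass tcp from "$SQUID_HOST" to any
    # 2. Hand every other client's port-80 traffic to the squid box.
    $IPFW add 1100 fwd "$SQUID_HOST" tcp from "$LAN_NET" to any 80
    # 3. Allow all other traffic.
    $IPFW add 65000 pass all from any to any
}

router_rules
```

Running it as-is prints the three ipfw commands in order; running it with IPFW=/sbin/ipfw in the environment would install them.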

Once you can see the packets on the beefy squid host with tcpdump, you are ready to configure that machine. But before we go on to that, here's the start of the dmesg to show that the Soekris box really is a small box: 64 MB of RAM, with a 133MHz AMD Elan CPU:
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 6.1-STABLE #2: Mon Aug 21 00:32:38 MDT 2006
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Enhanced Am486DX4/Am5x86 Write-Back (486-class CPU)
Origin = "AuthenticAMD" Id = 0x494 Stepping = 4
real memory = 67108864 (64 MB)
avail memory = 60272640 (57 MB)

I also needed to set up /etc/rc.conf so that the firewall would be enabled:
I placed the firewall script from earlier in the /etc/rc.firewall.router file.
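
A minimal sketch of rc.conf entries that would do this, assuming the stock firewall_enable and firewall_script knobs; only the script path comes from the text above.

```shell
# /etc/rc.conf fragment (sketch; rc.conf is plain sh variable assignments)
firewall_enable="YES"                      # load ipfw rules at boot
firewall_script="/etc/rc.firewall.router"  # run this script instead of /etc/rc.firewall
```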

Beefy Squid Machine

My beefy squid machine was a Dell box with a lot of memory and a fast 3.0GHz dual-core Intel EM64T CPU:
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 7.0-CURRENT #1: Mon Aug 21 19:36:37 MDT 2006
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) D CPU 3.00GHz (3000.12-MHz K8-class CPU)
Origin = "GenuineIntel" Id = 0xf47 Stepping = 7
Cores per package: 2
usable memory = 4282433536 (4084 MB)
avail memory = 4137197568 (3945 MB)
This machine ran squid and ipfw as well. ipfw needed to have the following kernel options to make it work:
Note: If this machine had been a 5.x or 6.x machine, you'd also need 'options IPFIREWALL_FORWARD_EXTENDED' for this to work. 4.x and 7.0 machines won't need this extra option.

Again, I needed to write firewall rules. For this machine, I needed to redirect all that web traffic to squid. Here's what I wrote:
# allow this machine to go to the net unmolested for port 80 traffic
/sbin/ipfw add 900 pass all from to any 80
# all other traffic goes to squid
/sbin/ipfw add 1000 log fwd,3128 tcp from to any 80
# everything else is cool
/sbin/ipfw add 65000 pass all from any to any

Since this was an internal machine, I didn't see the harm in passing all data. Your mileage may vary.
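
The same idea as a small script, as a sketch only: the addresses are placeholders from the documentation range, the variable names are invented for the example, and logging is left out to keep the rule short. The fwd target takes an address and a port separated by a comma, which is how port-80 traffic ends up on squid's port 3128. By default the script just prints the commands.

```shell
#!/bin/sh
# Sketch of the squid box's firewall script. All addresses are placeholders.
SQUID_ADDR="${SQUID_ADDR:-192.0.2.10}"   # hypothetical address of this machine
LAN_NET="${LAN_NET:-192.0.2.0/24}"       # hypothetical client network
SQUID_PORT="${SQUID_PORT:-3128}"         # port squid listens on
IPFW="${IPFW:-echo /sbin/ipfw}"          # dry run by default; set IPFW=/sbin/ipfw to apply

squid_box_rules() {
    # squid's own outbound fetches to port 80 must not be redirected
    $IPFW add 900 pass tcp from "$SQUID_ADDR" to any 80
    # everyone else's port-80 traffic is handed to the local squid
    $IPFW add 1000 fwd "$SQUID_ADDR,$SQUID_PORT" tcp from "$LAN_NET" to any 80
    # everything else is allowed
    $IPFW add 65000 pass all from any to any
}

squid_box_rules
```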

I needed to configure squid. Well, first I needed to install squid, but I just built it using the FreeBSD squid port found in /usr/ports/www/squid. I found some online resources and came up with:
hierarchy_stoplist cgi-bin ?
acl QUERY urlpath_regex cgi-bin \?
no_cache deny QUERY
auth_param basic children 5
auth_param basic realm Squid proxy-caching web server
auth_param basic credentialsttl 2 hours
auth_param basic casesensitive off
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern . 0 20% 4320
acl all src
acl manager proto cache_object
acl localhost src
acl to_localhost dst
acl SSL_ports port 443 563
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp
acl Safe_ports port 443 563 # https, snews
acl Safe_ports port 70 # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535 # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
http_access allow manager localhost
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
acl ten_net src
http_access allow ten_net
http_access deny all
http_reply_access allow all
icp_access allow all
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
coredump_dir /usr/local/squid/cache
The above config is mostly the stock configuration, plus the transparent caching recipe for squid 2.5 and earlier that can be found on many other web sites.

I also needed to add the following to /etc/rc.conf:
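
A sketch of what such entries typically look like, assuming the squid_enable knob used by the squid port's rc.d script and the stock firewall knobs. The script path is hypothetical; use wherever the rules above were saved.

```shell
# /etc/rc.conf fragment for the squid box (sketch)
squid_enable="YES"                        # start squid from the port's rc.d script
firewall_enable="YES"                     # load ipfw rules at boot
firewall_script="/etc/rc.firewall.squid"  # hypothetical path for the rules above
```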

Blocking one website

Now that I have all of the above configured, I can proceed to blocking the one web site that I set out to block in the first place. That turns out to be relatively easy. I just added the following to squid.conf:
acl dst_deny dstdomain .foo.com
deny_info ERR_NO_FOO dst_deny
http_access deny dst_deny
I also had to add a file /usr/local/etc/squid/errors/English/ERR_NO_FOO with a custom message and reminder about acceptable use. I copied the file ERR_TOO_BIG and hacked the text so it looked OK.
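
The copy step can be sketched as a small helper. new_error_page is a name invented for this example; it refuses to overwrite an existing page so a customized message isn't lost by accident.

```shell
#!/bin/sh
# Copy a stock squid error page as the starting point for a custom one.
# usage: new_error_page <errors-dir> <template> <new-name>
# (hypothetical helper, not part of squid)
new_error_page() {
    dir=$1
    template=$2
    name=$3
    if [ ! -f "$dir/$template" ]; then
        echo "missing template: $dir/$template" >&2
        return 1
    fi
    if [ -e "$dir/$name" ]; then
        echo "already exists: $dir/$name" >&2
        return 1
    fi
    cp "$dir/$template" "$dir/$name"
}
```

For the setup described here, that would be `new_error_page /usr/local/etc/squid/errors/English ERR_TOO_BIG ERR_NO_FOO`, followed by editing the text of ERR_NO_FOO by hand.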

Also note: this block was put in place for a few days as a logical consequence for a minor with access to the network who abused the foo.com privilege. It is unknown how well these sorts of blocks will work in the long run, and the author believes that they are of limited use in limited circumstances. Ideally, one would be able to trust everybody on the network completely, but that's not always possible. This might be a useful tool, but it is more to keep honest people honest. There are a number of proxies and such that can be employed to evade this sort of policing.


As you can see, setting up this proxy isn't too bad. Once you have it in place, you can add additional bells and whistles as described in the squid FAQ and other documents. You can filter/cache more traffic. You can even run some of the ad-blocking software that is available as a squid add-on.


brd said...

Why not just use a caching DNS server and block it there? You might already have one of those.. and squid seems a bit heavyweight. Also depending on the user you might just be able to put something for www.dom.tld in /etc/hosts on the machine they use.

Warner Losh said...

Why not just use a caching DNS server?

That's certainly an option. It would likely be sufficient for the site I wish to block because it has redirects to one of several servers that it uses to serve up content. Entering an IP address would work for some sites, but not this one, I don't think. The individual being blocked likely isn't sophisticated enough to know about IP addresses, and how to enter them.

Since the person I wanted to limit is coming from a windows box that he has control over, I wanted to implement a solution that he couldn't undo on his box. Again, I don't think he's sophisticated enough to do that today, but there may come a day that he is.

There were a number of reasons I wanted to go with squid. I've wanted to see if caching web servers help my bandwidth utilization or not. I wanted the ability to have tailored error messages for visiting forbidden sites. I wanted a log of all activity to be generated on a machine that had enough disk to store it.
Finally, I wanted to know how to do something like this, as it is a cool thing to be able to do. There may be applications to this technology in products I work on at my day job.

Good comment/question.

Anonymous said...

Why did you use `ipfw' if you can do the same with `pf'?

IMHO pf is better with redirections (ipfw is limited with that).

With pf for example I'm using redirects through NAT gateways from the public world into internal networks (crossing interfaces) and changing dst-IP-addr and port (you can't do things like this that simply with ipfw).

Warner Losh said...

why not pf

pf.conf states "Redirections cannot reflect packets back through the interface they arrive on" which is exactly what I wanted to do. ipfw did this redirection flawlessly. pf is restricted to redirection out a different interface, or to the firewall. The Soekris box cannot handle squid. I had no machines outside the firewall I could redirect to. The documentation also implied that the destination IP address was rewritten, destroying information about where the connection was headed when it was intercepted, but to be honest I didn't verify this aspect of the problem.

I originally wanted to use pf to solve this problem, but couldn't discover a way to do so and have the topology I wanted. Early in my search for information, I discovered Daniel Hartmeier's Transparent Squid page, but that relied on pf and squid being colocated on the same box.

Anonymous said...

Well ok, I must truly say I haven't checked pf's manpage about that detail but from my memories I would like to swear I already did something like that using pf. I'm really damned quite sure but I might be wrong.

I used a technique like that to give internal users access to a box on the internal net which had been addressed by an outside IP address. So I rdr'd packets to this (outside) IP address on the internal network back again into the internal network (instead of letting them out).

I'm sure it worked, but it was 1.5 years ago that I managed to do that.

When using a squid proxy your next project should be to have a virus filtering proxy... didn't you already think about it? :)
I've checked many but I don't like to have ph, p5, or whatever scripting language you imagine to have my squid relying on.

There are two solutions, but both are not yet ready for production: squidclam and havp (both are in your ports tree).

It should be a must if you're having windoze users running through your gateway.

Have fun with it! :)


brd said...

Ah, the quest for knowledge.

I'm probably a little biased against Squid because of the times I have been caught behind a proxy and it has caused me problems.

Have fun :)