FreeBSD Transparent Web Proxy with Squid


Over the past few days, I've been setting up a transparent proxy. I need to limit traffic to a small set of sites for a while, so I looked into doing this with squid.

I run nothing but FreeBSD machines for my home infrastructure where possible. I easily found a number of web sites that talked about using Cisco routers with FreeBSD backends to create enterprise-level web filters for corporations, or variations on this theme. These articles whetted my appetite but didn't fulfill my needs. I don't have a Cisco router, just a dinky Soekris box running NanoBSD built from FreeBSD 6.1-RELEASE with a 6.1-STABLE kernel as of a few days ago. Many of the other articles discussed using pf and squid together, but these required that pf and squid run on the same machine. While the router could easily run pf, running squid on it was likely to prove impossible.

A friend on IRC gave me a solution, which I'll share here so that others wishing to implement this in the future can find it via a web search.

I used ipfw redirection + squid to implement my transparent cache/access control mechanism. The router and the beefy squid box were on the same L2 network (same ethernet network, no routers) so I was able to use the forward function of ipfw to forward the web packets to the squid machine. The squid machine then redirected this traffic to the squid cache port, which then served up the pages that were requested.

Wimpy SOEKRIS box

On my Soekris box, I added the following lines to my kernel:
# Firewall stuff

Once I had the firewall stuff in my kernel, I had to write the firewall rules. I thought this would be easy, but after I rebooted I learned again that ipfw defaults to deny, so I had to get out my serial console to correct the problem. I'd recommend getting the system completely debugged using a kernel with "options IPFIREWALL_DEFAULT_TO_ACCEPT" and then removing that option once you are sure you don't need it.
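
The option lines themselves are not reproduced above. As a sketch only, and assuming a FreeBSD 6.x kernel, ipfw with "fwd" support generally needs IPFIREWALL and IPFIREWALL_FORWARD. The helper name fw_kernel_options is made up for illustration, and these may not be the exact lines I used:

```shell
#!/bin/sh
# Print kernel config lines for ipfw with packet forwarding.
# Assumption: these are the usual FreeBSD 6.x option names; the exact
# lines from the original kernel config are not shown in the text.
fw_kernel_options() {
    cat <<'EOF'
# Firewall stuff
options         IPFIREWALL                      # enable ipfw
options         IPFIREWALL_FORWARD              # needed for "fwd" rules
options         IPFIREWALL_DEFAULT_TO_ACCEPT    # debugging only; remove later
EOF
}
```

You would append the output to your own kernel config file (for example a hypothetical /usr/src/sys/i386/conf/ROUTER) and rebuild the kernel.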

After arranging for a serial console, I was able to write the firewall rules I needed. Since I have multiple firewalls protecting my network, the rules I used were relatively simple. The Soekris box was acting as a router with a little bit of filtering as a backstop to the Linux-based DSL modem that I have, which has some rudimentary forwarding built in, but it never hurts to have belt and suspenders. Here are the relevant rules that I used:
/sbin/ipfw add 1000 pass tcp from to any
/sbin/ipfw add 1100 fwd tcp from to any 80
/sbin/ipfw add 65000 pass all from any to any
The first rule allows my beefy squid host to get to the outside world for squid's own network requests. It is very important that this rule be listed first so that the second rule doesn't cause an infinite loop.

The second rule redirects all web traffic (well, all traffic to port 80) to the beefy squid host.

The last rule is there to make sure that all other traffic is allowed. These rules will likely be part of a more complex rule set. The first two should be near the top, after all the sanity checks for obviously spoofed packets have been done. The last one, if you choose to have it, should be near the end of the list. If you've taken the time to implement a complete list of what's allowed, then change 'pass' to 'deny'.
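
To make the ordering concrete, here is a minimal sketch of the three router rules with the addresses spelled out. SQUID_HOST and LAN_NET are placeholder values from the documentation address range, not my real addresses, and the source network on the fwd rule is an assumption. The function name router_rules is also just for illustration.

```shell
#!/bin/sh
# Sketch of the router rules with the addresses made explicit.
# SQUID_HOST and LAN_NET are placeholders (documentation range),
# not the real addresses used on the network described here.
IPFW="${IPFW:-/sbin/ipfw}"
SQUID_HOST="${SQUID_HOST:-192.0.2.10}"
LAN_NET="${LAN_NET:-192.0.2.0/24}"

router_rules() {
    # 1000: let the squid host itself reach the outside world. This must
    # come first, or squid's own requests would be forwarded back to squid.
    $IPFW add 1000 pass tcp from "$SQUID_HOST" to any
    # 1100: hand everyone else's port 80 traffic to the squid host.
    $IPFW add 1100 fwd "$SQUID_HOST" tcp from "$LAN_NET" to any 80
    # 65000: allow everything else.
    $IPFW add 65000 pass all from any to any
}
```

Setting IPFW=echo before calling router_rules prints the rules instead of installing them, which is handy when you only have a serial console to fall back on.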

Once you can see the packets on the beefy squid host with tcpdump, you are ready to configure that machine. But before we go onto that, here's the start of the dmesg to show that the Soekris box is really a small box. 64MB ram, with a 133MHz AMD Elan CPU:
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 6.1-STABLE #2: Mon Aug 21 00:32:38 MDT 2006
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Enhanced Am486DX4/Am5x86 Write-Back (486-class CPU)
Origin = "AuthenticAMD" Id = 0x494 Stepping = 4
real memory = 67108864 (64 MB)
avail memory = 60272640 (57 MB)

I also needed to set up /etc/rc.conf so that the firewall would be enabled:
I placed the firewall script shown earlier in the /etc/rc.firewall.router file.
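
The rc.conf lines are not shown here. A minimal sketch, assuming the standard rc.conf firewall knobs and that the box also needs packet forwarding turned on as a router, might look like this (the exact settings I used may have differed):

```shell
# /etc/rc.conf fragment (rc.conf is read by sh, so these are plain assignments)
gateway_enable="YES"                        # assumption: box forwards packets as a router
firewall_enable="YES"                       # load ipfw rules at boot
firewall_script="/etc/rc.firewall.router"   # the rules file named in the text
```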

Beefy Squid Machine

My beefy squid machine was a Dell box with a lot of memory and a fast 3.0GHz dual-core Intel EM64T CPU:
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 7.0-CURRENT #1: Mon Aug 21 19:36:37 MDT 2006
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) D CPU 3.00GHz (3000.12-MHz K8-class CPU)
Origin = "GenuineIntel" Id = 0xf47 Stepping = 7
Cores per package: 2
usable memory = 4282433536 (4084 MB)
avail memory = 4137197568 (3945 MB)
This machine ran squid and ipfw as well. ipfw needed to have the following kernel options to make it work:
Note: If this machine had been a 5.x or 6.x machine, you'd also need 'options IPFIREWALL_FORWARD_EXTENDED' for this to work. 4.x and 7.0 machines won't need this extra option.

Again, I needed to write firewall rules. For this machine, I needed to redirect all that web traffic to squid. Here's what I wrote:
# allow this machine to go to the net unmolested for port 80 traffic
/sbin/ipfw add 900 pass all from to any 80
# all other traffic goes to squid
/sbin/ipfw add 1000 log fwd,3128 tcp from to any 80
# everything else is cool
/sbin/ipfw add 65000 pass all from any to any

Since this was an internal machine, I didn't see the harm in passing all data. Your mileage may vary.
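
Here is the same idea as a sketch with the addresses filled in by placeholders. SQUID_HOST is a made-up documentation address, the "from any" source on the fwd rule is an assumption, and I have left out the log keyword to keep the sketch to syntax I am sure of. The function name squid_box_rules is only for illustration.

```shell
#!/bin/sh
# Sketch of the squid-box rules. SQUID_HOST is a placeholder address and
# SQUID_PORT is squid's listening port (3128, as in the rules above).
IPFW="${IPFW:-/sbin/ipfw}"
SQUID_HOST="${SQUID_HOST:-192.0.2.10}"
SQUID_PORT="${SQUID_PORT:-3128}"

squid_box_rules() {
    # 900: squid's own outbound web requests go straight out.
    $IPFW add 900 pass tcp from "$SQUID_HOST" to any 80
    # 1000: all other port 80 traffic is handed to the local squid port.
    $IPFW add 1000 fwd "${SQUID_HOST},${SQUID_PORT}" tcp from any to any 80
    # 65000: everything else is allowed.
    $IPFW add 65000 pass all from any to any
}
```

As with the router, running with IPFW=echo prints the rules without touching the firewall.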

I needed to configure squid. Well, first I needed to install squid, but I just built it using the FreeBSD squid port found in /usr/ports/www/squid. I found some online resources, and came up with:
hierarchy_stoplist cgi-bin ?
acl QUERY urlpath_regex cgi-bin \?
no_cache deny QUERY
auth_param basic children 5
auth_param basic realm Squid proxy-caching web server
auth_param basic credentialsttl 2 hours
auth_param basic casesensitive off
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern . 0 20% 4320
acl all src
acl manager proto cache_object
acl localhost src
acl to_localhost dst
acl SSL_ports port 443 563
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp
acl Safe_ports port 443 563 # https, snews
acl Safe_ports port 70 # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535 # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
http_access allow manager localhost
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
acl ten_net src
http_access allow ten_net
http_access deny all
http_reply_access allow all
icp_access allow all
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
coredump_dir /usr/local/squid/cache
The above config is mostly the stock configuration, plus the transparent caching recipe for squid 2.5 and earlier that can be found on many other web sites.

I also needed to add the following to /etc/rc.conf:
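
The lines themselves are missing here. A plausible sketch, assuming the squid port's usual squid_enable knob and a hypothetical file name for the rules shown earlier:

```shell
# /etc/rc.conf fragment for the squid box
squid_enable="YES"                          # assumption: knob used by the squid port's rc script
firewall_enable="YES"                       # load ipfw rules at boot
firewall_script="/etc/rc.firewall.squid"    # hypothetical file holding the rules above
```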

Blocking one website

Now that I have all of the above configured, I can proceed to blocking the one web site that I set out to block in the first place. That turns out to be relatively easy. I just added the following to squid.conf:
acl dst_deny dstdomain .foo.com
deny_info ERR_NO_FOO dst_deny
http_access deny dst_deny
I also had to add a file /usr/local/etc/squid/errors/English/ERR_NO_FOO with a custom message and reminder about acceptable use. I copied the file ERR_TOO_BIG and hacked the text so it looked OK.
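
The copy step can be sketched as a small helper. The function name new_error_page is made up for illustration. It refuses to overwrite an existing page, and editing the message text is still done by hand afterwards.

```shell
#!/bin/sh
# Create a custom squid error page by copying an existing template.
# ERRORS_DIR defaults to the directory given in the text.
ERRORS_DIR="${ERRORS_DIR:-/usr/local/etc/squid/errors/English}"

new_error_page() {
    # $1 = existing template name, $2 = new page name
    src="$ERRORS_DIR/$1"
    dst="$ERRORS_DIR/$2"
    if [ ! -f "$src" ]; then
        echo "missing template: $src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        echo "already exists: $dst" >&2
        return 1
    fi
    cp "$src" "$dst"
}
```

Calling new_error_page ERR_TOO_BIG ERR_NO_FOO creates the file; after editing it, squid -k reconfigure makes squid reread its configuration.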

Also note: this block was put in place for a few days to serve as a logical consequence for a minor with access to the network who abused the foo.com privilege. It is unknown how well these sorts of blocks will work in the long run, and I believe that they are of limited use in limited circumstances. Ideally, one would be able to trust everybody on the network completely, but that's not always possible. This might be a useful tool, but it is more to keep honest people honest. There are a number of proxies and such that can be employed to evade this sort of policing.


As you can see, setting up this proxy isn't too bad. Once you have it in place, you can add additional bells and whistles as described in the squid FAQ and other documents. You can filter or cache more traffic. You can even run some of the ad-blocking software that is available as a squid add-on.


Thecus N4100

I spent much of yesterday trying to break into the Thecus N4100 that we have at work. I thought it would be a relatively easy thing to try. After all, OpenBSD supports the N2100, and how different can this one be? I thought it might be nice to add FreeBSD support to the thing.

Turns out, quite a lot.

I was greeted with gibberish over the serial console port. 115200 looks like the correct baud rate, at least based on looking at the characters with a scope. The bit widths are right, etc. Yet all I get is garbage.

Then I went and googled for others who have done this. While $600 for the empty unit is a bunch of money, it isn't that much in the grand scheme of things. I was surprised to find that no one has a web page about this that Google can get to. Maybe I need to learn to search better? Or maybe there are limited searching options for this community.

I've tried jumpers on all the things that looked like jumpers. I've tried every combination possible, but still no joy on the serial port.

The unit works, and is running Thecus' software. The GPL files showed no signs of what the key might be to the serial communications. The settings were 115200, 8, N. Maybe those settings are dependent on a clock that's off by more than a few percent, which is why the serial ports I've tried can't lock to it. I'll have to give the embedded ATMEL part that I'm using a spin to see if I can vary the baud rate enough to see it (the embedded part I use has a base xtal that all frequencies are based on, so I've seen my share of non-standard baud rates debugging it as I went from board to board that had different frequencies).

I'm also putting this out as a kind of test to see if I can find people who have done this sort of thing, or would like to. I'm not sure of how well Google covers blogs, so now I have something to search on.

In the future, I'll try to see if I can kick this CPU into a 'useful' recovery mode so I can just take over from scratch before redboot even gets around to starting :-0

How to recover a hard disk

OK. I'd like to spend some time finding out how hard it is to do DIY data recovery from a hard disk drive. I had one crash and burn while I was at BSDCan this year. Now that the move is done, I'll have time again to take it apart and put it back together 'fixed'. The drive motor is shot, so I guess I'll start by going onto eBay and buying one.

Wish me luck!