
new blog

Sep. 26th, 2012 | 12:55 pm

I've gotten a little tired of LJ for a few reasons, so I have started a new blog at http://mrcote.info/. It's running Octopress and feels a lot slicker. I'll keep my LJ account around for reading other blogs.


Mozilla A-Team: Peptest results, an exercise in statistical analysis

Mar. 4th, 2012 | 02:45 pm

UPDATE: It's been pointed out that the current metric (sum of squares of unresponsive periods, divided by 1000) is used in Talos and has had a fair bit of thought put into it. I was curious what not squaring the results would do, but I wouldn't go with another metric without more careful thought.

UPDATE 2: It has also been pointed out that peptest tests performance, not correctness, and hence should report its results elsewhere (essentially as I've done with the sampled data) and not be a strict pass/fail test. This approach definitely warrants some consideration.




About a week and a half ago, peptest was deployed to try. To recap, peptest identifies periods of unresponsiveness, where "unresponsiveness" is currently defined as any time the event loop takes more than 50 ms to complete. We have a very small suite of basic tests at the moment, looking for unresponsiveness while opening a blank tab, opening a new window, opening the bookmarks menu, opening and using context menus, and resizing a window.
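
To make "unresponsive" concrete: peptest does this measurement in the harness itself, but purely as an illustration of the idea (this is not how peptest actually instruments the event loop), you can approximate event-loop lag in plain JavaScript by checking how late a short timer fires:

// Illustration only: estimate event-loop lag by measuring how late a
// periodic timer fires. Anything over 50 ms would count as an
// unresponsive period by peptest's current definition.
var CUTOFF_MS = 50;
var INTERVAL_MS = 10;
var expected = Date.now() + INTERVAL_MS;

setInterval(function () {
  var lag = Date.now() - expected;  // how late this tick fired
  if (lag > CUTOFF_MS) {
    console.log('unresponsive period of roughly ' + lag + ' ms');
  }
  expected = Date.now() + INTERVAL_MS;
}, INTERVAL_MS);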

The results are currently ignored, since we still don't know how useful they will be, but you can see them by going to https://tbpl.mozilla.org/?tree=Try&noignore=1. They are marked by a "U" (not sure why exactly, but it will change at some point to something more obvious).

At the moment, every platform fails at least one of these tests, and most of the time there are multiple failed tests. This isn't too surprising, since 50 ms is a pretty bold target. However, going forward, we need some sort of baseline result, so that we can identify real regressions. To accomplish this, peptest tests can be configured with a failure threshold. We calculate a metric for each test (see below), and, if a failure threshold is configured, a metric value below this threshold is considered a pass. Hopefully, we can identify a threshold for each test (or, likely, a threshold for each platform-test combination) such that all the tests pass but significant increases in unresponsiveness will trigger failures. At the same time, we will also file bugs on all the tests so we don't forget that the thresholds are hiding real unresponsive periods during their execution. We can lower or eliminate the thresholds as these bugs are partially or fully fixed.

Things, of course, aren't that simple. I gathered and analyzed the peptest logs from try over a four-day period, and there is quite a lot of variance in the results, even on the same platform. With a sufficiently generous threshold, we could get the tests to pass most of the time, but there are occasionally some crazy outliers that no reasonable threshold could contain. However, it is probably okay to have the tests turn orange once in a while. 0 oranges might be an unreasonable target for this project, and intermittent oranges would be a reminder that, sometimes, there are really unacceptable periods of unresponsiveness.

(Btw one test, test_contextMenu.js, appears to only fail on Linux and Linux64, but this is actually a bug in the test--on all the other platforms, it's erroring out before it hits the end. I've since fixed this but haven't collected new data yet.)

I experimented a bit with the test metric, to see if that improved the situation. Right now, as deployed on try, the metric is calculated as the sum of the squares of the unresponsive periods in a single test (an unresponsive period being, by definition, a value above 50). I tried just summing the periods without squaring them, which seemingly increases the variance in some tests and decreases it in others. I also experimented with raising the minimum unresponsive period from 50 ms to 100 ms, since there are strong arguments that 50 ms is pretty unrealistic, at least at this stage.
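
For concreteness, here is a small sketch of the metric variants described above (the real computation lives in the peptest harness; this just mirrors the arithmetic):

// Metric currently deployed on try: sum of squares of the unresponsive
// periods (values above the cutoff, 50 ms by default), divided by 1000.
// e.g. sumOfSquares([54, 835]) -> 700.141
function sumOfSquares(periods, cutoff) {
  cutoff = cutoff || 50;
  return periods.filter(function (p) { return p > cutoff; })
                .reduce(function (acc, p) { return acc + p * p; }, 0) / 1000;
}

// Variant 1: plain sum of the unresponsive periods, no squaring.
function plainSum(periods, cutoff) {
  cutoff = cutoff || 50;
  return periods.filter(function (p) { return p > cutoff; })
                .reduce(function (acc, p) { return acc + p; }, 0);
}

// Variant 2: either of the above with the cutoff raised to 100 ms,
// e.g. sumOfSquares(periods, 100) or plainSum(periods, 100).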

I've graphed the failures, along with their mean and standard deviations, at http://people.mozilla.com/~mcote/peptest/results/. I also plotted passes as 0s (there are certainly lots of unresponsive periods less than 50 ms in those passes, but for all intents and purposes they are 0) in a different colour. There are unique URLs to all combinations of platform, test, and metric. The raw data is also available there (in JSON).
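
The summary statistics on those graphs are nothing fancy; a sketch of how they are computed from the sampled metric values (passes counted as 0):

// Mean and standard deviation of the sampled metric values for one
// platform/test/metric combination, with passes included as 0.
function summarize(metrics) {
  var mean = metrics.reduce(function (a, m) { return a + m; }, 0) / metrics.length;
  var variance = metrics.reduce(function (a, m) {
    return a + (m - mean) * (m - mean);
  }, 0) / metrics.length;
  return { mean: mean, stddev: Math.sqrt(variance) };
}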

Following is a brief discussion of some of the problems with identifying good failure thresholds.

Some of the simple tests don't have much variance. test_openBlankTab.js, which just measures the responsiveness when opening and closing a blank tab, mostly passes, with just a few outliers. Some slightly more complicated tests, however, have quite a bit of variance. The bookmarks-menu test, test_openBookmarksMenu.js, scrolls through the bookmarks menu and then opens the bookmarks window. The results on snowleopard are particularly egregious:



As you can see, most of the failures are clustered around the mean. The standard deviation encompasses most of them. Changing the metric from the sum of squares of unresponsive periods to just the sum of the periods improves things a little:



There is only one point above a single standard deviation, although two are rather close. Increasing the allowable unresponsive period to 100 ms reduced the standard deviation, but only because a few low points became passes:



So this is one example where we would expect to see at least one orange every few days, even if we set the failure threshold to about 25% higher than the mean.

In other cases we have mostly passes but some really crazy outliers. On snowleopard, test_openWindow.js, which merely opens a new window, mostly passes, but this sample includes one run with unresponsive periods totalling more than 250 ms.



So here, we could leave the failure threshold at 0, although we'd still have oranges every few days. In this case, raising the unresponsive threshold to 100 ms wouldn't make a difference, since the few failures are significantly above 100 ms.

test_openWindow.js on leopard, however, is all over the place when using just the sum of unresponsive periods:



There aren't really any outliers here, just a large spread of values. A reasonable failure threshold here would have to be twice the mean to ensure that oranges only occur occasionally.

In this case, switching to a sum of squares makes the outliers more obvious, although the standard deviation becomes quite large:



And in case it wasn't obvious, the results are completely different on a different OS. Take test_openWindow.js on Windows 7:



Most results are clustered, but there are 5-6 real outliers, depending on how you define an outlier. This test-platform combination looks like a prime candidate for regular oranges unless an extremely generous failure threshold is defined.

In conclusion, it's going to be kind of tough to define failure thresholds such that most runs pass but real regressions are still identified. There doesn't seem to be a huge difference between using the sum of unresponsive periods versus the sum of their squares, although in some instances the latter makes the outliers more obvious. Raising the minimum acceptable unresponsive period unsurprisingly causes more passes but doesn't really improve the variance in the failures. Regardless, it looks like I will have to go through the sampled results and, for each test, set a failure threshold that encompasses the majority of the failures, but even then there will be intermittent oranges. Comments and suggestions welcome!
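
One simple rule for that last step, sketched below, would be to derive each threshold from the sampled mean and standard deviation; this is just an illustration of the kind of heuristic I have in mind, not a final choice:

// Illustrative only: pick a failure threshold for one platform-test
// combination from its sampled metric values (passes counted as 0), using
// summarize() from the earlier sketch. Mean plus two standard deviations
// covers most of the sampled failures while still letting the occasional
// extreme outlier turn the run orange.
function pickThreshold(metrics) {
  var stats = summarize(metrics);
  return stats.mean + 2 * stats.stddev;
}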


flot-axislabels v2.0

Feb. 25th, 2012 | 11:17 pm

I've released version 2.0 of flot-axislabels, the flot plug-in for labelling axes. Flot is a great, easy-to-use JavaScript graphing lib, based on canvas; however, many people (myself included) viewed the lack of support for axis labels as a big fault. With flot-axislabels, you can get said labels by just loading the script after flot and setting one extra option per axis (or a couple more if you have specific needs).

Version 2.0 (which is actually the first "real" release but has a lot of recent changes) now supports any number of X and Y axes. Previously only 2 X and 2 Y axes were supported (top, bottom, left and right).
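
A minimal usage sketch (the data and label strings here are placeholders of my own; it assumes flot 0.7 and that flot.axislabels.js is loaded after flot itself):

// Label each axis with one extra option (axisLabel) per axis; the second
// Y axis uses flot 0.7's position option, which is what the plugin now
// keys off internally.
var temperature = [[0, 3], [1, 5], [2, 4]];   // made-up data
var windSpeed = [[0, 10], [1, 12], [2, 9]];

$.plot($('#placeholder'), [
  { data: temperature, label: 'temperature', yaxis: 1 },
  { data: windSpeed, label: 'wind speed', yaxis: 2 }
], {
  xaxes: [ { axisLabel: 'Time (days)' } ],
  yaxes: [
    { axisLabel: 'Temperature (C)' },
    { position: 'right', axisLabel: 'Wind speed (km/h)' }
  ]
});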

Having more than 4 axes on a single plot probably sounds a bit weird, but apparently it is useful when plotting weather conditions:



You can see the live example and view its source to see how it's done. It's really quite simple.

flot-axislabels continues to support CSS transforms, canvas, and traditional CSS positioning (plus a special mode for IE 8 combining CSS positioning with IE's special rotation functions). In the first two modes, labels for Y axes are rotated to face the plot. Graceful degradation is attempted based on the browser's detected capabilities.

Internally, it no longer pays attention to the name of the axis (yaxis, y2axis, etc.) but rather looks at the 'position' variable, which flot automatically sets if it is not provided. I believe this means that it will only work with flot 0.7, however.

Read the README, download the zip, and follow the project on github.


Mozilla A-Team: Writing tests for Peptest

Nov. 30th, 2011 | 11:50 am

With ahal's impending return to studentdom and my own return from paternity leave, I will be taking over Peptest development and maintenance. To get myself up to speed, I wrote some tests.




The easiest test to write is one that looks for unresponsiveness while simply loading a page. I've noticed that the site for my favourite blog, The Daily What, causes some pain in Firefox, what with all the videos and images and so forth. I wrote a very simple test to see if I was onto something:

Components.utils.import('resource://mozmill/driver/mozmill.js');
let c = getBrowserController();

pep.performAction('open_page', function() {
  c.open('http://thedailywh.at');
  c.waitForPageLoad();
})


Indeed, while there were no very long pauses, there was a string of short ones. Remember, we care about pauses longer than 50 ms, which Peptest identifies for us:

PEP TEST-START | test_dailyWhat.js
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 103 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 199 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 112 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 204 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 105 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 57 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 79 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 194 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 202 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 68 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 182 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 63 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 84 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 118 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 51 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 55 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 67 ms
PEP WARNING    | test_dailyWhat.js | open_page | unresponsive time: 215 ms
PEP TEST-UNEXPECTED-FAIL | test_dailyWhat.js | fail threshold: 0.0 | metric: 322.362
PEP TEST-END   | test_dailyWhat.js | finished in: 9426 ms


Not awful, but not great. I filed bug 706250 to investigate this.

Next, I decided to delve into some Project Snappy bugs to see what else I could find.

Changing the URL in the above test was all I needed to confirm that the page in comment 62 of bug 61684 was still an issue. This time I got about 50 unresponsive periods, the longest being 2.6 s. Ouch.

Bug 430106 is a little more interesting. Someone reported problems switching back to a tab in which a large image was loaded. The simplest way I could replicate this was by loading the example URL in one tab, loading any old page in a second tab, waiting for about 20 seconds, then switching back to the first tab. In peptest form,

Components.utils.import('resource://mozmill/driver/mozmill.js');
let c = getBrowserController();

while (c.window.gBrowser.tabs.length < 2) {
  c.window.gBrowser.addTab();
}

// Load large image in first tab.
c.tabs.selectTabIndex(0);
c.open('http://flickr.com/photos/thomasstache/2429920499/sizes/o/');
c.waitForPageLoad();

// Load any page in second tab.
c.tabs.selectTabIndex(1);
c.open('http://www.mozilla.org');
c.waitForPageLoad();

// Wait for memory to be freed from first tab.
c.sleep(20000);

pep.performAction('switch_tab', function() {
  c.tabs.selectTabIndex(0);
  // Wait for image to repaint.
  c.sleep(2000);
});


When I ran this, I saw a visible delay before the image was repainted. Peptest confirmed this:

PEP TEST-START | test_largeImgTabSwitchLocal.js
PEP WARNING    | test_largeImgTabSwitchLocal.js | switch_tab | unresponsive time: 54 ms
PEP WARNING    | test_largeImgTabSwitchLocal.js | switch_tab | unresponsive time: 835 ms
PEP TEST-UNEXPECTED-FAIL | test_largeImgTabSwitchLocal.js | fail threshold: 0.0 | metric: 700.141
PEP TEST-END   | test_largeImgTabSwitchLocal.js | finished in: 24083 ms


The unresponsiveness appears to be proportional to the size of the image: an image with about twice the dimensions, that is, 3.4 times as many pixels, resulted in a delay about 3.4 times as long.




One of the next steps for Peptest is to add JS-function tracing so we can figure out the exact sources of unresponsiveness. This, however, requires a fix for bug 580055, which in turn depends on bug 702740. As soon as patches for those bugs have landed, we'll add support to Peptest.

For more information on Peptest, see the wiki article and/or check out the code, which has recently been moved to hg.mozilla.org under mozilla-central/testing/peptest.


Automated Speed Tests, take two!

Sep. 29th, 2011 | 05:31 pm

I recently implemented some improvements in the A-Team's Automated Speed Tests as per some requests I got back when I first announced them in July. Not everything's done, but I think this is a good point to advertise what's been changed thus far.

Firstly, I ditched the awful BIRT reports in favour of a custom web app that is faster, easier to use, and more flexible. You can restrict the date range (default is the last four weeks) and switch between tests and machines. The graph is also more responsive when turning on and off particular browsers (just click on the name in the legend). All the same data is there, but it's less cluttered and, well, less ugly!

By the way, BIRT appears to have a security hole in that it will insert the value of some GET parameters directly into the page without sanitizing them! So beware of that if you want to use BIRT for some reason.

Secondly, more tests! The first is MazeSolver, one that Firefox is particularly bad at. The second is test262, a JavaScript conformance test that has unfortunately made the name "Speed Tests" a bit of a lie.

A couple interesting observations I've made recently:

  • Nightly recently got better at Santa's Workshop. I ran the test myself to see, and Nightly maintains a higher number of elves for longer, but eventually it goes back down to one. So still a ways to go, but the median FPS is higher, at least. Nightly still also doesn't display all the colours properly.

  • Seems that Nightly 10.0a1 has one fewer pass in test262 than 9.0a1.


If you're wondering, the two Windows machines are running different hardware; Win7 1 is a 32-bit machine, and Win7 2 is a 64-bit machine, although I only switched it to use the 64-bit nightlies today. Email me if you want more particulars on the hardware.

Still more to come, including

  • more tests!

  • more browser strains!

  • more platforms!


And, as always, please let me know if there's more I can do to make the framework, tests, or data more useful.


Automated Speed Tests!

Jul. 12th, 2011 | 10:22 am

It's hard to find a discussion of the speed of modern browsers that doesn't mention Microsoft's Test Drive speed demos. It's a common occurrence to find hundreds of fish swimming around a graphics developer's monitor. Continuing our mission to make developers' lives easier, the Mozilla A-Team has put together a framework to automatically run a few of these tests and put the results online. They're a bit ugly and slow, but some day I'll get around to cleaning them up.

We have set up a small framework that executes 5 speed tests twice daily against all the major browsers: IE, Safari, Chrome, Opera, and Firefox. Since we're particularly concerned with the latter, we run both the latest released version of Firefox and the latest Nightly.

For most tests, we sample the FPS every 5 seconds, since there is often a ramping-up time as objects are created and such. We then plot the median FPS for the test for each browser to make comparison easy. The results of any particular test run are also available through links in the graph and table for those curious about how the browser performs at various points during the test run.
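
The aggregation itself is trivial; a sketch of the median over the sampled readings (variable and function names here are mine, not the framework's):

// Median of the FPS readings sampled every 5 seconds over one test run.
function medianFPS(samples) {
  var sorted = samples.slice().sort(function (a, b) { return a - b; });
  var mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// e.g. medianFPS([12, 28, 31, 30, 29]) === 29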

One test, Psychedelic Browsing, uses a different metric, namely, the RPMs of a spinning patterned wheel. This is sampled only once, at the end of the test.

Disclaimer: I won't get into technical issues here, but suffice it to say that automating one browser is a little tricky; automating 5 browsers from different vendors is very tricky. One way we've reduced the number of variables is by limiting network access to prevent automatic, potentially performance-affecting browser and OS upgrades. This requires some periodic manual maintenance to update everything. We also reboot the machine after every test run. But mistakes happen and bugs crop up, so there are gaps in some of the graphs where one or more browsers were unable to start up or load the test suite, and some swings in results where the browser or machine was perturbed by some force (unfortunately this happened recently, which is why the last Firefox 4 results are all over the place after running stably for weeks). But the main way we deal with all this is by running the suite twice every day, even though all browsers (except Nightly) change much less frequently. So, as in any scientific endeavour, ignore the outlying points and focus on the trends.

Here are a few things I've noticed, some obvious, some less so.

- Different browsers excel at different tests, though, not surprisingly, IE does well on all of them. Firefox is good or excellent at 4 of the 5 tests, but it's much worse than Chrome and IE at Santa's Workshop (see roc's post about this).

- Some browsers max out (60 FPS) on some tests. These tests would have to be modified for a true comparison. However, some tests report FPS values above 60, which means they must be using some sort of "virtual" frame rate, since no monitor can display that much. More investigation needs to be done to see whether this is a valid statistic for comparison.

- Nightly generally outperforms Firefox 4 except where they have maxed out. This is especially noticeable in SpeedReading, where Firefox 4 was only at about 32-33 FPS, but Nightly and Firefox 5 are at 60 FPS.

- Some browser/test combinations are quite stable, with almost all results being the same, and some vary up and down. For instance, most browsers have stable results for Mr Potato Gun, but IE varies by 20-30 FPS.

- OS and browser updates definitely affect performance. Recently the network was left fully connected, and Firefox, Opera, and potentially Windows downloaded updates during test runs. This dropped performance noticeably.

As usual, feel free to make suggestions. Specifically, if there are particularly useful tests out there, I am more than willing to add them to the suite.


From the Vaults of the A-Team: flot plugins

May. 6th, 2011 | 03:38 pm
mood: accomplished
music: CBC Radio 2

tl;dr From work on the War on Orange, I spun off three flot plugins: flot-axislabels, flot-hiddengraphs, and flot-tickrotor. Use 'em how you will & feel free to gimme feedback.

I'm going to step away from telling you about the A-Team's projects for a few minutes and talk about our by-products. Yup, software by-products. Think virtual horse-glue or electronic fertilizer. Well actually those comparisons aren't very good. Anyway, what I mean is that sometimes I overcome my natural laziness and package up bits of the work that I do that I think would be of particular benefit to others.

The War on Orange is all about statistics, and statistics are boooooring. Pictures, however, make the whole stats thing somewhat bearable, so the War on Orange makes extensive use of graphs. We decided on flot, a popular JavaScript plotting library that handles a whole buncha different plot types. Of course flot can't do everything, so it supports plugins, which let you add functionality without too much effort.

The first thing that we noticed was absent from flot was axis labels. We have graphs that show daily orange counts alongside the "orange factor"—oranges per test run—and we were using a second axis since the two stats are orders of magnitude apart. Not sure why axis labels weren't available out of the box, since they seem to be a pretty fundamental part of a graph, but luckily someone had already started on a plugin. Alas, it only provided labels for primary axes. But the plugin structure was all there, along with some interesting hacks. I've had a github account for a little while but didn't really use it, so it was very exciting to get down to some hardcore forking action. A while later, I had secondary-axis labels going in my flot-axislabels fork:



In the spirit of github, I submitted a pull request so the original author could incorporate my work into the original plugin, but I guess he had lost interest and never accepted it. So as far as I know, my improved flot-axislabels plugin is still the most fully featured one out there—although it does have a bit of weird behaviour sometimes as a side effect of the hacks needed to fit the labels in. Btw I accept pull requests...

The War on Orange has been going on for some time, and all the information we were trying to cram into our graphs started making them feel cramped. Experience modifying flot-axislabels gave me the courage to create my own plugin to solve this problem: flot-hiddengraphs. This plugin allows you to hide and show the various graphs on one plot via the legend:





I made some interesting discoveries while working on that plugin, including the fact that mouseenter and mouseleave don't seem to always fire. Maybe if I weren't so lazy I'd fix it to use mousemove. Oh and it's still a bit ugly and I dunno why I have this fascination with links in square brackets. Did I mention I accept pull requests?

Well now that this plugin thing was old hat, I had to get creative to continue to ensure my life as a software developer was still painful. We've got a graph that can have quite a lot of columns (whether this is the right kind of display for this data is another matter). While conducting a different war, the Battle to Understand BIRT (aka BIRT Y U NO LIKE ME?), I stumbled on a nice control that allows you to rotate tick labels, so you can fit more, and longer, labels in. I started with the same hack as flot-axislabels to allocate some space for the labels... but how much space? Well I've never worked in graphics (to which my UIs will attest), and ten years is ample time for formal education to abandon me, so I couldn't even think of the word trigonometry at first. But Google knows all, and a short while later I was all Math.sin() and Math.cos() and Math.PI. Felt good to know that a few more university dollars paid off. So now the universe has flot-tickrotor (making up for a string of boring project names).
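
The trigonometry is nothing more exotic than computing the bounding box of a rotated label; here is a sketch of the idea (not the plugin's exact code):

// Extra vertical space needed below the X axis for a tick label of the
// given width and height (pixels) rotated by the given angle: the height
// of the rotated label's bounding box.
function rotatedLabelHeight(width, height, angleDegrees) {
  var radians = angleDegrees * Math.PI / 180;
  return Math.abs(width * Math.sin(radians)) + Math.abs(height * Math.cos(radians));
}

// A 60px-wide, 12px-tall label rotated 45 degrees needs roughly
// 60 * sin(45) + 12 * cos(45), i.e. about 51 pixels of vertical space.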



Now's the part in which I tell you what sucks about it: I had some problems with the allocation of space (seemingly the hardest part of these plugins) for long labels slanted down and to the right, and 'cause I'm lazy and think down-left-slanting labels look better anyway, I left it out. Automatically scaling fonts would be teh aw3som3 as well. Pull requests: I accept 'em.

So yeah, please use them and tell people about them and complain to me when they don't work and then send me pull requests when I tell you I'm too lazy to fix them.

If you're still reading, maybe you care about some of the interesting bits (read: sublime (in the Schopenhauerean sense) hacks) in flot plugin development. The main one I had to contend with was, as I've mentioned above, the allocation of space for new or bigger elements. As the code comments in the original flot-axislabels state,


This is kind of a hack. There are no hooks in Flot between
the creation and measuring of the ticks (setTicks, measureTickLabels
in setupGrid() ) and the drawing of the ticks and plot box
(insertAxisLabels in setupGrid() ).

Therefore, we use a trick where we run the draw routine twice:
the first time to get the tick measurements, so that we can change
them, and then have it draw it again.


What that comes down to, I figured out after a while, is that there's no way to tell flot "hey make the graph itself smaller 'cause I got stuff to put in the margin", since you don't know how big the graph is going to be until the plot is drawn. So a plugin that wants those margins to be bigger needs to do some calculations based on the standard size, set the label-dimension options appropriately, then trigger the draw event a second time. Now you've got spacier margins and can insert your elements. Note that this actually seems to be invisible to the user; I guess the first and second draw events happen before anything is actually displayed.
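
In plugin form, the trick looks roughly like this. This is a bare-bones sketch, not the actual flot-axislabels code, though the hooks and options it uses (plot.hooks.draw, axis.options.labelWidth, setupGrid(), draw()) are standard flot:

(function ($) {
  function drawExtras(plot, ctx) {
    // Placeholder for whatever the plugin actually wants to render in the
    // freed-up margin (axis labels, rotated tick labels, ...).
  }

  function init(plot) {
    var secondPass = false;

    plot.hooks.draw.push(function (plot, ctx) {
      if (secondPass) {
        // Second pass: the grid has been laid out with bigger margins, so
        // now there is room for the extra elements.
        drawExtras(plot, ctx);
        return;
      }
      secondPass = true;

      // First pass: figure out how much space we need (here just a fixed
      // 20 px per axis for illustration), reserve it by tweaking the axis
      // options, then make flot lay out and draw the plot a second time.
      $.each(plot.getAxes(), function (name, axis) {
        axis.options.labelWidth = (axis.options.labelWidth || 0) + 20;
      });
      plot.setupGrid();
      plot.draw();
    });
  }

  $.plot.plugins.push({ init: init, options: {}, name: 'drawtwice-sketch', version: '0.1' });
})(jQuery);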

Unfortunately, this approach can screw over other plugins that also want to put stuff in the margins. flot-axislabels is actually okay because it just allocates a bit more space and doesn't replace anything. But flot-tickrotor replaces the labels entirely... oh wait, maybe I can fix it if I display the labels in the first draw and then just calculate how much bigger the labels will have to be and ugh man this stuff is tedious. Anyway for now, if you use both, make sure tickrotor is loaded first. Because you're sick of hearing it, I'll make up a French version: j'accepte les demandes de tire. Oh hey there's a French github... demandes de "pull"? Pah, how unoriginal.


Autolog

Mar. 21st, 2011 | 05:32 pm
mood: curious
music: "African Doctor" - Toots and the Maytals

The A-Team is embarking on a new initiative, and we need your help! After all, the A-Team's customer is Mozilla at large, and we like to keep our customers happy.

The project this time is Autolog. It's intended to be a generic tbpl-like results viewer for all the various projects that have test suites but aren't part of mozilla-central and the related branches.

We've already got a good start on the back-end: we're using an Elastic Search database to store results, and we're serving them up, and accepting new results, via a RESTful interface.

But now's the hard part: the UI! As we mentioned, the original concept was a tbpl-like interface, something clean and easy to scan. But tbpl is tied tightly to tinderbox, so it isn't easy to extend. We've spent some time staring at the code, and it looks like some extensive modifications would be in order, and they wouldn't necessarily make future extensions any easier.

Then we were told about an alternative to tbpl, asuth's ArbPL. This was designed to be extensible and has some neat features: it tells you what area has been changed (e.g. "Accessibility: Tests", "Layout: C++ Code"), it displays some details of failed tests automatically so you don't have to click on the failures first, and, for Mozmill, it has some very pretty stack traces and other information (example from the Thunderbird tree).

To be brutally honest, though, the A-Team is biased towards tbpl's look, both because it's the current standard and because it's cleaner. ArbPL has some very nice features, though, so it might be worth the effort to implement a tbpl-like interface, as time consuming as that might be.

But in the end, it's YOU, the customer, that is important to us. So let's hear it: do you like tbpl? What do you think of ArbPL? Is one much better than the other? Are there aspects you like of one and wish were in the other?

For the linkophobic, here are contemporaneous screenshots from tbpl and ArbPL (click to embiggen).



tbpl




ArbPL


Mozilla!

Mar. 21st, 2011 | 02:35 pm
mood: accomplished

Any of the handful of my readers may have noticed a bunch of fairly technical posts lately, tagged 'mozilla'. That's 'cause I started working there a few months ago, and in the spirit of openness, one of their core values, they really encourage blogging and other forms of communication. So my mozilla-tagged posts are being syndicated to Planet Mozilla. If this stuff bores you, feel free to skip any such tagged posts--I'll probably be posting non-Mozilla stuff about as much as usual, meaning once or twice a year. :)


nginx and flup

Feb. 22nd, 2011 | 12:05 pm

I hope this will be helpful to someone, someday:

If you're using flup to serve fastCGI to nginx, don't enable multiplexing!

Background:

The Bugzilla Dashboard uses web.py as its back end for authentication, caching, and storage of divisions, teams, and members. web.py, in turn, uses flup, a relatively popular CGI, WSGI, and fastCGI library. It was fairly easy to write my back end, and hooking it up to nginx wasn't too bad either.

However, over time I noticed that flup would crash because it was unable to create a new thread. Looking closer, I saw that my app (which runs in a separate process thanks to fastCGI) had many, many running threads (I forget the exact number, but after an hour or two it was well over 100). There were also quite a few "connection reset by peer" exceptions thrown by the flup code responsible for cleanup--so likely the client was occasionally hanging up just a bit before flup did.

Switching to a forked, multiprocess model didn't help the situation--the machine ended up with piles of processes instead of threads. Out of desperation, I tried disabling multiplexing, the interleaving of fastCGI requests on one connection. Poof, all my extra threads disappeared, and everything appears stable!

The worst part of this was that the web.py code defaults to turning on multiplexing:

def runfcgi(func, addr=('localhost', 8000)):
    """Runs a WSGI function as a FastCGI server."""
    import flup.server.fcgi as flups
    return flups.WSGIServer(func, multiplexed=True, bindAddress=addr, debug=False).run()


This is despite what flup says:

        Set multiplexed to True if you want to handle multiple requests
        per connection. Some FastCGI backends (namely mod_fastcgi) don't
        multiplex requests at all, so by default this is off (which saves
        on thread creation/locking overhead).


Luckily it was easy enough to disable multiplexing by writing my own runfcgi function.

So either nginx does support multiplexing but handles it improperly, or, more likely, there's a bug in flup (probably in the MultiplexedConnection class, since this problem occurs with both the threaded and forking servers--I bet some request threads aren't being dealt with if the connection terminates suddenly). At some point I want to take a closer look, but for now, beware if you're using nginx with flup!
