March 17, 2011

Machine Learning study group in Copenhagen

We just wrapped up the founding meeting of a Machine Learning Study Group here in Copenhagen. We're planning to meet approximately once a month to discuss machine learning algorithms. In between meetings there's a Facebook group where we hang out and discuss our current pet machine learning problems, and a Google Docs repository of machine learning resources (still to come).

Here's how we're going to run the group: The goal is to get smarter about machine learning. We're narrowing down on a couple of stacks we care about. For big data using databases, we're going to have a look at MADlib for PostgreSQL, or something based on Map/Reduce. Some of us occasionally work in Processing using OpenCV - which is neat, because OpenCV is also a good choice on the iPhone, in Xcode and with openFrameworks.
Some of us are probably going to be looking at Microsoft Solver Foundation in C# at some point. Some of us might implement algorithms for one of these environments if no one else has.

We'll be building up our knowledge of configuring and setting up these stacks. The idea is for everyone to work on a pet problem with a machine learning element. We'll share our ideas on how to proceed and on how to approach the software stack.

The typical approach to a machine learning problem involves

  • Identifying the problem - figuring out where machine learning is relevant

  • Working out how to get from problem to symbols - i.e. how to turn the data inherent in the problem into input for the textbook algorithms. As an example, in image processing this means coming up with some useful image features to analyze, and figuring out how to compute them efficiently. This can be really tricky, and we expect a lot of the discussions in the group will be about building out our skills in this area.

  • Picking the best algorithm - we'll be studying the core algorithms. Not necessarily implementing them; but learning about what they mean - and how they behave
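To make the "problem to symbols" step a little more concrete, here is a minimal sketch of one very simple image feature: a normalized intensity histogram. The function name, the bin count and the tiny sample image are illustrative assumptions, not something the group has settled on.

```javascript
// Hypothetical helper: turn a grayscale image (array of 0-255 pixel values)
// into a normalized intensity histogram that a textbook algorithm can consume.
function intensityHistogram(pixels, bins) {
  const counts = new Array(bins).fill(0);
  for (const p of pixels) {
    // Clamp guards against out-of-range input, then map 0..255 onto 0..bins-1.
    const clamped = Math.min(255, Math.max(0, p));
    const bin = Math.min(bins - 1, Math.floor((clamped / 256) * bins));
    counts[bin] += 1;
  }
  // Normalize so images of different sizes give comparable vectors.
  return counts.map((c) => (pixels.length ? c / pixels.length : 0));
}

// Made-up 2x4 image, flattened row by row.
const sampleImage = [0, 10, 200, 255, 30, 40, 220, 250];
const features = intensityHistogram(sampleImage, 4);
console.log(features);
```

The point is only that every image, whatever its size, ends up as a vector of the same length; real features would likely be more elaborate.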

Just to give a little landscape: The founding members are primarily interested in the following problem spaces: image analysis - with a little time series analysis of sensor data thrown in for interactive fun - and "big data", i.e. running machine learning algorithms on web-scale data.

If you're into this problem space, and this style of working - project driven, collaborative and conversational - feel free to join. If there's simply something you'd like to be able to do with a computer, feel free to join as well - maybe one of the more technical hangers-on will take an interest in your idea and start working on it.

Posted by Claus at 12:07 AM | Comments (0)

July 17, 2009

Creating websites with builtin APIs

Whether or not it's RESTian kosher I don't know, but the live Apollo 11 transcript is built natively around a JSON API. If you don't like mine - feel free to make your own.

Details: The API is at

- note the jsonp parameter, which lets you call the API from JQuery and similar libraries without cross-site worries. Don't need JSONP? Don't know what it is? Just use

instead. If your javascript api needs the parameter, the doc will explain how to use it.
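For readers who haven't met JSONP: the server wraps the JSON payload in a call to a function whose name the client supplies, so the response can be loaded like a script. Below is a minimal sketch of that wrapping, independent of this particular API. The function names and the sample payload are made up for illustration.

```javascript
// Server side (sketch): wrap a JSON payload in a call to the client's callback.
function toJsonp(callbackName, payload) {
  // Only allow plain identifiers so the callback name cannot inject script.
  if (!/^[A-Za-z_$][\w$]*$/.test(callbackName)) {
    throw new Error("invalid callback name");
  }
  return callbackName + "(" + JSON.stringify(payload) + ");";
}

// Client side (sketch): a browser would execute the response as a script.
// Here that is simulated by running the text with the callback in scope.
function runJsonp(responseText, callbackName, callback) {
  new Function(callbackName, responseText)(callback);
}

let received = null;
const body = toJsonp("handleLog", { now: 1247842800 });
runJsonp(body, "handleLog", (data) => { received = data; });
console.log(body, received);
```

Libraries like JQuery generate the callback name and the script tag for you, which is why they need to know the name of the parameter.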
The output is quite simple: There's a 'now' parameter indicating server time (seconds since the Unix epoch), the time of liftoff in the same terms, and notes from the log. The item 'current' is the latest transcript item before now. The item 'future' is a list of the next events to occur.
The format of the events is an array with four elements: time, an id for the speaker, the actual text and a text representation of the time relative to 'now'.
If you build something using this script, please be conservative in how frequently you call the API - the 'future' item gives you an idea of when it makes sense to call.
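A minimal sketch of reading such a response and deciding when to call again. Only the names 'now', 'current' and 'future' and the four-element event layout come from the description above. The sample data, the helper names, the polling bounds and the assumption that event times use the same unit as 'now' are mine.

```javascript
// Event layout from the description: [time, speakerId, text, relativeTimeText].
function describeEvent(event) {
  const [time, speakerId, text, relative] = event;
  return { time, speakerId, text, relative };
}

// Seconds to wait before the next API call: until the next future event,
// clamped to hypothetical bounds so the server is never hammered.
function secondsUntilNextCall(response, minWait = 5, maxWait = 300) {
  const next = response.future && response.future[0];
  if (!next) return maxWait; // nothing scheduled: check back rarely
  const wait = next[0] - response.now; // assumes same unit as 'now'
  return Math.min(maxWait, Math.max(minWait, wait));
}

// Made-up sample response for illustration.
const sample = {
  now: 1000,
  current: [990, 1, "Roger.", "10 seconds ago"],
  future: [[1042, 2, "Go ahead.", "in 42 seconds"]],
};
console.log(describeEvent(sample.current).text, secondsUntilNextCall(sample));
```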

If something is unclear just view source on - which explains in source how to use it with JQuery, and gives tables describing the IDs of speakers.

This way of building websites - presentation with HTML+CSS+JQuery and a pure API backend - really appeals to me. Clearly there are fallback concerns for non-modern browsers etc., but the separation of concerns is appealing - as is the instant availability of an API for others to use.

Posted by Claus at 3:05 PM | Comments (0)

July 1, 2009

Getting to the Flash developers

It worked wonders for the ARToolkit to be connected to the Flash developer community. Maybe it will work for Arduino and hardware hacking as well - using the Netlab toolkit.

Posted by Claus at 4:49 PM | Comments (0)

May 18, 2009

Demo Day - want to take part?

If you've made something that (almost) works, come and show it - or come and look at what the others have made.
(English? Summary at end of post.)

First time: June 9 at 8:00 PM. Venue: Egmont, Vognmagergade 11 (be on time, so we can let you in)

(location: Egmont HQ, Vognmagergade 11, Kbh K. Be there sharp at 8PM - we need to traverse access control. )
Sign up here (or by leaving a comment on the post)

As far as I know, there is no event in Copenhagen with the following plan: You show up with your gadget. Something you've made. Something that works*. You have 15 minutes to show it off, and then there are five minutes for talk. Then it's on to the next project. 5-6 projects per evening and then a beer break. Demos are 7 minutes long, with the option of referee's overtime. Discussion during the break and afterwards. Think pecha kucha, just without slideshows.

Want to help put one together? I see no reason to limit what a project can be - web, visual, audio, mechanical, electronic, for the kitchen, for seafaring or for the bedroom - beyond a few ground rules:

  • It works - this is not concept day, it's demo day.

  • Don't tell us - show us. We would rather see what it can do than what you made it from.

  • Otherwise no limits. Nothing is too big and nothing too small. Bring your million-dollar product or your home-knitted, desert-hardened, touch-enabled pancake recipe reader.

  • Save your slideshows, service concepts and business plans for another evening

The date is not set in stone. I suggest Tuesday as a good day of the week - because Thursdays and Fridays are too often taken.
Leave a comment if you're interested. Say whether you'd like to watch, or whether you have something in the drawer you'd like to show off.

English Summary: Want to participate in a (software or hardware) demo day in Copenhagen? June 9th. Only running code. Demos are 7 minutes (with negotiable slack). Other than that - hardware or software, woodware or anyware, web or desktop, mobile or stationary - anything goes. Leave a comment if you're interested. Include whether you've got stuff to show or want to look.

* Whether it is made of wood, chips, paper or bits is up to you. It must have been made. (Almost) by you. It must be demoable. IT WORKS are the magic words that open the door. Cases of doubt are settled by your humble head judge.

Posted by Claus at 11:27 PM | Comments (29)

March 13, 2009

How much abstraction does it take to run Twitter

On Twitter, @tveskov asks how many layers of abstraction it takes to run Twitter.
This is a really hard question, probably intractable - if you want to include the physics and signalling of lasers through the fiberoptic cables that send Twitter's data to my home. There are simply too many different places along the route where the platform relies on some level of abstraction and comprehension to do a full enumeration.
Instead, let's tackle a simpler question: How many technologies/abstractions are directly visible in Twitter's source code? I did a view source and tried to find all the technology that I could tell from the source is in use to run Twitter.
The rule here is that there has to be text in the source file that does not make sense unless I know the abstraction/standard/software/API I'm referring to below.

Here's my best shot at the list (in order of discovery, by me, while reading):

  • xhtml

  • DTD's

  • XML namespaces

  • XML

  • CSS

  • Browser DOM

  • utf8

  • ico file format

  • png file format

  • gif file format

  • jpg file format

  • HTTP

  • RSS

  • ATOM

  • URI scheme

  • HTML4 (I think. Seems the "bookmark" rel attribute comes from there)

  • "nofollow" microformat

  • "hcard" microformat

  • DNS

  • Javascript

  • JQuery API

  • There is some kind of anti-bot posting implementation - but I can't tell what the API for that is like

Notably absent: The email standards. While obviously employed by Twitter (as are many other standards if the API is considered), I found no evidence of email in the source of the logged-in front page.

Posted by Claus at 5:53 PM | Comments (0)

October 15, 2008

Adobe AIR really rocks

To convert the Spotify DJ from a little fun personal hack to something real, it was clear that I needed some "consumer grade" way to install the DJing client on the machines of would-be DJs.
"Some way" turned out to be Adobe AIR. Adobe AIR can be used with Flash, Flex or just plain old web skills - html and Javascript.
No matter which route you go you end up compiling a one file deliverable that installs as a real desktop application on the system.

I have had no exposure to Adobe AIR prior to today - but armed with online tutorials I started out at around 6 PM and at around 11 PM I was done - with time for UI tweaking and a little webserver work required to make the app work.

Adobe AIR quite simply brings web speed to the desktop. And it's a pleasure to work with too. The runtime is a browser, so everything just works. I like JQuery - and of course I can just plug that in.
I haven't done things "properly" - haven't signed properly, haven't used the upgrading framework and so on, but the fact that I can make a useful UI that interacts with the file system and a web server in 4-5 hours is astounding. And it deploys as a 65k file.

Posted by Claus at 12:23 AM | Comments (0)

June 23, 2008

News from Johnny Chung Lee

Awesome new videos from the Wiimote hacker extraordinaire Johnny Chung Lee. In this first video he is again using the Wiimote, this time to orient oddly shaped displays for adaptive projected displays. Another video shows projection control by automatic calibration of displays with built-in light sensing. There's more.

Posted by Claus at 9:10 PM | Comments (0)

May 30, 2008

Going down

I really like Sam Ruby's post on scaling down. Scaling down is a key concern of mine too, which is why I'm making this the weekend that my "all the perl I need for a webapp in one use declaration"-module goes onto CPAN.
Microscopic apps with almost zero footprint are a huge idea. It's the luxury version of Jot, the smart wiki that got consumerised into Google Sites.

Here's another super sweet example of scaling down: jQuery can insert the content of a specific DOM element from an external webpage into a specific DOM element on the current page in just one line:

$("#links").load("/Main_Page #p-Getting-Started li");

Note the space between url and # - it's not an anchor but a CSS id. Very nice.
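What makes this work is that load() treats everything after the first space in its argument as a selector to apply to the fetched page. A rough sketch of that split in plain JavaScript follows. It illustrates the idea and is not jQuery's actual source, and the function name is made up.

```javascript
// Hypothetical illustration of how a "url selector" argument is taken apart.
function splitLoadArgument(arg) {
  const firstSpace = arg.indexOf(" ");
  if (firstSpace === -1) {
    return { url: arg, selector: null }; // no selector: use the whole page
  }
  return {
    url: arg.slice(0, firstSpace),
    selector: arg.slice(firstSpace).trim(),
  };
}

const parts = splitLoadArgument("/Main_Page #p-Getting-Started li");
console.log(parts.url, "|", parts.selector);
```

Without the space, "/Main_Page#p-Getting-Started" would be read as one URL with a fragment, and the whole page would be inserted.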

Posted by Claus at 9:11 AM | Comments (0)

February 20, 2008

Professional screen scraper

Good Jon Udell post on data friction and the strange but understandable fact that data is usually only made available in human consumption form, not machine readable form - making screen scraping a profession.

Posted by Claus at 3:45 PM | Comments (0)

February 17, 2008

Genetic algorithm/travelling salesman

3D Travelling Salesman using Genetic Algorithms from Ryan Bateman on Vimeo.

It's interesting how erratic the development is - it's not as if large swaths of the final solution are found early and stay stable for most of the animation.

Posted by Claus at 11:40 PM | Comments (0)

August 22, 2007

DTrace Tech Talk

Bryan Cantrill, creator of DTrace, likes to debug, as far as I can tell from this Google Tech Talk. I always loved debugging, the detective work is engaging and the sense of reward when you win is great, but after watching I feel suddenly sad to spend all my time on DTrace-less platforms.

Maybe I should go Nexenta?

Posted by Claus at 12:18 PM | Comments (0)

July 14, 2007

Optimized Tour de France TV-watching

Here are some quick geek tips to optimize your Tour de France TV-watching beyond what the broadcast offers.

One of the annoyances of standard (Danish) Tour coverage is that the reporters suck with numbers. The coverage has improved lately, adding actually knowledgeable former rider Rolf Sørensen to the reporting mix, but still - we don't get the numbers we deserve, so we simply have to make our own.
The first thing we need to do is do our own timing - and why not use this convenient javascript stop watch (or your watch or your cell phone). Secondly, and this is where the reporters really suck, we need proper numbers for how fast the riders are going. The most useful numbers are time/distance instead of the usual distance/time. These numbers are both easier to observe, using the stopwatch, and more meaningful: Tour math is mostly about how much time difference a rider or group of riders can make up during the remainder of a climb or a race.
Third, to make sense of these times we need some sensible ideas about how fast the racers are actually able to go in different terrain. Here are some pointers observed during this year's Tour:

  • Peloton, chasing a group of riders for 30-40km: 50km/h

  • Solitary rider, long break: 40-45km/h

  • Sprint leadup - final 2 km: 55 km/h from -2 to -1 km and 65km/h for the last km

  • Sprinter (this is actually not so relevant to know): up to 80km/h instantaneous speed during final burst

  • Moderate climb (5% incline): 25km/h

  • Steep climb (10% incline): 15km/h
    - what's moderate or steep obviously varies immensely among riders, see below

Below I've compiled a convenient conversion table from km/h to minutes/distance for various speeds and distances. If we use that to evaluate the info above we can tell e.g. that the peloton can gain 5-20 seconds per km on a small group or a single rider on a break, unless this rider is a time trial specialist, so 4-5 min breaks 30km from the finish line are non-events. Danish television used to have a huge problem with bogus excitement about such breaks, but fortunately this has changed a lot.
We also learn why time/distance is so much more telling than distance/time. While a speed difference of 20km/h to 25km/h doesn't sound very dramatic, a time loss of more than 30 seconds per km travelled sounds huge - especially at the bottom of a 15km hill. And these are exactly the kinds of differences that exist between good and great riders.
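The arithmetic behind these numbers is small enough to sketch in a few lines of JavaScript. The function names are made up, and the speeds plugged in at the bottom are taken from the pointers listed earlier.

```javascript
// Seconds needed to cover one kilometre at a given speed in km/h.
function secondsPerKm(kmh) {
  return 3600 / kmh;
}

// Seconds the faster rider gains on the slower one for every km travelled.
function gainPerKm(fastKmh, slowKmh) {
  return secondsPerKm(slowKmh) - secondsPerKm(fastKmh);
}

// Kilometres the chasers need to close a gap given in seconds.
function kmToCloseGap(gapSeconds, chaserKmh, breakKmh) {
  return gapSeconds / gainPerKm(chaserKmh, breakKmh);
}

console.log(gainPerKm(25, 20));         // climb: 25 km/h rider vs 20 km/h rider
console.log(gainPerKm(50, 45));         // peloton vs fast solitary rider
console.log(kmToCloseGap(270, 50, 40)); // 4.5 min gap, break riding at 40 km/h
```

With these figures a 4.5 minute gap to a rider doing 40km/h is closed in 15 km by a peloton doing 50km/h, which is why such a break 30km from the line is rarely exciting.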

Geek-optimized tour watching involves timing the front groups as they pass landmarks and/or the official 20, 15, 10, 5 km gates, and comparing these speeds to the incline exactly where they are. It then involves comparing time gains per minute to the peloton, figuring out from these numbers the speeds of the different formations on the road, figuring out time to the top, and computing plausible total time gains and losses. It also involves checking relative performance along different parts of the route. The many split times of the javascript stop watch are a great help here.
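As a sketch of that procedure: given stopwatch readings at the gates, the speed on each segment and a projected time to the finish fall out directly. The data layout, the function names and the sample readings are invented for illustration, and the projection naively assumes the latest segment speed is held.

```javascript
// Each passing is { kmToGo, clockSeconds }: a gate and the stopwatch reading there.
function segmentSpeeds(passings) {
  const speeds = [];
  for (let i = 1; i < passings.length; i++) {
    const km = passings[i - 1].kmToGo - passings[i].kmToGo;
    const seconds = passings[i].clockSeconds - passings[i - 1].clockSeconds;
    speeds.push((km * 3600) / seconds); // km/h on this segment
  }
  return speeds;
}

// Projected seconds to the finish if the latest segment speed is held.
function secondsToFinish(passings) {
  const speeds = segmentSpeeds(passings);
  const last = passings[passings.length - 1];
  return (last.kmToGo * 3600) / speeds[speeds.length - 1];
}

// Made-up stopwatch readings at the 20, 15 and 10 km gates.
const frontGroup = [
  { kmToGo: 20, clockSeconds: 0 },
  { kmToGo: 15, clockSeconds: 720 },
  { kmToGo: 10, clockSeconds: 1620 },
];
console.log(segmentSpeeds(frontGroup), secondsToFinish(frontGroup));
```

Running the same numbers for the peloton and subtracting gives a plausible gain or loss at the top.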

Mountain relevant speeds

km vs km/h    10    15    17    20    22    25    30    35    40

Flat road relevant speeds

km vs km/h    40    45    50    55    60    65    70    80
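A table of this kind can be generated with a few lines of JavaScript. This is a sketch: the speed columns are the ones listed above, while the distance rows and the function names are assumptions chosen for illustration.

```javascript
// Minutes:seconds needed to cover `km` at `kmh`.
function formatTime(km, kmh) {
  const totalSeconds = Math.round((km * 3600) / kmh);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = String(totalSeconds % 60).padStart(2, "0");
  return minutes + ":" + seconds;
}

// One header line plus one tab-separated row per distance.
function conversionTable(distancesKm, speedsKmh) {
  const header = ["km vs km/h", ...speedsKmh].join("\t");
  const rows = distancesKm.map((km) =>
    [km, ...speedsKmh.map((kmh) => formatTime(km, kmh))].join("\t")
  );
  return [header, ...rows].join("\n");
}

const mountainSpeeds = [10, 15, 17, 20, 22, 25, 30, 35, 40];
const flatSpeeds = [40, 45, 50, 55, 60, 65, 70, 80];
const distances = [1, 5, 10, 15, 20]; // assumed rows

console.log(conversionTable(distances, mountainSpeeds));
console.log(conversionTable(distances, flatSpeeds));
```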

Posted by Claus at 7:29 PM | Comments (0)

June 21, 2007


The Mono team pulled a 21-day, 12-16 hour-per-day hackathon to implement Silverlight for Linux, which they have dubbed Moonlight. 21 days with a group of good hackers doing double shifts and working on the weekends. Just counting hours, each contributor will have put in about as much work as a regular 9-5 employee does in two months. We're assuming the entire team is composed of clever people - only clever hackers care to work so much. And let's assume there's a handful of people working on it (seems reasonable from the task list). So about a year's worth of quality development delivered in 3 weeks by a small committed team with no external constraints and full concentration.
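The back-of-the-envelope calculation, spelled out as a sketch. The 21 days and 12-16 hours per day come from the post above. The 8-hour day, the 21 working days per month and the team sizes of 5 and 6 are my assumptions.

```javascript
// Convert a hackathon into "regular employee months".
function hackathonEquivalent(days, hoursPerDay, teamSize, regularHoursPerMonth = 8 * 21) {
  const hoursEach = days * hoursPerDay;                 // hours per contributor
  const monthsEach = hoursEach / regularHoursPerMonth;  // 9-5 months per contributor
  return { hoursEach, monthsEach, teamMonths: monthsEach * teamSize };
}

const low = hackathonEquivalent(21, 12, 5);  // short days, team of five (assumed)
const high = hackathonEquivalent(21, 16, 6); // long days, team of six (assumed)
console.log(low, high);
```

Under these assumptions each contributor puts in 1.5 to 2 regular months, and the team as a whole lands somewhere between 7.5 and 12 months.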

Sounds about right to me. Miguel de Icaza's war story blog post is a great read.

Posted by Claus at 3:51 PM | Comments (0)