March 17, 2011

Machine Learning study group in Copenhagen

We just ended the founding meeting of a Machine Learning Study Group here in Copenhagen. We're planning to meet approximately once a month to discuss machine learning algorithms. In between meetings there's a Facebook group where we hang out and discuss our current pet machine learning problems, and a Google Docs repository of machine learning resources (still to come).

Here's how we're going to run the group: the goal is to get smarter about machine learning. We're narrowing things down to a couple of stacks we care about. For big data using databases, we're going to have a look at MADlib for PostgreSQL, or something based on Map/Reduce. Some of us occasionally work in Processing using OpenCV, which is neat, because OpenCV is also a good choice on the iPhone, in Xcode and in openFrameworks.
Some of us are probably going to look at Microsoft Solver Foundation in C# at some point. Some of us might look at implementing algorithms for one of these environments if no one else has.

We'll be building up our knowledge of configuring and setting up these stacks. The idea is for everyone to work on a pet problem with a machine learning element. We'll share our ideas on how to proceed and on how to approach the software stack.

The typical approach to a machine learning problem involves


  • Identifying the problem - figuring out where machine learning is relevant

  • Working out how to get from problem to symbols - i.e. how to turn the data inherent in the problem into input for the textbook algorithms. As an example, in image processing this means coming up with some useful image features to analyze, and figuring out how to compute them efficiently. This can be really tricky, and we expect a lot of the discussions in the group will be about building out our skills in this area.

  • Picking the best algorithm - we'll be studying the core algorithms, not necessarily implementing them, but learning what they mean and how they behave
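To make the "problem to symbols" step concrete, here's a toy sketch in Python (not something the group has settled on): a tiny grayscale image, stored as rows of pixel intensities, is turned into a fixed-length feature vector (a normalized intensity histogram), and a new image is classified by picking the nearest labelled example. The function names, the 4-bin histogram and the sample images are hypothetical, illustrative choices.

```python
from math import dist

# A grayscale image as rows of pixel intensities in 0..255 (toy representation).
Image = list[list[int]]

def histogram_features(img: Image, bins: int = 4) -> list[float]:
    """Problem -> symbols: summarize an image as a normalized intensity histogram."""
    counts = [0] * bins
    total = 0
    for row in img:
        for px in row:
            counts[min(px * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts]

def nearest_neighbour(query: list[float], examples: list[tuple[list[float], str]]) -> str:
    """Textbook algorithm: return the label of the closest training vector (Euclidean distance)."""
    return min(examples, key=lambda ex: dist(query, ex[0]))[1]

training = [
    (histogram_features([[10, 20], [30, 40]]), "dark"),
    (histogram_features([[220, 230], [240, 250]]), "bright"),
]
print(nearest_neighbour(histogram_features([[15, 25], [35, 200]]), training))  # dark
```

The hard, interesting part in real problems is the first function; the second is the kind of algorithm you'd pick off the shelf.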

Just to sketch the landscape: the founding members are primarily interested in the following problem spaces: image analysis (with a little time series analysis of sensor data thrown in for interactive fun) and "big data", i.e. running machine learning algorithms on web-scale data.

If you're into this problem space and this style of working (project driven, collaborative and conversational), feel free to join. If there's simply something you'd like to be able to do with a computer, feel free to join as well; maybe one of the more technical hangers-on will take an interest in your idea and start working on it.

Posted by Claus at 12:07 AM | Comments (0)

July 17, 2009

Creating websites with builtin APIs

Whether or not it's RESTian kosher I don't know, but the live Apollo 11 transcript is built natively around a JSON API. If you don't like my front end, feel free to make your own.

Details: The API is at


http://www.classy.dk/cgi-bin/apollo_transcript.pl?q=transcript&jsonp=somejsonp_prefix

- note the jsonp parameter, which lets you call the API from jQuery and similar libraries without cross-site worries. Don't need JSONP, or don't know what it is? Just use

http://www.classy.dk/cgi-bin/apollo_transcript.pl?q=transcript

instead. If your JavaScript library needs the parameter, its documentation will explain how to use it.
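For the curious, here's roughly what a jsonp parameter does, as a Python sketch. The real backend is a Perl CGI script whose code isn't shown here, so this only illustrates the JSONP convention: the server wraps the JSON in a call to the function named by the parameter, and a client fetching the text directly can strip the wrapper again. The names `render` and `unwrap_jsonp` and the callback-name check are hypothetical.

```python
import json
import re
from typing import Optional

# Callback names are restricted to JavaScript-identifier-like strings (an assumed safety rule).
_CALLBACK = re.compile(r"[A-Za-z_$][\w$.]*")

def render(payload: dict, jsonp: Optional[str] = None) -> str:
    """Server side: plain JSON, or JSON wrapped in a call to the requested callback."""
    body = json.dumps(payload)
    if jsonp is None:
        return body
    if not _CALLBACK.fullmatch(jsonp):
        raise ValueError("invalid JSONP callback name")
    return f"{jsonp}({body});"

def unwrap_jsonp(text: str) -> dict:
    """Client side: accept plain JSON or 'callback({...});' and return the parsed object."""
    m = re.fullmatch(r"\s*[A-Za-z_$][\w$.]*\((.*)\)\s*;?\s*", text, re.DOTALL)
    return json.loads(m.group(1) if m else text)

print(render({"now": 1}, "somejsonp_prefix"))  # somejsonp_prefix({"now": 1});
```

Restricting the callback name matters because whatever the client passes in ends up as executable script in the response.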
The output is quite simple: there's a 'now' parameter indicating server time (seconds since the Unix epoch), the time of liftoff in the same terms, and notes from the log. The item 'current' is the latest transcript item before now. The item 'future' is a list of the next events to occur.
Each event is an array with four elements: a time, an ID for the speaker, the actual text, and a text representation of the time relative to 'now'.
If you build something using this script, please be conservative in how frequently you call the API - the future item gives you an idea of when it makes sense to call.
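Based on the format described above, a client could decode the events and decide when to call again. This Python sketch assumes that an event's time uses the same seconds-since-epoch scale as 'now' (the format description doesn't say so explicitly); the sample payload, the `Event` type and the 10 second / 10 minute clamping bounds are hypothetical.

```python
from typing import NamedTuple, Optional

class Event(NamedTuple):
    time: int       # assumed to be seconds since the Unix epoch, like 'now'
    speaker: str    # speaker ID (the tables on classy.dk/moon describe them)
    text: str       # the transcript text
    relative: str   # human-readable time relative to 'now'

def parse_events(payload: dict) -> tuple[Optional[Event], list[Event]]:
    """Return (current event or None, list of future events)."""
    current = Event(*payload["current"]) if payload.get("current") else None
    future = [Event(*item) for item in payload.get("future", [])]
    return current, future

def next_poll_delay(payload: dict, min_delay: int = 10, max_delay: int = 600) -> int:
    """Seconds to wait before the next API call: until the next event, clamped to sane bounds."""
    _, future = parse_events(payload)
    if not future:
        return max_delay
    wait = min(event.time for event in future) - payload["now"]
    return max(min_delay, min(max_delay, wait))

# Hypothetical payload in the described shape.
sample = {
    "now": 1000,
    "current": [990, "A", "Roger.", "10 seconds ago"],
    "future": [[1030, "B", "Stand by.", "in 30 seconds"], [1100, "A", "Copy.", "in 100 seconds"]],
}
print(next_poll_delay(sample))  # 30
```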

If something is unclear, just view the source of http://classy.dk/moon, which explains in the source how to use it with jQuery and gives tables describing the speaker IDs.


This way of building websites, with presentation in HTML+CSS+jQuery and a pure API backend, really appeals to me. Clearly there are fallback concerns for older browsers etc., but the separation of concerns is appealing, as is the instant availability of an API for others to use.

Posted by Claus at 3:05 PM | Comments (0)

February 4, 2009

Inputs available at Classy Labs

A quick rundown of the various physical interfaces I have for sensing information from the real world:

  • Arduino kit sensors
  • Google Android (camera+accelerometer+compass+gps+touchscreen+buttons+trackball)
  • Wiimotes (accelerometer+buttons+IR-dots)
  • Wacom tablet (x,y,inclination X, inclination Y, button)
  • Mouse
  • Keyboard
  • Webcam

In short: lots. I am going to build a compendium of how to talk to these things from the environments I care about, which are mainly Processing, Pure Data and some time series glue that doesn't exist yet and that I might have to write on my own.

Posted by Claus at 9:44 PM | Comments (0)

May 9, 2007

Video Vectorization now in color

(explanation)

Posted by Claus at 11:12 PM | Comments (0)

May 8, 2007

Video experiment

Shoot 5 seconds of bland video, use VirtualDub to split it into images, run the images through potrace, use VirtualDub to resequence them into video, and end up with this (AVI). Yes, it's a work in progress. I have hopes of being able to extract parts (e.g. the laptop in this video), composite them with other stuff and put it all back together again, but we'll have to see.
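The tracing step in the middle can be scripted. A minimal Python sketch, assuming the frames were exported as BMP files (potrace reads BMP as well as PBM/PGM/PPM input) and using potrace's `-g` option (PGM backend, i.e. a re-rasterized trace) and `-o` (output file); check `potrace --help` on your version. Depending on your video tool, the PGM frames may need converting before they can be resequenced. The function names are hypothetical.

```python
import subprocess
from pathlib import Path

def potrace_commands(frames: list[Path], out_dir: Path) -> list[list[str]]:
    """Build one potrace invocation per BMP frame, in frame order."""
    commands = []
    for frame in sorted(frames):
        if frame.suffix.lower() != ".bmp":
            continue  # skip anything that isn't an exported frame
        traced = out_dir / (frame.stem + ".pgm")
        # -g: PGM backend (rasterized trace); -o: write the result to this file
        commands.append(["potrace", str(frame), "-g", "-o", str(traced)])
    return commands

def trace_all(frames_dir: Path, out_dir: Path) -> None:
    """Run potrace on every frame in frames_dir, stopping on the first failure."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for cmd in potrace_commands(list(frames_dir.iterdir()), out_dir):
        subprocess.run(cmd, check=True)
```

Usage would be something like `trace_all(Path("frames"), Path("traced"))`, with the traced frames then resequenced into video.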
Inkscape uses potrace combined with color separation before tracing, and gets good results from that.
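A simplified version of the color separation idea, sketched in Python: quantize each pixel to the nearest color in a fixed palette and emit one black-and-white mask per palette color, written as a plain PBM file that potrace can trace. The traced layers would then be filled with their palette colors and stacked. This is not Inkscape's actual implementation; the fixed palette and the function names are assumptions.

```python
RGB = tuple[int, int, int]

def separate_colors(pixels: list[list[RGB]], palette: list[RGB]) -> list[list[list[int]]]:
    """Return one binary mask per palette color; 1 marks pixels nearest to that color."""
    def nearest(rgb: RGB) -> int:
        # squared Euclidean distance in RGB space
        return min(range(len(palette)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(rgb, palette[i])))

    masks = [[[0] * len(row) for row in pixels] for _ in palette]
    for y, row in enumerate(pixels):
        for x, rgb in enumerate(row):
            masks[nearest(rgb)][y][x] = 1
    return masks

def to_pbm(mask: list[list[int]]) -> str:
    """Plain (P1) PBM text; 1 = black, which is what potrace traces."""
    height, width = len(mask), len(mask[0])
    rows = "\n".join(" ".join(str(v) for v in row) for row in mask)
    return f"P1\n{width} {height}\n{rows}\n"
```

Each mask is a bilevel image, which is exactly what potrace wants as input; the color information lives in the choice of which layer a pixel belongs to.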

Posted by Claus at 1:18 AM | Comments (3)

May 1, 2007

Open library

Thomas Angermann reports on how Biblioteksstyrelsen (the Danish National Library Authority) and DBC, the company that makes most of the library technology in this country, have responded to certain challenges, primarily through the project Brugernes Bibliotek ("The Users' Library"). The answers strike me as downright condescending, and misguided besides, in the way Angermann suggests. Furthermore, the criticism is dismissed because it comes from "non-users" of the libraries. For my own part (I have also chimed in a little with the criticism) I have to say that this claim is absurd. Ever since I learned to read, I have been a frequent visitor (in childhood almost a daily one) to municipal libraries, school libraries, research libraries, departmental libraries and now digital libraries. I use them less now, but that is because they don't give me the answers I'm looking for.
Secondly, it's a rather too smug argument: of course the library's existing users think it's worth the trouble, otherwise they would have stopped using the library. But as a software developer I know this line of argument well. It's a power-user argument. You design for the people who show the most interest in your work. They are always the ones who are happy with it, that is, the ones who have gotten used to how things are done. It's almost a law of nature that no challenges come from them. On the other hand, you can be sure that the more you cater to these users, the more you push everyone else away.

- On the other hand, credit where it's due: Biblioteksstyrelsen is taking up the torch. For a while now it has been possible to get a dump of the entire national bibliography, i.e. the overall index of the libraries' materials, for free experimentation. It's a little unclear what the terms are. You're not simply allowed to run a competing library search service. DBC is a commercially run company: is the idea that they'll jump in afterwards and cash in on the unpaid expertise of the "non-user nerds"? Time will tell. In any case, I've asked for my copy.

Posted by Claus at 6:47 PM | Comments (0)

April 27, 2007

DBPedia and implied assumptions

We had fun with DBPedia the other night, but DBPedia is still a little confusing and rough around the edges (no snarkiness here; I think the project members think so too). I got an illustration of this when I had a look at the property set within DBPedia, the results of which are here. It was just a quick, naive survey: what are the properties I can query, and how distinctive/useful are they? It turns out that most of the DBPedia properties are project-local and, as far as I can tell, so far have very little structure beyond being properties. Places and people have received a little modeling love, so names, geolocation, birth and death make a little more sense than the rest of the data.
I think this should temper the "semantic web is here" optimism just a little. It is indeed nice to be able to filter by infobox properties and to project down to specific properties, but it is hardly the arrival of another world just yet.
There's a lot of fun to be had coming up with discovery tools though - and for that reason alone the DBPedia project is great.
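A quick, naive property survey of this kind can be sketched in a few lines of Python over (subject, property, object) triples: count how often each property is used, on how many distinct subjects, and which namespace it lives in, then compute the share of properties in the project-local namespace. This is an illustration, not the survey behind the results above; the sample triples are made up, and `http://dbpedia.org/property/` is assumed to be the project-local namespace.

```python
from collections import Counter, defaultdict

def namespace_of(uri: str) -> str:
    """Everything up to and including the last '/' or '#'."""
    return uri[: max(uri.rfind("/"), uri.rfind("#")) + 1]

def survey_properties(triples):
    """Per property: (property, number of triples, distinct subjects, namespace), most used first."""
    uses = Counter()
    subjects = defaultdict(set)
    for subject, prop, _obj in triples:
        uses[prop] += 1
        subjects[prop].add(subject)
    return [(p, n, len(subjects[p]), namespace_of(p)) for p, n in uses.most_common()]

def namespace_share(report, namespace: str) -> float:
    """Fraction of distinct properties that live in the given namespace."""
    return sum(1 for row in report if row[3] == namespace) / len(report)

# Made-up sample triples.
triples = [
    ("http://dbpedia.org/resource/Copenhagen", "http://dbpedia.org/property/population", "(value)"),
    ("http://dbpedia.org/resource/Aarhus", "http://dbpedia.org/property/population", "(value)"),
    ("http://dbpedia.org/resource/Copenhagen", "http://www.w3.org/2003/01/geo/wgs84_pos#lat", "(value)"),
]
report = survey_properties(triples)
print(namespace_share(report, "http://dbpedia.org/property/"))  # 0.5
```

A property used on many distinct subjects is a better candidate for filtering than one that shows up on a single infobox.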

It's all about the "data => tools => data => tools" virtuous circle.

Posted by Claus at 10:40 AM | Comments (0)

April 25, 2007

DBPedia hack evening yesterday

Morten has the best write-up of the DBPedia hack evening. Cozy, confusing and enlightening. It became clear to us that DBPedia in its current form is a little frayed around the edges, but that the potential is there.
It's also likely that DBPedia will prompt a lot of data cleaning and consistency checks, and that's not such a bad thing either.
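As an example of the kind of consistency check a dataset like DBPedia might prompt, here's a hypothetical Python sketch that flags person records whose birth and death dates don't make sense together. The function name and the 125-year bound are arbitrary, illustrative choices.

```python
from datetime import date
from typing import Optional

def lifespan_problems(born: Optional[date], died: Optional[date],
                      max_age_years: int = 125) -> list[str]:
    """Return the consistency problems for one person's birth/death dates (empty if fine)."""
    if born is None or died is None:
        return []  # nothing to compare
    problems = []
    if died < born:
        problems.append("death before birth")
    elif (died - born).days > max_age_years * 365.25:
        problems.append("implausible lifespan")
    return problems
```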

We came up with some ideas for quick tricks one ought to build with DBPedia to make the dataset a little easier to explore, and with a bit of luck some of us will actually get around to doing something about them...

Posted by Claus at 2:40 PM | Comments (0)

April 10, 2007

DBPedia hack evening?

As you could read here on the channel the other day, a "semantic" version of Wikipedia has appeared: DBPedia, a gigantic collection of RDF assertions based on Wikipedia's quite extensive data. I know far too little about RDF and SPARQL and the semantic web in practice, and DBPedia is an excellent occasion to do something about that. Morten, who knows a lot about RDF, has kindly promised to be our guide for a

DBPedia hack evening.


at ITU at 20:00
on 24/4 (that's a Tuesday)
As far as I remember, the room is the Marie Curie meeting room on the 5th floor.
ACCESS: It may get a bit messy, since I'll have to let you in through a side door. Call 22 90 18 86 if that fails. I'll try to put up signs.

Program: Bring your laptop.
Morten gives an intro to RDF and SPARQL based on DBPedia
We explore, hands-on, what you can get out of DBPedia
We discuss good interfaces to, and uses of, the data that's there
We keep going until we've had enough

Imity is hosting us at ITU. I'll take care of coffee, and of having a Linux box with sensible tools and a copy of the dataset, plus enough network for the roughly 10 people we have room for.


As a starting suggestion I have booked a room with a projector for either Tuesday the 17th or Wednesday the 18th of April, i.e. Tuesday and Wednesday next week. The dates are extremely negotiable; for example, I haven't discussed with Morten whether he can make those days, and it can easily be moved a week. It's easy to get rooms.

If you'd like to join, and have suggestions for a better date, just speak up in the comments on this post.

Posted by Claus at 12:56 PM | Comments (8)

February 27, 2007

CSound

I have been looking a little at CSound, because I wanted to do some musico-mathematical investigations, and text formats always make for nice accessibility. A text format, however, is no guarantee of readability. CSound looks like what you get if you try to construct a programming idiom without any knowledge of other programming languages. I know that sounds a bit harsh, and I do think there are likeable features, but there are so many strange things in the language that are just un-software-like. Have a look at the sample in the Wikipedia entry for CSound, for example. Let's enumerate the strangeness:


  1. Unreadable shorthands for everything - though I can understand this from an "it's better for serious users" standpoint

  2. It's XML - but it isn't: the semantics are still in CSound's legacy ASCII format, which must be parsed with a CSound parser. The grammar is simple, but it's still a grammar.

  3. In the instruments section arguments are separated by commas, in the score section just by whitespace (it's a syntax error if you mix them up)

  4. Instruments, user-defined function tables, notes in the score, built-in functions: all are identified by numbers, not by names.

  5. The a1 in the instrument definition is a variable, and it's scoped inside the instrument. The f1 and the i1, on the other hand, are really "f with a first argument of 1" and "i with a first argument of 1"; it's just OK to write them as one token. These are global.

  6. The argument "10" in the f1 line is the number of the built-in generator function GEN10, which builds sums of sines (here a single sine curve). The set of GEN functions is fixed.

  7. p4 and p5 in the instrument definition refer to arguments number 4 and 5 in the "i1" line. Giving them names means going out of your way, if it's possible at all.

  8. The final 1 in the instrument definition refers to the f1 function table. In this example everything is in one file, but originally the score section and the instrument section were in separate files. So the value 1 in the score file plugs into the instruments in the other file...
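To make points 3 through 8 concrete, here's a toy Python reading of lines like those in the Wikipedia sample (`f1 0 8192 10 1`, `i1 0 1 20000 1000` and `a1 oscil p4, p5, 1`; treat the exact lines as an assumption). It is not Csound's parser: it only splits score statements on whitespace into numbered p-fields (unpacking the glued `i1`), splits orchestra arguments on commas, and resolves `p4`/`p5` against a note. The function names are hypothetical.

```python
def parse_score_line(line: str) -> dict[str, object]:
    """Split a score statement such as 'i1 0 1 20000 1000' into its p-fields."""
    tokens = line.split()                  # score fields: whitespace separated
    statement, glued = tokens[0][0], tokens[0][1:]
    # 'i1' is really statement 'i' with p1 = 1, written as one token
    fields = ([glued] if glued else []) + tokens[1:]
    note: dict[str, object] = {"statement": statement}
    for n, value in enumerate(fields, start=1):
        note[f"p{n}"] = float(value)
    return note

def parse_orchestra_line(line: str) -> tuple[str, str, list[str]]:
    """Split 'a1 oscil p4, p5, 1' into result variable, opcode and comma-separated arguments."""
    result, opcode, args = line.split(None, 2)
    return result, opcode, [arg.strip() for arg in args.split(",")]

def resolve_args(args: list[str], note: dict[str, object]) -> list[object]:
    """Replace pN references with the note's values; bare numbers (e.g. table 1) stay numbers."""
    return [note[arg] if arg in note else float(arg) for arg in args]

note = parse_score_line("i1 0 1 20000 1000")
_, opcode, args = parse_orchestra_line("a1 oscil p4, p5, 1")
print(opcode, resolve_args(args, note))  # oscil [20000.0, 1000.0, 1.0]
```

Everything is positional: the only link between the two files is that argument 4 of the note happens to land where the instrument says `p4`.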

All of this makes sense as a kind of "minimal theory" language version of a physical device: "Plug the sine you set up with the first oscillator into the first generator" - software wiring. But without visuals for orientation, these kinds of semi-physical coordinates are extremely confusing.
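For comparison, here is the same wiring written out with names: a Python sketch of what an instrument like `a1 oscil p4, p5, 1` computes. It assumes that GEN10 builds a table of summed harmonic sines rescaled to a peak of 1, and that `oscil` is a non-interpolating table-lookup oscillator. It is an approximation for illustration, not Csound's code, and the function names only mirror the CSound ones.

```python
import math

def gen10(size: int, *strengths: float) -> list[float]:
    """Sum of harmonic sines (partial k+1 weighted by strengths[k]), rescaled to a peak of 1."""
    values = [
        sum(a * math.sin(2 * math.pi * (k + 1) * i / size) for k, a in enumerate(strengths))
        for i in range(size)
    ]
    peak = max(abs(v) for v in values) or 1.0
    return [v / peak for v in values]

def oscil(amplitude: float, frequency: float, table: list[float],
          sample_rate: int, n_samples: int) -> list[float]:
    """Non-interpolating table lookup: advance frequency * len(table) / sample_rate per sample."""
    size = len(table)
    step = frequency * size / sample_rate
    phase = 0.0
    out = []
    for _ in range(n_samples):
        out.append(amplitude * table[int(phase) % size])
        phase = (phase + step) % size
    return out
```

With `sine = gen10(8192, 1)`, a note like `i1 0 1 20000 1000` becomes `oscil(amplitude=20000, frequency=1000, table=sine, sample_rate=44100, n_samples=44100)`, assuming a 44.1 kHz sample rate: one second of a 1 kHz sine, with every number carrying a name.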

Posted by Claus at 2:45 PM | Comments (0)