Sikuli: scripting with screenshots
Ian Wrigley discovers a brilliant new scripting tool that uses screenshots to perform commands
I came across a fascinating open-source project the other day that promises to make life just a little bit easier for me, and perhaps it will for you too.
It’s a totally cross-platform automation tool called Sikuli from the User Interface Design Group at MIT, with a really cool user interface (which isn’t too surprising, given its august provenance) and the ability to script things that have been hitherto unscriptable.
One of the main problems with scripting systems such as AppleScript is that they require at least some co-operation from the application itself. In the case of AppleScript, the application must implement and publish a scripting dictionary. Tools such as Visual Basic, meanwhile, can only do so much, and they struggle with predominantly graphical user interfaces. Sikuli aims to solve that problem, focusing very much on enabling you to script even a totally graphical UI.
To demonstrate just how neat it is, take a look at the screenshot below, which shows a script I created to start playing iTunes:
You’ll notice it isn’t a typical scripting language, because there are actually graphical elements right in there among the scripting commands. That’s where Sikuli differs from other scripting languages I’ve seen: you define the actions it should perform by screen-grabbing the interface elements you want it to manipulate. On a Mac, this grabbing function is mapped to Command-Shift-2, and when you press that key combination the screen will darken so that you can select a region, at which point the region is pasted into the IDE (integrated development environment).
Let’s step through this script line by line:
1. The first thing I need to do is move to the Space where I have iTunes running. OS X employs the concept of “Spaces”, or virtual screens, and I have iTunes running on my 10th Space, which I can get to by hitting Ctrl-0, so I tell Sikuli to type that. Then I switch to iTunes to make it the foremost application.
2. Now I want to make sure the volume is all the way up. Although Sikuli can be made to drag elements such as the volume slider, I had problems making it work in all cases – the slider seems just too small for it to recognise reliably. So I used the alternative method, which is to increase the volume using Command-Up Arrow. One press isn’t enough to guarantee that the volume is at its maximum, so I just loop around and do it 30 times. (If you’re a Python programmer, you might recognise that while loop construct: it turns out that Sikuli Script, the scripting language Sikuli uses, is based on Jython, a Java implementation of Python, so all the regular constructs such as loops are available.)
3. Once I’ve turned up the volume, I need to make sure that the iTunes DJ playlist is selected. Again, though, my initial attempts weren’t terribly successful. My first version of the script just had Sikuli click directly on the iTunes DJ icon, but if that icon was already selected it would be highlighted, and the graphics recognition engine couldn’t match it because I’d asked it to click on the unhighlighted version. There may well be an elegant workaround for this that I haven’t discovered – it’s still relatively early days for me with Sikuli – but I opted for a brute-force and ignorance approach (always my favourite) by having Sikuli first click on a playlist I knew couldn’t possibly be selected, namely Ringtones, and only then click on iTunes DJ.
4. Finally, I have the script click on the Play button, and the music starts. I saved the script to my desktop as an executable, and now any time I want to start iTunes I just need to double-click on it.
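The four steps above can be sketched in plain Python. This is only an approximation of the sequence, not the script itself: inside the Sikuli IDE the calls would be Sikuli's own type, switchApp and click, and the image arguments would be captured screenshots rather than file names. The RecordingUI class, the start_itunes function, the modifier strings and the .png names are all hypothetical, introduced so that the order of actions can be run and inspected outside Sikuli.

```python
class RecordingUI(object):
    """Hypothetical stand-in that logs actions instead of driving the screen."""

    def __init__(self):
        self.actions = []

    def type(self, keys, modifier=None):
        self.actions.append(("type", keys, modifier))

    def switch_app(self, name):
        self.actions.append(("switchApp", name))

    def click(self, image):
        self.actions.append(("click", image))


def start_itunes(ui, volume_presses=30):
    # 1. Jump to the Space where iTunes lives (Ctrl-0), then bring it forward.
    ui.type("0", "CTRL")
    ui.switch_app("iTunes")

    # 2. Press Command-Up Arrow repeatedly so the volume ends up at maximum.
    presses = 0
    while presses < volume_presses:
        ui.type("UP", "CMD")
        presses += 1

    # 3. Click a playlist that cannot already be selected, then iTunes DJ,
    #    so the unhighlighted iTunes DJ icon is always the one on screen.
    ui.click("ringtones.png")   # assumed image name
    ui.click("itunes_dj.png")   # assumed image name

    # 4. Start playback.
    ui.click("play_button.png")  # assumed image name


ui = RecordingUI()
start_itunes(ui)
print(len(ui.actions))
```

Logging the actions rather than performing them makes it easy to check the ordering, for example that Ringtones is always clicked before iTunes DJ.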
Sikuli is a clever piece of software. At the heart of it is a graphics recognition engine that finds the pictorial element you’re looking for whenever the script runs. You can fine-tune it by adjusting the “sensitivity” of the matching – if you double-click on an image in the IDE, a window will appear that shows you the current screen plus all the regions that your captured image would match, and by moving the sensitivity slider you can adjust it until only the area you’re interested in is selected.
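To see why moving that slider changes which regions match, here is a toy sketch of threshold-based image matching in plain Python. It is not Sikuli's actual recognition engine: the similarity measure (one minus the mean absolute pixel difference), the tiny greyscale "screen" and the function names are all assumptions made for illustration.

```python
def similarity(screen, template, top, left):
    """Score in [0, 1]; 1.0 means the patch at (top, left) equals the template."""
    total = 0
    count = 0
    for r, row in enumerate(template):
        for c, value in enumerate(row):
            total += abs(screen[top + r][left + c] - value)
            count += 1
    return 1.0 - total / (255.0 * count)


def find_matches(screen, template, min_similarity):
    """Return (top, left) of every patch scoring at least min_similarity."""
    height, width = len(template), len(template[0])
    matches = []
    for top in range(len(screen) - height + 1):
        for left in range(len(screen[0]) - width + 1):
            if similarity(screen, template, top, left) >= min_similarity:
                matches.append((top, left))
    return matches


# A 2x2 "button": exact copy at (0, 0), a dimmed copy at (2, 3).
button = [[255, 0],
          [0, 255]]
screen = [
    [255,   0, 0,   0,   0, 0],
    [  0, 255, 0,   0,   0, 0],
    [  0,   0, 0, 200,   0, 0],
    [  0,   0, 0,   0, 200, 0],
]

loose = find_matches(screen, button, 0.7)    # [(0, 0), (1, 1), (2, 3)]
strict = find_matches(screen, button, 0.95)  # [(0, 0)]
print(loose, strict)
```

At the loose setting the search returns the exact button, the dimmed copy and one spurious partial overlap; tightening the threshold leaves only the exact button, which is the effect you are after when adjusting the slider. In a script, the equivalent setting is attached to the image itself, for instance with Pattern("button.png").similar(0.9), where the file name is a placeholder.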