Research Method

ZoomADIct (homepage and abstract), Research Problem

A Brief Outline of the Initial Approach

Image analogies (or a general method of creating image processing filters using training sets of example input/output data pairs) may be implemented as an evolving population of competing artificial neural networks. Unlike the existing super-resolution software, neural networks trained to "infer" the missing high-resolution data could generate new detail (see a simple demonstration of this distinction). In particular, each neural network in the population would have a characteristic manner of "filling in" the missing visual information. If subsequently plugged into an interactive process of cumulative aesthetic selection, the population would evolve to generate high-resolution "interpretations" of the input image and effectively adapt to the user's expectations about the image "behind" the thumbnail. The target solution could therefore be sketched out as follows:

  • The user stumbles upon a promising thumbnail image while browsing the Internet, copies the image to the clipboard (by right-clicking or Control-clicking on it and choosing from the pop-up menu available in most browsers), then launches the ZoomADIct application.
  • ZoomADIct loads the thumbnail from the clipboard and feeds it to the initial population of (already highly trained) neural networks.
  • With each 200% enlargement step:
    • While the user is motivated to pursue the "ideal" 200% scaled-up version of the thumbnail:
      • With each neural net in the population:
        • net creates a blank high-res image (200% the scale of the thumbnail).
        • With each pixel of the thumbnail:
          • net scans the local graphic context in the thumbnail as its input vector.
          • net generates output in response to the context.
          • Colour values of the four pixels in the high-res image corresponding to the pixel in the thumbnail are set to the values specified by the output vector.
      • The resulting high-res images are displayed to the user for selection.
      • If the user selects only one of the high-res images:
        • The thumbnail is set to the chosen high-res image.
        • All other data is discarded and ZoomADIct moves to the next 200% enlargement step.
      • Else:
        • The parent population of neural networks is set to the networks that generated the chosen high-res images.
        • All other data, save for the thumbnail, is discarded.
        • ZoomADIct's custom set of evolutionary operators is used to regenerate the neural network population with variations of the networks in the parent set.

While it is hopefully usefully descriptive, the pseudo-code outlined above is also rather simplistic, in that many of the steps are fraught with problems. Some of these issues are briefly addressed in the remainder of this section.
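The outline above can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not ZoomADIct itself: images are greyscale lists of floats in [0, 1], each "network" is a single linear layer from a 3x3 context to four output pixels, mutation is plain Gaussian noise on the weights, and the names context, run_net, identity_net, enlarge, mutate and selection_round are all hypothetical. The choose callback stands in for the user.

```python
import random

def context(img, y, x):
    """3x3 neighbourhood of pixel (y, x), clamped at the borders, as a flat vector."""
    h, w = len(img), len(img[0])
    return [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def run_net(net, inputs):
    """Stand-in 'network' (assumption): one linear layer, 9 inputs -> 4 outputs in [0, 1]."""
    return [min(1.0, max(0.0, sum(w * v for w, v in zip(row, inputs))))
            for row in net]

def identity_net():
    """A net that copies the centre pixel to all four outputs (plain pixel doubling)."""
    return [[1.0 if i == 4 else 0.0 for i in range(9)] for _ in range(4)]

def enlarge(net, thumb):
    """One 200% step: each thumbnail pixel becomes a 2x2 block set by the net's output."""
    h, w = len(thumb), len(thumb[0])
    high = [[0.0] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            a, b, c, d = run_net(net, context(thumb, y, x))
            high[2 * y][2 * x], high[2 * y][2 * x + 1] = a, b
            high[2 * y + 1][2 * x], high[2 * y + 1][2 * x + 1] = c, d
    return high

def mutate(net, rng, sigma=0.05):
    """Placeholder evolutionary operator (assumption): Gaussian noise on every weight."""
    return [[w + rng.gauss(0.0, sigma) for w in row] for row in net]

def selection_round(thumb, population, choose, rng):
    """One pass of the inner loop. Returns (thumbnail, population, step_done).

    choose(candidates) must return a non-empty list of indices picked by the user.
    """
    candidates = [enlarge(net, thumb) for net in population]
    picked = choose(candidates)
    if len(picked) == 1:
        # A single choice becomes the new thumbnail; everything else is discarded.
        # The caller is assumed to reseed the population for the next 200% step.
        return candidates[picked[0]], None, True
    parents = [population[i] for i in picked]
    children = [mutate(rng.choice(parents), rng) for _ in population]
    return thumb, children, False
```

A driver would call selection_round repeatedly until step_done is true, then start the next enlargement step with the returned image as the thumbnail.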

A Note on the Coevolving Aesthetic Fitness Function

Evidently, the target solution sketched out above would represent a novel kind of meme breeder, that is, a user-driven computer implementation of cumulative (aesthetic) selection. In his introduction to the workings of his Blind Watchmaker program (often cited as one of the original inspirations for much of a-life art today; see ref.), Dawkins (ref.) draws attention to the inconstant nature of selection pressure when one works with human choosers. For his purposes he called it, appropriately enough, "human whim" (ibid.). On the other hand, such "inconstancies" are not unusual: many evolutionary phenomena (such as runaway selection) in one way or another reflect the fundamental fact that the more potent a change in the genotype, the more it will affect its own fitness landscape down the line, through its incalculable effects on the environment. In the present context, the relevant notion of the environment is the user's aesthetic universe. The user's apparent capriciousness is therefore not a problem but a reflection of the meme breeder's end product, which is (ideally) a self-exploration of aesthetic attitudes. The visual, aural, or textual output is just a frozen residue of the path the user has taken through the space of possible aesthetic/attitudinal worlds that s/he can adopt as her/his own.

Super-resolution via Neural Network Based Image Analogies

By the end of my doctoral research project, the study had generated thousands of images, some of which I was invited to show at several venues as digital prints. This, however, turned out to be a problem, as all my images were generated by experimental pieces of software which, for convenience, were invariably set to operate at very low resolutions (i.e. all the images I had were between 128 and 512 pixels wide!). To print these large, therefore, it was necessary to somehow boost the resolution of the files. The simplest way of doing so is the "nearest neighbour" method, in which the desired resolution is achieved by representing each pixel in the original image as a square field of flat colour. The result, of course, is the characteristic "digital" or "pixelised look". However, one of the main aims of the project was to demonstrate that computer implementations of cumulative aesthetic selection do not as a matter of principle (but only as a matter of bad design) trap the user in the aesthetic prison of trivial digital effects. The "pixelised look" would thus hardly help the cause. And there were similar problems with all other standard interpolation methods...
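For reference, the "nearest neighbour" method amounts to very little code. The sketch below is illustrative only: the function names are hypothetical, images are plain lists of rows, and the print size used in the usage note (a 12-inch side at 300 dpi) is an assumed example rather than a figure from the project.

```python
def nearest_neighbour(img, k):
    """Scale img by an integer factor k: each pixel becomes a k x k square of flat colour."""
    return [[v for v in row for _ in range(k)] for row in img for _ in range(k)]

def required_scale(src_px, inches, dpi):
    """Linear scale factor needed to print src_px pixels across the given width."""
    return inches * dpi / src_px
```

Under these assumptions, required_scale(128, 12, 300) gives 28.125, so every source pixel of a 128-pixel-wide image would have to cover a square roughly 28 printed pixels on a side, which is why the flat squares become so visible.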

I then looked at the available specialised software. Unfortunately, the circumstances were asking too much even of these excellent tools: no software was designed to take an image from 128x128 pixels to a file that, when printed out, would cover square feet of surface at resolutions of more than 300dpi. More seriously still, each piece of software left its own "aesthetic signature", which then became an invariant, yet incidental, feature across even very different digital works. This last issue was the greatest difficulty at the time, but also provided the germ of the idea for the project proposed here:

Instead of developing a standard super-resolution tool for use with all images, one could evolve a whole population of alternative programs specifically adapted for this one image — and this one user — at a time.

Of course, the reason it was so difficult to recover high-resolution data in the case of my digital images is that there is no such data! And even if there were, it would be, strictly speaking, irrecoverable (in the absence of any further information). There are, for all intents and purposes, infinitely many possible high-resolution digital images "behind" every thumbnail (the number of high-resolution pixels standing behind each thumbnail pixel grows with the square of the scaling factor, and the number of images that the thumbnail represents with equal veracity grows exponentially with that pixel count). One could easily, for example, design a program that would generate a random image for each pixel of the thumbnail, with its colour value used to set the sampling probability distribution of colours in the image. Tiled together, such images would make up a composite image of arbitrary size that would nevertheless be identical to the thumbnail at the original scale. If left running, the program would go on to churn out such composite images forever, most likely never even once repeating itself. Of course, the question is how many of these large images would also be "aesthetically distinct" (i.e. distinct in the sense that the difference would actually matter to us in some/any way). And indeed, we would typically not be able to tell them apart: they would all be "just random" elaborations of the original thumbnail image.
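The random-tile program described above can be sketched as follows. This is one possible reading, not the only one: instead of sampling each tile from a distribution centred on the pixel's colour, the sketch forces each tile's mean to equal that colour exactly, so that averaging the composite back down recovers the thumbnail. Images are greyscale floats in [0, 1], and elaborate and block_means are hypothetical names.

```python
import random

def elaborate(thumb, k, rng):
    """Replace every thumbnail pixel with a random k x k tile whose mean is that pixel."""
    h, w = len(thumb), len(thumb[0])
    big = [[0.0] * (w * k) for _ in range(h * k)]
    for y in range(h):
        for x in range(w):
            v = thumb[y][x]
            noise = [rng.random() for _ in range(k * k)]
            mean = sum(noise) / len(noise)
            amp = min(v, 1.0 - v)  # keeps every tile value inside [0, 1]
            for i, r in enumerate(noise):
                big[y * k + i // k][x * k + i % k] = v + amp * (r - mean)
    return big

def block_means(img, k):
    """Scale img down by k, averaging each k x k block (a simple box filter)."""
    h, w = len(img) // k, len(img[0]) // k
    return [[sum(img[y * k + j][x * k + i] for j in range(k) for i in range(k)) / (k * k)
             for x in range(w)] for y in range(h)]
```

Calling elaborate repeatedly with the same thumbnail yields a different composite each time, yet block_means maps every one of them back to the thumbnail (up to floating-point rounding).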

This disparity between the space of distinct digital images and the space of aesthetically distinct digital images is at the core of "lossy" image compression, as well as its inverse, the study of super-resolution. In particular, the goal of super-resolution is an application that automatically generates an aesthetically satisfactory high-resolution "guess". Because such software is expected to be used on a great variety of images, it is typically designed to be very conservative. As a result, no new aesthetic features are to be generated (for example, if one starts "zooming in" on a low-resolution image of a face using super-resolution software, one will not arrive at a point where individual skin cells, or anything else for that matter, become visible). For the present purposes this well-founded conservatism is unnecessary, as there is only one image to worry about. In fact, the application is to adapt to that image, and to the user, by actively "inventing" aesthetically relevant features at higher resolutions (if you haven't already done so when reading the first paragraph above, you may want to check out a simple demonstration of this distinction).

The work of Hertzmann et al. (ref.) represents the most intuitive response to the problem of filling in the missing data: look at how similar perceptual problems are resolved elsewhere and apply some of that wisdom to the present case. The image below, for example, was colourised in just this way by a neural network trained on the example of a child's face (using my own version of Hertzmann's method, though his solution is not in fact based on neural networks).

For the present purposes, this general approach is envisaged to provide highly trained networks to make up a wide-ranging initial population. This would save a great deal of the user's attentional resources without pre-empting the direction of evolutionary change. Both of these issues are paramount to meme breeders in general (ref.). A consistent effort will therefore be made not to take any unnecessary programming decisions in advance of the user, since any choice made at the point of coding may potentially leave a characteristic signature at the point of use, thus limiting the aesthetic scope of the solution. For this reason, the very neural architecture (including even the number of input neurons!) should be evolved, as well as the connection weights. The example below shows a graphical interface I developed while studying the possibility of using such networks.


Examples of MyAI networks whose architecture as well as weights are open to selection pressure and MyAI's custom evolutionary operators. Shown here are an XOR network (left) and a SIN network (right). The small graph on the right of each network maps the actual network output (black dots) against the target/desired output (red dots). Blurry pink spots represent input neurons, yellow spots stand for output neurons, and grey spots for hidden neurons (interneurons). Arching lines ("axons") represent connections between the neurons, and their intensity indicates the weight of the connection. The sign of the weight is indicated by colour: red for negative and blue for positive. The direction of the connection is indexed by a gentle "dip" into the post-synaptic neuron.
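To make the idea of evolving architecture as well as weights more concrete, here is a minimal sketch of a network genome in which the number of input neurons, the number of hidden neurons, the connections and the weights are all open to mutation. It is not MyAI's representation or its operators, which are not described here. The Genome class, the "depth" ordering trick, the tanh activation and the four mutation operators are all assumptions made for illustration.

```python
import math
import random
from dataclasses import dataclass, field

@dataclass
class Genome:
    # depth[i] orders neuron i: inputs sit at 0.0, outputs at 1.0, hidden in between.
    depth: list = field(default_factory=list)
    # (source neuron, target neuron) -> connection weight
    weights: dict = field(default_factory=dict)

    def add_neuron(self, d):
        self.depth.append(d)
        return len(self.depth) - 1

    def inputs(self):
        return [i for i, d in enumerate(self.depth) if d == 0.0]

    def outputs(self):
        return [i for i, d in enumerate(self.depth) if d == 1.0]

    def forward(self, x):
        """Feed-forward pass; x must supply one value per input neuron."""
        act = dict(zip(self.inputs(), x))
        for n in sorted(range(len(self.depth)), key=lambda i: self.depth[i]):
            if self.depth[n] > 0.0:
                total = sum(w * act[s] for (s, t), w in self.weights.items() if t == n)
                act[n] = math.tanh(total)
        return [act[n] for n in self.outputs()]

def mutate(g, rng):
    """Return a mutated copy of g (assumes g has at least two neurons)."""
    child = Genome(list(g.depth), dict(g.weights))
    op = rng.choice(["weight", "connection", "hidden", "input"])
    if op == "weight" and child.weights:
        key = rng.choice(sorted(child.weights))
        child.weights[key] += rng.gauss(0.0, 0.5)
    elif op == "hidden":
        child.add_neuron(rng.uniform(0.01, 0.99))
    elif op == "input":
        child.add_neuron(0.0)  # the net now expects a longer input vector
    else:
        a, b = rng.sample(range(len(child.depth)), 2)
        if child.depth[a] != child.depth[b]:
            # Connections only run from lower to higher depth, so no cycles can form.
            src, dst = sorted((a, b), key=lambda n: child.depth[n])
            child.weights[(src, dst)] = rng.gauss(0.0, 1.0)
    return child
```

Ordering neurons by depth is a design shortcut: it guarantees that any mutated network can still be evaluated in a single pass, at the cost of ruling out recurrent connections.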

The difference, on the other hand, between the space of distinct digital images and the space of aesthetically distinct images diminishes with scale (largely because not many things can be represented by a few pixels). In other words, at low resolutions even a "natural image" becomes increasingly difficult to distinguish from just a random 2D array of pixel values.


Four 128x128 px "natural images" (top row) are here scaled down, using Adobe Photoshop's Bicubic algorithm, to 8x8 px (middle row). The bottom row of 8x8 px images (below the line) was generated using Java's Math.random() method.

Suppose, looking at the above example images, that a neural network was evolved to regenerate the original 128x128 px image, say, of Judy Garland (is that Judy?) from its 8x8 "thumbnail". Admittedly, this could only be achieved by a set of separate networks (or equiv.) adapted for the purposes of scaling up Judy from the initial 8x8 image, over every stage of at most 200% enlargement (i.e. 8, 16, 32, 64 px), right through to the original 128x128 px size. Judy would therefore require a set of at least 4 networks (or an equivalent large modular network) for her regeneration. Still, suppose one had such a network set. The question is what it would "regenerate" if fed not 8x8 Judy, but one of the four random 8x8 pixel arrays on the next row. This opens up the possibility of evolving the thumbnail image as well (see ASNakedGene), with or without fixing the evolution of the networks.
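The staging argument can be checked with a short sketch. Everything here is a stand-in: double_nn is plain pixel doubling used in place of a trained network, random_thumbnail uses Python's random module as a counterpart to Java's Math.random(), and all function names are hypothetical.

```python
import random

def doubling_stages(start, target):
    """Sizes at which a 200% enlargement is applied on the way from start to target."""
    sizes, size = [], start
    while size < target:
        sizes.append(size)
        size *= 2
    if size != target:
        raise ValueError("target must be start times a power of two")
    return sizes

def double_nn(img):
    """Placeholder for one stage-specific network: plain pixel doubling."""
    return [[v for v in row for _ in range(2)] for row in img for _ in range(2)]

def regenerate(thumb, stage_nets):
    """Pass the thumbnail through one network per doubling stage."""
    for net in stage_nets:
        thumb = net(thumb)
    return thumb

def random_thumbnail(n, rng):
    """An n x n array of uniform random grey values in [0, 1)."""
    return [[rng.random() for _ in range(n)] for _ in range(n)]
```

For example, doubling_stages(8, 128) returns [8, 16, 32, 64], i.e. four stages, and the same four-stage chain accepts a random 8x8 array just as readily as a scaled-down photograph.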

Interestingly, as a final point of this brief account, the proud user may well upload her/his evolved interpretation of the thumbnail to an Internet web page. And because it is a high-resolution image, it may well be indexed by a thumbnail of its own, which, if the same size, would then be identical to the original thumbnail. In other words, the original thumbnail would reproduce itself on account of its own aesthetic appeal, not that of the high-resolution image it indexes!


An example output of the preliminary study. The original image on the left was reduced to 16x16 pixels and then "zoomed in" on to the original scale by a neural network trained on the example of a completely unrelated image. Evidently, the network came up with a very peculiar interpretation of the "thumbnail". With ZoomADIct, a whole population of such networks (though hopefully far more advanced) would be presented to the user to choose from.



Page last modified on April 12, 2008, at 10:27 PM