MaCro Philosophy
12-July-2019

AAI: Hand-crafted baseline tops the leaderboard
(11-July)

Yesterday morning I wrote a simple algorithm for entry into the Animal-AI Olympics. At this very early stage it tops the leaderboard and beats all our preliminary deep reinforcement learning benchmarks. The agent follows purely hand-coded rules based on the simple idea that moving towards green and yellow things is good and that red things are bad and should be avoided (this is generally true in the Animal-AI world). This submission, at the moment, would put you in line to win $8,000 worth of prizes and provides a competitive baseline for other, more powerful methods.


The hand-coded agent on some randomly generated training problems

The idea is simple. The agent turns on the spot until it sees a positive goal, then heads towards it, adjusting its turning so that the goal stays centred in its view. It also stops and turns away when it sees red. That's it! No machine learning or training required. It never even takes the reward signal into account (we already know which objects are rewarding and code this information directly into the algorithm). This agent solves over one quarter of the problems in the testbed (28.33%). It doesn't solve any of the interesting problems, but it takes down all the easy ones and gets lucky in a few of the harder ones.

Results

Here is a selection of baselines with the new hand-coded agent included. The PPO baselines are simple deep reinforcement learning agents trained using the examples and configurations provided in the competition repository and are not optimised in any way.

Agent                         C1  C2  C3  C4  C5  C6  C7  C8  C9  C10  Total (%)
Hand-Coded                    21  15   0   8   3  15  12   0  11    0      28.33
PPO-Obstacles                 18  13   4   7   2   5   9   2   4    0      21.33
PPO-SpatialReasoning          10   6   3   3   1   5   7   1   6    2      14.67
PPO-Food                       5   4   4   3   0   2   2   2   0    1       7.67
Forwards (.8) or Left (.2)     4   0   0   2   0   5   5   0   0    0       5.33
Random                         2   0   0   3   0   0   0   0   0    0       1.67
Me*                           30  30  30  30  30  30  30  30  30   30        100

* using higher resolution and having designed all the tests

We know that some of the more complex problems in the competition can be solved by very simple processes, but it will only be when an algorithm can robustly solve all the problems of a certain subtask that we can think about attributing higher psychological processes to it. For example, if I asked you to guess the suit of a card I'm thinking of (clubs, diamonds, hearts or spades), you should get it right about 25% of the time. It would only be if you could get it correct closer to 100% of the time that I would start to entertain the idea that you had some kind of mind-reading ability. So 28.33% doesn't mean that much when it comes to progress towards animal cognition, but it is a good baseline to aim for, and a simple challenge for deep learning or other sophisticated approaches to measure up against.

Morgan's Canon: In no case is an animal activity to be interpreted in terms of higher psychological processes if it can be fairly interpreted in terms of processes which stand lower in the scale of psychological evolution and development.

Morgan's Canon also applies to AI systems, especially in the Animal-AI environment where we are purposefully trying to emulate animal-like behaviour. In what follows we will see multiple cases of what looks like intentional, goal-directed behaviour that it is tempting to explain in terms of higher psychological processes, but that is all produced by the simple algorithm detailed below.

Submitting the Agent

I will not provide all the details of the code here. Instead, I will walk through the general idea and leave it as an open task to recreate, or improve on, it. We just need to modify the provided agent.py file in the examples/submission folder of the Animal-AI directory. If you want to follow along then you will need to follow the instructions on the GitHub repository to set up the Animal-AI directory and environment. We only need a very simple file to work with as we're not going to use any machine learning and therefore don't need any of the ML-Agents toolkit.
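For reference, a stripped-down agent.py looks something like the sketch below. This is written from memory of the competition template, so the exact method signatures and action encoding may differ slightly from the version in the repository; check examples/submission/agent.py before relying on it.

```python
class Agent(object):
    """Minimal sketch of the competition's agent.py template."""

    def __init__(self):
        # Load models or initialise any state the agent needs.
        pass

    def reset(self, t=250):
        # Called at the start of each episode; t is the episode length.
        pass

    def step(self, obs, reward, done, info):
        # Receives the current observation and must return an action.
        # In Animal-AI an action is a pair [forward/backward, rotation],
        # with each component taking the values 0, 1 or 2.
        action = [0, 0]  # do nothing
        return action
```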

First, we need to identify when the agent can see a goal it should approach. To do this, let's take the pixels and search them for anything matching the colour profiles of the goals. The method takes as input the array of RGB values for the pixels (multiplied by 255, as they're scaled to (0,1) in the observation) and a single RGB array representing the target colour. From experimentation, we can use Green ≈ (129,191,65), Yellow ≈ (100,65,5) and Red ≈ (185,50,50).

Because of the different lighting and shadows in the environment we don't want to return only pixels that match our approximate values exactly. Instead, we look for any pixels whose values fall either within 20% of the given value or within ±25 of it (whichever is more generous). There are better ways to perform this calculation, but for our purposes this will do for now.
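A minimal sketch of such a colour test is given below. It assumes the visual observation is an (H, W, 3) array of values in (0, 1), as described above; the function and constant names are my own rather than those of the original submission.

```python
import numpy as np

# Approximate target colours (0-255) quoted above; tweak as needed.
GREEN = np.array([129, 191, 65])
YELLOW = np.array([100, 65, 5])
RED = np.array([185, 50, 50])


def find_colour(pixels, colour, rel_tol=0.2, abs_tol=25):
    """Return a boolean (H, W) mask of the pixels close to `colour`.

    `pixels` is an (H, W, 3) array scaled to (0, 1), so it is rescaled
    to (0, 255) first. A pixel matches when every channel is within
    20% of the target value or within +/-25 of it, whichever is the
    more generous tolerance.
    """
    scaled = pixels * 255.0
    tolerance = np.maximum(colour * rel_tol, abs_tol)
    return np.all(np.abs(scaled - colour) <= tolerance, axis=-1)
```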

We don't just want to head forwards when we see pixels of the right colour; we want to head in their direction. A crude way to do this is to check whether there are, for example, more green pixels to the left or to the right. If there are more to the left, we head forwards and left. If more to the right, we head forwards and right. Assuming there is only one green object ahead, this should lead to the agent steering towards the object. This is enough by itself to get the following behaviour:


The hand-coded agent and moving food.
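A steering rule along these lines could look like the sketch below, reusing the mask returned by find_colour. The action encoding (1 for forward or turn right, 2 for backward or turn left) is my reading of the Animal-AI controls, so verify it against the environment documentation.

```python
def steer_towards(mask):
    """Head towards the goal-coloured pixels in the boolean mask.

    Returns an action [forward/backward, rotation], where 1 is taken
    to mean forward (or turn right) and 2 backward (or turn left).
    """
    height, width = mask.shape
    left = mask[:, : width // 2].sum()
    right = mask[:, width // 2 :].sum()

    if left + right == 0:
        return [0, 1]   # nothing visible: turn on the spot and search
    if left > right:
        return [1, 2]   # more goal pixels on the left: forward and left
    return [1, 1]       # otherwise: forward and right
```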

Finally, we should avoid red objects (orange objects are completely ignored by this agent; in fact, anything that is not green, yellow or red is completely ignored, which is why it does so well on the generalisation category). I included a few little tricks to improve its behaviour. For example, if red is only on one side of it, it will move forwards instead of turning or backing away. It also stores its previous action so that it always continues turning in the same direction if it cannot see anything. It's not perfect of course, but it can occasionally lead to the very intentional-seeming behaviour seen below.


The hand-coded agent and a problem from the training config 4-Avoidance.yaml.
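These avoidance tricks can be captured in a small helper along the same lines. Again, this is a sketch using my own names and thresholds, not the exact code behind the submission.

```python
def avoid_red(red_mask, intended_action, last_turn):
    """Override the intended action when red is in view.

    If red appears on only one side of the view, keep moving forwards
    past it; if it fills both sides, back away while continuing to
    turn in the previously chosen direction.
    """
    height, width = red_mask.shape
    left = red_mask[:, : width // 2].sum()
    right = red_mask[:, width // 2 :].sum()

    if left == 0 and right == 0:
        return intended_action      # no red in sight: carry on
    if left > 0 and right > 0:
        return [2, last_turn]       # red ahead on both sides: back off
    return [1, 0]                   # red on one side only: drive past it
```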

Putting this all together and implementing it as part of the step method, we get the agent shown in the videos. It can be entered into the Animal-AI Olympics by building a docker image using the provided Dockerfile in the submission folder. Simply run 'docker build --tag=myAgent .' from the submission folder and then (after creating an account, officially entering the competition by filling in this form, and following the instructions on EvalAI to add your account token) upload to EvalAI with 'evalai push myAgent:latest --phase animalai-main-396' and it will automatically run the agent on the tests and return your score.
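For completeness, a possible step implementation that wires the sketches above together (replacing the stub in the earlier skeleton) might look like the following. The observation is again assumed to be the (H, W, 3) visual array in (0, 1).

```python
class HandCodedAgent(object):
    """Rule-based agent combining find_colour, steer_towards and
    avoid_red from the sketches above."""

    def __init__(self):
        self.last_turn = 1   # keep rotating the same way while searching

    def reset(self, t=250):
        self.last_turn = 1

    def step(self, obs, reward, done, info):
        pixels = obs         # assumed (H, W, 3) visual observation in (0, 1)

        goals = find_colour(pixels, GREEN) | find_colour(pixels, YELLOW)
        red = find_colour(pixels, RED)

        action = steer_towards(goals)
        if goals.sum() == 0:
            action = [0, self.last_turn]     # keep searching in the same direction
        if red.sum() > 0:
            action = avoid_red(red, action, self.last_turn)

        if action[1] != 0:
            self.last_turn = action[1]       # remember the last turning direction
        return action
```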

Improving the agent

This is only a quick attempt at creating a food retrieval agent. As can be seen from the results, it works in simple situations, but fails as soon as obstacles are placed in the way. Of course, it never solves a series of problems in a way that would suggest it has real cognitive skills, which is the real aim of the challenge. It is, however, nice to see it achieving some of the tasks with just a few lines of code, and it is a testament to the power of encoding domain knowledge directly.

The agent could easily be expanded with slightly more interesting rule-based behaviour based on the same method of only focusing on particular coloured pixels in the environment. It could also benefit from a memory lasting more than one action. As soon as the food is out of sight, it is as if it doesn't exist. Perhaps the most interesting use for this agent would be to bootstrap reinforcement learning algorithms. This could either be done as a way of improving initial policies, or the breakdown of the pixels by colours could be used to give more domain-relevant structure to the inputs to the agent.
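As a rough illustration of that last idea, the colour masks could be collapsed into a small, domain-relevant feature vector and fed to a learning agent alongside (or instead of) the raw pixels. The function below is purely illustrative and reuses the names from the earlier sketches.

```python
def colour_features(pixels):
    """Fraction of green, yellow and red pixels in the left and right
    halves of the view: a compact, hand-crafted input for an RL agent."""
    features = []
    for colour in (GREEN, YELLOW, RED):
        mask = find_colour(pixels, colour)
        half = mask.shape[1] // 2
        features.append(mask[:, :half].mean())   # fraction on the left
        features.append(mask[:, half:].mean())   # fraction on the right
    return np.array(features, dtype=np.float32)
```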

Ultimately, a simple reactive agent like this with no memory or planning capabilities will have no way of solving any of the more complex tasks, but it may still be able to scaffold initial learning. Exactly how to make a better agent, we'll hopefully find out during the competition.

