MaCro Philosophy

AWS Prizes and Test Details
Deadline: 1st September 11:59:59pm AOE

We are rapidly approaching the halfway mark of the competition, where we will be giving away $10,000 of AWS credits split between the top 20 entrants. Each of the top 20 entries on September 1st at 11:59:59pm AOE (anywhere on earth time) that has correctly filled out the signup form and opted in for the AWS prize will be awarded $500 of credits. To make your job easier, we are releasing a few extra features and providing more information about the competition tests.

We have been excited by the movement at the top of the leaderboard and are extending the amount of compute we're spending on the competition, as there's still a long way to go to get to Animal-Level General Intelligence. You can now submit more times per day, testing duration has been extended, and we have released the (unsupported) environment code. Even though some people have a head start, it should still be possible to claim one of the top 20 spots and one of our AWS prizes. At the time of writing you could squeeze into the prizes with an agent capable of solving just 10% of the problems.


Entry checklist

Fill out the registration form to enter your team: Make sure to opt in for any prizes you are eligible for and want to be included in. [Competition Form]
Register on EvalAI: Your team names and emails must match those on the other form. [Register on EvalAI]
Submit your entry in time: There may be increased traffic before the deadline and we can only test a limited number of entries simultaneously, so be sure to submit your entry in advance. [Submission instructions]

Evaluation and Tests

The Animal-AI testbed contains 300 different tests split equally among 10 categories (see below). Each test is a hand-built configuration of the environment in which the agent has to try to obtain the best possible reward. The tests are all pass/fail, based on the agent reaching a certain threshold reward. An agent that demonstrates it has the cognitive capacity (or understanding of the environment) that is tested for will always pass, as long as it is reasonably efficient at getting to the food. The agent is expected to attempt to maximise its reward, but is not penalised for taking a few more steps than necessary here or there.

For example, in the initial tests, where we just want to see if the agent can get food at all, the agent will always pass if it gets the food within the time limit, even if it takes until the last second. In the preferences category, if there is a test with a large green food (high reward) and a small green food (low reward), then the agent will always pass if it gets the large food within the time limit, but fail if it gets the small food (even though this has positive reward). If there is a very long path to the food and a very short path, then the agent will pass if it gets the food following the short path (even if it does so somewhat inefficiently), but fail if it gets the food after following the very long path.

There are no intentional trick tests and they have been designed as far as possible so that they will be easy to pass if the agent has the skill in question. For example, if there is a large food and a small food in the environment then both will be visible at the same time, otherwise the agent might only see the small food and justifiably go and get it. If there is a long path and a short path then the long path will be very long and the short path very short.

In some tests, such as those in the obstacles category, the food might not be visible without exploration from the agent. In this case the tests are designed so that there is enough time to explore and find the food regardless of the order in which exploration occurs. However, time is still limited. If the agent spends time redundantly exploring an open area or revisiting previously explored locations then it might run out of time.

For the AWS prizes we will be using the current testbed with each problem run a single time. Just submit your agent to the main (and only) competition track. For the final test (resources permitting) we will run the tests multiple times per agent and may add a few more if necessary.

Test Configurations

The tests are built using the same YAML files provided with the environment to build training arenas (see the github docs as well as the included examples). There is almost no limit to the kind of environments that can be created using this method. The objects can be resized, rotated, coloured differently and placed in almost any location within the arena. The wall object can form building blocks, and constructions like the house below are fairly easy to put together.

The house is just a combination of walls, transparent walls (windows), and a ramp. Note that this configuration contains more parts than almost all the tests in the competition. The example is not representative of the tests, but designed to show that there are many possibilities and that we can create many different kinds of experiment within the confines of the environment and inside the standard arena. If this configuration was included in the tests then it would need to have a long time limit and the threshold value set appropriately so that the agent isn't penalised for looking behind the house for the food.
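As a rough sketch of what such a configuration file looks like (the tag and field names follow the examples in the github docs; all positions and sizes here are illustrative, not taken from any actual test), a wall-plus-window-plus-ramp construction might be put together like this:

```yaml
!ArenaConfig
arenas:
  0: !Arena
    t: 1000               # time limit in steps
    items:
    - !Item               # opaque back wall
      name: Wall
      positions:
      - !Vector3 {x: 20, y: 0, z: 25}
      rotations: [0]
      sizes:
      - !Vector3 {x: 10, y: 3, z: 1}
    - !Item               # transparent wall acting as a window
      name: WallTransparent
      positions:
      - !Vector3 {x: 20, y: 0, z: 15}
      rotations: [0]
      sizes:
      - !Vector3 {x: 10, y: 3, z: 1}
    - !Item               # ramp up to the structure
      name: Ramp
      positions:
      - !Vector3 {x: 27, y: 0, z: 20}
      rotations: [90]
      sizes:
      - !Vector3 {x: 5, y: 3, z: 5}
    - !Item               # the food reward
      name: GoodGoal
```

Omitted fields (such as positions for the GoodGoal) are randomised by the environment, which is also how the example training configurations generate variation.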

We have simplified the possibilities for the tests by using certain colours for each of the environment objects. Full details can be found in the github documentation. Walls and tunnels are grey RGB(153,153,153) unless they are used as platforms, in which case they are blue RGB(0,0,255) (see below). Ramps are pink RGB(255,0,255). The other objects such as food and moveable objects only come in one colour and this is fixed for all the tests. Note the generalisation category breaks some of these conventions.
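The colour conventions above can be written out with the colors field from the github docs (a sketch; the RGB values are the ones stated in this section):

```yaml
items:
- !Item
  name: Wall            # ordinary wall or tunnel: grey
  colors:
  - !RGB {r: 153, g: 153, b: 153}
- !Item
  name: Wall            # wall used as a platform: blue
  colors:
  - !RGB {r: 0, g: 0, b: 255}
- !Item
  name: Ramp            # ramp: pink
  colors:
  - !RGB {r: 255, g: 0, b: 255}
```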

All tests are of length 250, 500, or 1000 steps. For the blackout category, blackout times used are either multiples of -20, or, if the lights will be turned out after a while, they first flicker off at multiples of 25 (starting at either 25 or 50) for 5 steps a few times. So, for example, [-20], [25, 30, 50, 55, 75] and [50, 55, 75, 80, 100, 105, 125] are all valid settings.
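In the arena configuration files these settings correspond to the blackouts field (a sketch following the github docs; the specific schedules are the valid examples given above):

```yaml
arenas:
  0: !Arena
    t: 500
    blackouts: [-20]                  # lights toggle every 20 steps
  1: !Arena
    t: 500
    blackouts: [25, 30, 50, 55, 75]   # off at 25, on at 30, off at 50,
                                      # on at 55, off for good at 75
```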

Forced Choice

A common practice in animal cognition tests is to present an animal with a number of choices and see which one they make. This could be turning over a cup, pointing to an object, or moving to a particular area of a room. After the animal has made a choice, the test stops, and its 'answer' is recorded. Stopping the test usually involves intervention from the human test-giver and can be triggered by many possible conditions. In our case, the test ends either when food is obtained or when time runs out, limiting the ways in which it could end early.

We have implemented forced choices in some of the tests by including platforms that the agent can move down from but not climb back up. To make these easy to recognise, every such platform is a blue wall: whenever a blue wall appears in a test, the agent cannot climb back up once it has gone down. In the example in the video (the configuration file for this is also on the github) the platform is set to a height of 0.5 so that the agent cannot climb it.

If the agent makes the wrong choice in the above example then all it can do is wait for the environment to time out and it will fail the test. If the forced choice platform wasn't included then the agent could move about randomly and eventually get the food.
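A minimal forced-choice platform might be sketched as follows (illustrative values only; the blue colour marks the wall as a one-way platform, as described above, and the 0.5 height matches the video example):

```yaml
items:
- !Item
  name: Wall              # platform the agent starts on
  positions:
  - !Vector3 {x: 20, y: 0, z: 20}
  sizes:
  - !Vector3 {x: 4, y: 0.5, z: 4}   # height 0.5: the agent cannot climb it
  colors:
  - !RGB {r: 0, g: 0, b: 255}       # blue marks a one-way platform
- !Item
  name: Agent
  positions:
  - !Vector3 {x: 20, y: 0.5, z: 20} # spawn on top of the platform
```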

Training Configurations

We included some example training configurations in the github. These are just examples to get you started and are not meant to be representative of the problems in the tests or to get you all the way to a good solution. An agent that solves every possible configuration of the examples (which include a lot of randomly placed objects) would be off to a good start, but it wouldn't have all the skills needed (see below) to solve all the tests. We expect participants to design interesting configurations as part of their solution process and are excited to see what people come up with for training.

Categories and what they test for

Each category tests whether the agent understands certain elements of its environment or has certain cognitive or reasoning capabilities. We want to test for the kind of general common-sense understanding of the environment that animals have. The following shows the kind of skills or understanding that is tested for in each category:

1. Food
2. Preferences
3. Obstacles
4. Avoidance
5. Spatial Reasoning
6. Generalisation
7. Internal Models
8. Object Permanence
9. Advanced Preferences
10. Causal Reasoning

Good luck on your path to Animal-Level General Intelligence!
