N.B.: The challenge is open to everyone; however, only workshop participants will be eligible for prizes. To participate in the data challenge, either register for the event or send an email to the organisers with your name. At the end, we will invite everyone to present a short summary slide of their approach; these slides will be discussed at the workshop and distributed to non-registrants. Following the end of the workshop, we will make the labelled test set available.
The setting
Britain’s oldest city, Colchester, was founded by the Romans around 40AD as a barracks on the site of a Celtic stronghold. Throughout the 2nd and 3rd centuries, the city expanded, eventually becoming a colonia -- an extension of the city of Rome. Roman landmarks remain to this day, with more discoveries still being made.
Whilst performing some construction, workers uncovered indications of ruined walls in a site that was previously thought to be empty. You have been brought in to scan the site using muon tomography and to map out the locations of the walls to aid the archaeologists. You have been provided with a set of simulated data on which to develop a suitable inference algorithm.
The physics
Muons are naturally produced when cosmic rays encounter the Earth’s atmosphere. Whilst muons are able to pass easily through matter, they nonetheless undergo small changes in their kinematic properties. Muon tomography allows us to produce 3D images of the site of interest, by analysing such changes in muon kinematics.
The amount of change (scattering) depends on the radiation length of the material through which the muon passes (X0 [m]). Denser, heavier materials generally have a lower X0, which in turn leads to a greater amount of scattering.
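To make the link between X0 and scattering concrete, the sketch below evaluates the Highland approximation for the width of the multiple-scattering angle distribution. This formula is standard in the literature but is not quoted in this description, and the function name, the layer thickness, the muon momentum, and the two X0 values are all illustrative assumptions rather than properties of the dataset.

```python
import math


def highland_theta0(p_mev, x_over_x0, beta=1.0, z=1):
    """Approximate RMS projected scattering angle in radians.

    Highland approximation: p_mev is the momentum in MeV/c, x_over_x0 is
    the traversed thickness in units of the radiation length, beta is
    v/c, and z is the particle charge number.
    """
    if p_mev <= 0 or x_over_x0 <= 0:
        raise ValueError("momentum and thickness must be positive")
    log_term = 1.0 + 0.038 * math.log(x_over_x0 * z**2 / beta**2)
    return 13.6 / (beta * p_mev) * abs(z) * math.sqrt(x_over_x0) * log_term


# Illustrative numbers only (assumptions, not values from the dataset):
thickness_m = 0.5
momentum_mev = 3000.0
for label, x0_m in [("higher-X0 material", 0.30), ("lower-X0 material", 0.10)]:
    theta0 = highland_theta0(momentum_mev, thickness_m / x0_m)
    print(f"{label}: X0 = {x0_m} m, theta0 = {1e3 * theta0:.2f} mrad")
```

For the same thickness and momentum, the material with the lower X0 gives the larger scattering angle, which is the effect muon tomography exploits.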
The dataset
The simulated dataset consists of 3D images of randomly generated volumes of interest (VoIs), each of which is split up into voxels. The VoIs are generated to represent a series of small stone walls buried underground and surrounded by soil. For each voxel, we have an estimate of the X0, which was obtained using a simple and standard algorithm (Point of Closest Approach -- PoCA). The X0 prediction is a single positive real number per voxel with units of metres.
PoCA, however, produces inaccurate and blurred images, and with the risk of damaging the ancient artefacts during excavation of the site, you are tasked with developing a dedicated algorithm to refine these rough predictions -- one which can accurately determine whether each voxel is soil or wall.
You are provided with ~100,000 samples which are accompanied by a map of the true layout of the walls (training data). Additionally, in order to evaluate the performance of your algorithms, you are provided with ~30,000 samples without the true layouts (testing data).
The task
Your task is to develop an algorithm that can convert the 3D X0 images into a 3D map of the locations of the walls and soil voxels, i.e. for every voxel it should output either a 0 (soil) or 1 (wall).
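As a rough illustration of the required input and output, the sketch below labels a voxel as wall when its predicted X0 falls below a cut. It assumes that stone has a lower X0 than the surrounding soil, as the physics section suggests for denser materials. The array shape, the random stand-in data, and the threshold value are hypothetical, and since the PoCA images are blurred, a fixed cut like this is only a crude baseline.

```python
import numpy as np


def threshold_x0(x0_pred, threshold_m):
    """Return 1 (wall) where the predicted X0 is below threshold_m, else 0 (soil)."""
    return (np.asarray(x0_pred) < threshold_m).astype(np.int8)


# Hypothetical stand-in for a batch of X0 images: (samples, z, x, y).
# The real shapes and value ranges are described in the starter pack.
rng = np.random.default_rng(0)
x0_pred = rng.uniform(0.05, 0.5, size=(4, 6, 10, 10))

# The threshold is an assumed value, to be tuned on the training data.
wall_map = threshold_x0(x0_pred, threshold_m=0.2)
print(wall_map.shape, wall_map.dtype)
```

The output has the same shape as the input, with exactly one 0 or 1 per voxel.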
The performance of the algorithm will be evaluated by computing the mean intersection-over-union (IoU). The IoU is the ratio of the number of correctly predicted wall voxels (intersection) to the number of voxels that are either wall or predicted to be wall (union). It ranges from zero to one, with values nearer one indicating better performance.
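A minimal sketch of this metric is given below. It is not the official scoring code, which is in the starter pack. It assumes that the IoU is computed separately for each sample and then averaged, that the first array axis indexes samples, and that a sample with an empty union (no true and no predicted wall voxels) scores one. The function name `mean_iou` is hypothetical.

```python
import numpy as np


def mean_iou(pred, target):
    """Mean over samples of the wall-class IoU for binary voxel maps."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    voxel_axes = tuple(range(1, pred.ndim))  # every axis except the sample axis
    intersection = np.logical_and(pred, target).sum(axis=voxel_axes)
    union = np.logical_or(pred, target).sum(axis=voxel_axes)
    # Assumption: a sample with no wall in either map counts as a perfect match.
    iou = np.where(union > 0, intersection / np.maximum(union, 1), 1.0)
    return float(iou.mean())


# Two toy samples of four voxels each.
pred = np.array([[1, 1, 0, 0], [0, 1, 0, 0]])
target = np.array([[1, 0, 1, 0], [0, 1, 0, 0]])
print(mean_iou(pred, target))  # per-sample IoUs are 1/3 and 1, so the mean is 2/3
```

Because only wall voxels enter the intersection and the union, correctly predicted soil voxels do not raise the score.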
You are free to approach the problem however you wish, provided that the solution would be implementable and applicable to real-life scans, and your aim is to maximise the IoU on the testing data.
The organisation
The competition will run prior to the workshop, in order to allow you to focus on the presentations during the workshop. There will be a dedicated session to present and discuss the various solutions, and to award prizes.
The competition opens on 2022/08/01 and submissions must be uploaded by 23:59:59 CEST on 2022/09/04. Details of the submission process are in the data challenge repository.
Multiple submissions are allowed and encouraged. Only your latest submission will be scored; however, every Friday a ranking of scores will be announced, computed on a small subset of the testing data.
You may work in teams or individually, and otherwise discuss amongst yourselves. Please note, however, that only one copy of each prize can be awarded to an entry, whether it be from a team or individual. Additionally, when making a submission from a team, please nominate someone to submit in their own name.
We have a dedicated Slack channel to discuss the data challenge. Please ask the organisers for an invitation, and then search for the “2022-data-challenge” channel. Post here any questions you have, or ideas you want to share. We will also use this to post important information and updates.
The starter pack
We have provided you with a starter pack to help you get started quickly with the challenge, and to provide more detailed information. This is available from this GitHub repository: https://github.com/GilesStrong/mode_diffprog_22_challenge
This contains links to download the dataset (~600MB), some introduction slides, and a Jupyter notebook detailing how to: read and visualise the data; apply a simple classification method; evaluate the IoU; and produce a set of predictions for submission.