Breed4Food - Big Data project
This study was performed within the public-private partnership Breed4Food, in which we gathered data from a locomotion experiment with turkeys. In this experiment, the gait score of each turkey was determined by an expert through visual inspection. In addition, sensor data were collected, including force plate data (from a plate placed in the middle of the tom walk), with the goal of automating the visual scoring process. The data collected during this animal experiment were used as a case study to explore new, scalable technologies to store data and analyse them with machine learning algorithms. We therefore loaded all data of 84 turkeys into a data lake, where we pre-processed them with an ‘Extract, Transform, and Load (ETL)’ procedure. To test the scalability of the data lake, we simulated increasing data volumes, up to a maximum of 30,000 turkeys, and measured the time needed to convert the raw force plate data into comma-separated (CSV) files and store these files for further analysis. For the data of 30,000 turkeys, pre-processing took 1 hour on a single computer and 15 minutes on 12 computers, a reduction of 45 minutes. The use of machine learning (ML) in this data lake was explored by developing a random forest pipeline that classifies gait scores based on the force plate and inertial measurement unit data. This ML pipeline was able to distinguish between two classes, i.e. very bad gait scores versus all other scores. The best-performing model reached an area under the ROC curve (AUC) of 0.871, where a value of 1.0 corresponds to a perfect classifier.
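The ‘Transform’ step of the ETL procedure, turning raw force plate output into CSV, can be pictured with a small sketch. The raw layout assumed here (one sample per line as "time_s fx fy fz", with '#' comment lines) and the function name raw_force_plate_to_csv are hypothetical illustrations, not the format or code used in the project.

```python
import csv
import io


def raw_force_plate_to_csv(raw_text: str, turkey_id: str) -> str:
    """Convert hypothetical whitespace-separated force plate samples to CSV text.

    Assumed (hypothetical) input layout: one sample per line as
    "time_s fx fy fz"; blank lines and lines starting with '#' are skipped.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Header row; the turkey ID is added so files can be combined later.
    writer.writerow(["turkey_id", "time_s", "fx", "fy", "fz"])
    for line in raw_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        time_s, fx, fy, fz = (float(v) for v in line.split())
        writer.writerow([turkey_id, time_s, fx, fy, fz])
    return out.getvalue()


raw = "# hypothetical raw export\n0.000 1.5 -0.2 350.0\n0.001 1.6 -0.1 352.5\n"
print(raw_force_plate_to_csv(raw, "T001"))
```

Because each turkey's file is converted independently, this kind of step parallelises naturally across machines, which is what the scalability test exercises.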
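The reported timings can also be expressed as a speedup and a parallel efficiency. These two figures are derived here from the stated 60 and 15 minutes and are not numbers reported in the paper; the helper names are illustrative.

```python
def speedup(t_single: float, t_parallel: float) -> float:
    """How many times faster the parallel run is than the single-machine run."""
    return t_single / t_parallel


def parallel_efficiency(t_single: float, t_parallel: float, n_workers: int) -> float:
    """Speedup divided by the number of machines (1.0 would be perfect scaling)."""
    return speedup(t_single, t_parallel) / n_workers


t_single_min = 60.0    # 1 hour on a single computer (30,000 turkeys)
t_parallel_min = 15.0  # 15 minutes on 12 computers
n_computers = 12

print(speedup(t_single_min, t_parallel_min))                           # 4.0
print(parallel_efficiency(t_single_min, t_parallel_min, n_computers))  # ~0.333
print(t_single_min - t_parallel_min)                                   # 45.0 minutes saved
```

A 4x speedup on 12 machines corresponds to roughly one third of ideal linear scaling, which suggests that overhead or non-parallel work accounts for a sizeable share of the runtime at this data volume.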
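To make the AUC value concrete: for a binary classifier, the area under the ROC curve equals the probability that a randomly chosen positive case (here, a "very bad" gait score) receives a higher score than a randomly chosen negative case (any other score), with ties counted as half. The sketch below computes it directly from that definition. The scores would be, for example, the random forest's predicted probabilities; the example labels and scores are made up for illustration.

```python
def roc_auc(labels, scores):
    """AUC via the Mann-Whitney formulation.

    labels: 1 for the positive class (e.g. very bad gait), 0 otherwise.
    scores: model outputs, higher meaning "more likely positive".
    Quadratic in the number of examples; fine for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Made-up example: 3 of the 4 positive/negative pairs are ranked correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
# A perfect ranking gives 1.0; a score that ignores the labels gives 0.5.
print(roc_auc([1, 0], [0.8, 0.2]))                  # 1.0
```

Under this reading, an AUC of 0.871 means that in about 87% of (very bad, other) pairs the model ranks the very bad gait higher.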
These results have been accepted for publication (open access) in Animal. The paper by Schokker et al. is entitled “Storing, combining and analysing turkey experimental data in the Big Data era”. This work was performed within Breed4Food in close collaboration with Hendrix Genetics (Boxmeer).