LexisNexis blog post #7
Here is the final post of my internship period. It has been an amazing 12 weeks and I am grateful for the opportunity to learn and develop so much in such a short amount of time.
It took us a while towards the end to get past our issue with processing the raw GPS data. Once we had a handful of days’ worth of data in the system, our code ran into trouble trying to output the data internally, which was not possible at that size. This led us to rewrite all of the code that processes the raw GPS data, but after rethinking our approach, the new code came together without too much time and ended up being more efficient overall. After our changes we were able to process all of the data and get to the pieces we needed to build the charts we wanted.
Once we got to this point we discovered that one of our other chart data sources was not being processed correctly, and we had to spend some time fixing this as well. But now we are fully automated from uploading data into the landing zone all the way to updating our tables in ClickHouse, so that we can visualize our data in a Redash dashboard. We can produce visuals that show the peak heart rate averages for different time periods for the team and for each athlete, and we can compare different types of practices with each other or with competitive games. Not only that, but the other charts we made in Redash show the subjective questionnaire data that we collect from the athletes each day. This adds another layer to our information and lets us compare objective and subjective data sources for the same time periods.
A huge thank you again to my mentor Raja who spent a lot of time making sure he could help me in any way that he could. He helped me learn a lot of the intricacies of HPCC that I did not know before that now allow me to do a lot more with my ECL programming. Also a huge thanks to everyone else on the HPCC and LexisNexis teams that helped me during my internship because I definitely had a lot of support along the way.
Our group at NC State looks forward to fully implementing and further developing the tools that we have built in HPCC and seeing what impact they can have on our ability to help our athletes train to reach their peak performance.
LexisNexis blog post #6
I have just finished up the 9th week of the internship period. We have made some great progress towards getting some visuals made directly from HPCC. In the last two weeks we have made some more tweaks to how we pull and manipulate the raw GPS data. Since the software we get the data from doesn’t allow you to rename the files when you pull raw data, we set up the code to take the files with their original names after the upload and spray to the cluster and decide whether each one is a men’s or women’s file. The data then gets organized and processed according to the team it belongs to.
From there we now have automation for generating the data sets for the charts of manipulated data and for the despray of all data files once all processing is done. The chart data sets that are desprayed at this point are the raw GPS data for the men and women, and the subjective questionnaire data for the men and women. Once those data sets are available, we have set up code to pull that data into a ClickHouse database. ClickHouse allows us to make a table layout for the data sets so that we can then use the data to create web-based visuals in Redash. Redash is where we will be creating dashboards to organize our visuals into reports that we can use and provide to the coaches.
We have a slight snag with our data processing now that we have changed it to organize the raw GPS data by team, but that should be fixed fairly easily, and then we will be automated from data upload to visual creation. Excited to see what we can get done in a couple more weeks!
LexisNexis blog post #5
I have just finished up the 7th week of the internship period. We have really made some good progress with the raw GPS data at this point, as well as set up a lot of things for the Athlete360 project. We have now automated the process from uploading data into the HPCC landing zone all the way to outputting CSV files that contain the processed data with any calculations needed for our early visualizations.
That said, we have still been running into some issues with our spraying and staging of the data once it is in HPCC. It definitely can get quite frustrating when parts of our system start to break when we thought they were ok, but at the same time it is good to not get too comfortable with what we have done. Every time we hit a snag, it gives us a chance to make an old piece of code better and more efficient going forward. So, even though some days feel like we take a step or two back or get stuck, it can also be viewed as a chance to avoid taking the same steps back at another point in the future.
I am very happy with what we have been able to achieve so far with the raw GPS data. We can now view a full session of data and automatically find the peak speeds and heart rates that occurred during each drill in that session, based on averages for different time periods. In the next week we plan to improve upon this so that we can display team averages as well as individual averages across each drill in a session. That will let us make comparisons and see how particular drills match up to each other and whether they were as hard or easy as intended.
The next few weeks should hold some more decent jumps in progress so looking forward to keeping our momentum rolling.
LexisNexis blog post #4
We are at the end of the fifth week of my internship period. We have done some of the coding for manipulating our raw GPS data. We can now go through the data set and find the average speed and heart rate for time periods of different lengths within a session. For example, if we want to find the three periods with the highest one-minute average heart rate, we can go through the data 600 rows at a time, compute the average for every possible one-minute period from start to finish, and keep the three highest values. We use 600 rows because we get a data point 10 times per second, so a minute of data is 600 rows. Then we can see which three values are highest and which time periods of the session they come from. To find this we can look at either the elapsed time of the session or the actual time at which each data point was recorded. The code finds each minute of data by averaging a 600-row window and then moving the window forward by one row for the next calculation. This ensures we find the true one-minute peak averages, because they may happen in spans of time that do not fall directly on the minute. For example, a peak average minute may happen from 1:15-2:15 minutes into a drill rather than 1-2 or 2-3 minutes. So, we move forward by one row to make sure we cover all possible spans of time.
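To make the sliding window idea concrete, here is a minimal sketch in Python. It is only an illustration and not the ECL code we run in HPCC: the function names are made up for this example, and the input is assumed to be a plain list of heart rate (or speed) readings already sampled at 10 rows per second.

```python
SAMPLE_RATE_HZ = 10  # from the post: one data point 10 times per second


def rolling_averages(values, window):
    """Average of every window of `window` consecutive rows, moving one row at a time."""
    if window <= 0 or len(values) < window:
        return []
    total = sum(values[:window])
    averages = [total / window]
    for i in range(window, len(values)):
        # Slide forward by one row: add the new row, drop the oldest one.
        total += values[i] - values[i - window]
        averages.append(total / window)
    return averages


def top_peaks(values, seconds, count=3):
    """Return (start_row, start_seconds, average) for the `count` highest windows.

    Hypothetical helper: `seconds` is the period length, e.g. 60 for one minute.
    """
    window = seconds * SAMPLE_RATE_HZ  # 60 seconds -> 600 rows
    averages = rolling_averages(values, window)
    ranked = sorted(enumerate(averages), key=lambda pair: pair[1], reverse=True)
    return [(start, start / SAMPLE_RATE_HZ, avg) for start, avg in ranked[:count]]
```

One thing to keep in mind with this sketch is that windows one row apart overlap almost completely, so the three highest values will often sit right next to each other. Whether to require the peaks to be non-overlapping is a design choice this example does not make.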
Although we can now do this, there is still a lot of tinkering that has to be done to make sure it works fully as intended and it is giving us meaningful data. Some things that must be done will be excluding data outside of the start and end of the session because the raw data will include everything from when the device is turned on until it is turned off. Generally, this will include time before and after the session so we must use timestamps to identify where the session’s data actually starts and ends. We also will look to possibly have the program ask for input of timestamps to identify where the start and end of the session are, as well as start and end of particular drills during the session. This will help us further filter through the data and provide better comparisons when looking at other sessions.
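The timestamp filtering described above could look something like the following Python sketch. Again this is an assumption-heavy illustration rather than our actual code: the row layout (a timestamp followed by a reading) and the function names are hypothetical, and the start and end times are simply passed in by whoever runs it.

```python
from datetime import datetime


def trim_to_window(rows, start, end):
    """Keep only rows whose timestamp falls between start and end, inclusive.

    Each row is assumed to be a (timestamp, reading) tuple.
    """
    return [row for row in rows if start <= row[0] <= end]


def split_by_drill(rows, drills):
    """Split session rows into drills.

    `drills` is a hypothetical mapping of drill name -> (start, end) timestamps.
    """
    return {name: trim_to_window(rows, start, end)
            for name, (start, end) in drills.items()}
```

Trimming to the session first and then splitting by drill means everything recorded before the session starts or after it ends never reaches the peak calculations.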
We also hope to start on some visuals soon to present some of our analyzed data. How we show the data once we have done our manipulation will be the most important part. Our goal is to communicate this information back to coaches effectively to help them understand what is going on with their athletes, so if we can’t present it in a way that is easy to follow and understand, then all of our work doesn’t help anything.
Looking forward to getting into the next week of work and seeing what we can do next!
LexisNexis blog post #3
I have just completed my fourth week of the internship period. We are just about at the end of the data processing work we need to do to be able to fully dive into our data manipulation. There were a lot of little things that popped up here and there that caused us to stay focused on the data processing stage to fix the errors and issues that arose, but I believe we are past that for now. These first few weeks have definitely tested my patience just a little bit because the most interesting parts of the project lie ahead, but it has been a great learning experience to troubleshoot, find and fix the issues that we had, and now we can trust that we have an efficient first part of our system.
Now that we are beginning to work on more of the data manipulation, we will have to figure out what has to be done on the coding end to answer all of the different questions we want to ask of the data. Many of these questions have to do with tracking trends over time and comparing between different types of sessions, drills, games, and even lifting to find any interesting bits of information that may tell us which pieces of data will be most valuable. In the world of sport science, as in any research field, you collect as much data as possible to make sure you cover all the bases. But in the end, you will likely only continue to use a fraction of the data once you know which parts are telling you what you want to know. We are collecting data from many different sources for the Athlete360 project, so once we can do a deeper analysis, we will know which data will be worth our time and effort to collect once we are in full swing in a competitive season, and when we start to branch out the data collection full time to the other sports.
The next couple weeks should help us make some big steps in having more tangible, visible progress so I’m excited to see what keeps coming from our work!
LexisNexis blog post #2
I am now completing my second official week of the internship period. So far, we have done a lot of work to continue developing the foundation for all of our work on the Athlete360 project as well as my intern project. It has been a great learning experience for me on the computer science side of things to better understand how we break each step down and build each task that needs to be done one piece at a time. I personally do not have much experience in this area because my background is more in exercise physiology and making sense of data that pertains to sport. I have been lucky to learn my computer science skills mostly from Dr. Vincent Freeh at NC State University. He has helped me tremendously when it comes to gaining a better understanding of Python, SQL databases, and general command prompt/PowerShell functions.
Before starting the Athlete360 project, I had been working on building our own system to bring together all of our data sources into a database and begin to create the analysis and reports for the data from there. It was getting the job done but was by no means a smooth or efficient system.
Now that we have the Athlete360 project, it has helped a lot by not only providing us with the software to bring everything into one place and act as our database as we process and begin to analyze the data, but also providing me with a supporting cast that can help me continue to build my computer science knowledge. Raja has been immensely helpful in guiding me step by step through what task needs to be accomplished next and how we can build our system piece by piece, so that when we get to the point of working on the manipulation of the data, the process of getting the data ready for this step is efficient and as automated as possible.
However, since the beginning of this project, the GPS company that we use for collecting data with the soccer players has released a new update for their software. In this update they have added a feature for tracing a player’s path during a recorded session and giving insight into what movements they performed on the field and when. This was planned to be a big feature of the intern project but will now have significantly less impact since it is provided in the software. A new big goal of this project will be to identify certain time periods during a session of collected data and find where the highest peaks and averages of particular variables are happening. For example, if we have data from a practice, and we find that the peak one-minute values for speed and heart rate happen during a particular drill, and the peak five-minute values for the same variables happen during a different drill, we can begin to relate this information to other practices and to games. How do these values compare during the most intense periods of a game? Were these particular drills designed in a manner that should produce the values found, and should they be more or less intense when compared to a game? Doing this kind of analysis with the GPS data can provide us with very insightful information and is something that you cannot do to the same extent in the GPS software provided.
So, the main end goal of this intern project has been tweaked, but in the end it should provide us with tools that can help us greatly when trying to gain a better understanding of our data and what is happening with the players. This in turn should help us better communicate with coaches after practices and games to help them understand how the athletes are performing, whether the goals around how they are designing the practices are being hit, and how we can better prepare the athletes for each game.
LexisNexis blog post #1
My name is Chris Connelly and I am a new intern for the summer of 2019 with LexisNexis. My background is in Exercise and Sport Science. I completed my Master’s in Exercise Science and Nutrition back in 2015 and currently work at North Carolina State University with the department of strength and conditioning for Olympic sports. I did my graduate work at Sacred Heart University in Fairfield, CT. During this time, I worked in a lab doing 3D motion analysis for running gait and other sports along with electromyography data collection. When I originally moved to North Carolina, I worked with the North Carolina Courage professional women’s soccer team, who play in the NWSL. During my time with them I did daily GPS data collection as well as processing, analysis and reporting of the data back to the coaches.
I currently work as a data/sport scientist and handle data on a daily basis for athletes at NC State. I have also been working with LexisNexis since the start of this year on a project using HPCC Systems to provide analysis for our data at NC State. This project is called Athlete360 and is focused on bringing all of the data sources we collect from the men’s and women’s soccer teams together in HPCC. From there the long-term goal is to use the machine learning library to create a predictive analysis for our data. The idea is that we will try to use the data to give us an idea of how ready an athlete will be to perform in the days leading up to a game or competition.
For my intern project, we will be building off of the foundation of the Athlete360 project, but once past the initial stages, we will be focusing more on the specific analysis of the raw GPS data we are collecting with the soccer teams. The specific goal with the data will be to provide a deeper dive into the variables than we could perform using the software provided by the GPS company. This will give us a greater ability to use the data to inform us and the coaches about what is happening in a practice during particular drills, during games, and how different players compare to each other or with themselves over time.
I will be working with Raja Sundarrajan as my mentor during this project and I am very excited to see what we are able to accomplish over the course of this internship period!