by Ryan Tannenbaum, founder of for.education
ManageBac is a powerful LMS, but if we only consider the platform itself, we won’t be able to take full advantage of what it offers.
The ManageBac API offers keen schools a powerful way to “deep dive” into their data and get greater insights into their programs, students and curriculum.
This article offers a high-level overview of how to begin conceptualizing data - so that whether you have the technical knowledge to extract it or not, you can begin to understand the process of making your data ready for visualizations, dashboards and analytics. The goal is to provide an overview of steps, and the thinking behind them, that I employ when working with schools that are looking for analytics solutions.
This article focuses on attendance analytics as it is a relatively simple data structure, one that has been further synthesized for the sake of brevity - the ideas here can then be extended outward to address other parts of the ManageBac system.
Attendance Insights
ABC International School utilizes ManageBac for class and homeroom attendance. The system is easy to use, but they know that the school is not using it as effectively as possible.
Management would like to see the following:
- Attendance trends over time by year level, grade and student
- Can attendance be a predictor of student withdrawals?
- Tracking teacher diligence when taking attendance
- Who consistently has been missing attendance?
- How can we follow up with them quickly, while still keeping track, long term, of attendance taking?
- Quickly identifying discrepancies between class and homeroom attendance
- Busy teachers frequently mark students as all present - so inconsistencies between class and homeroom attendance on the same day can point to issues with the attendance taking process in the school
- Attendance that hasn’t been taken on ManageBac is automatically marked as present the next day - therefore, we need to find other ways to detect discrepancies
Building our Schema
When we are working with large amounts of data, it is important for us to first consider how the different pieces relate to each other. Poorly structured data will quickly convolute and impede analytics.
There are multiple schemata that data scientists use to organize information. The simplest structure is known as star schema - in the center, we have our fact table, and around it, we have dimensions.
Our fact table is our gateway and the center of our queries. The dimensions give us more detail, but help us to avoid duplicating the same data over and over again.
When building our fact table, we want to outline the key features of our data and keep the columns as simple as possible. We can add additional information in our dimensions.
For attendance, we could consider the following:
Student |
Student ID |
Type |
This can be a string that tells us whether this is “homeroom” or “class” attendance |
Location |
Homeroom/class ID |
Date |
|
Attendance Status |
|
Comments |
These six columns are actually all we need to explore attendance information.
Keen observers might think this is a bit lacking. After all, we are not storing the term information, the teachers, the program, and a whole lot of other details.
Actually, we can still get access to this information, but there is no need to put it in the fact table, we can get all this information from the dimensions.
We can store this information in the class/homeroom dimensions
Teacher/ Program |
We can store this information in the class/homeroom dimensions |
Term/Year |
By knowing the date and class, we can then easily find the term |
With this simple structure, we can now easily begin to analyze and make connections between our data.
Let’s consider our three requirements, and how we may structure queries to address them:
- Attendance trends overtime by year level, grade and student
- We can count attendance statuses, while filtering by year level (in year group or class), student name, grade, subject, etc.
- Tracking teacher diligence when taking attendance
- The ManageBac API will return an empty object if the teacher has not yet taken attendance - we could either store the attendance status as blank in this case, or we could give it a more human readable name like “Attendance not yet stored” during the ingestion process.
- We can then query our fact table by this status, and then get our teacher data from the year group/class dimensions
- Identifying discrepancies between class and homeroom attendance for a student
- For this, we will need three separate queries.
- First, one that pulls the homeroom attendance data, and another that pulls the class attendance data
- Our third query will join those two queries on date and student ID
- Finally, we can look for differences in attendance status - we can also look to see if there are any comments that give us better insight into the discrepancies
Visualizing It
Google Data Studio:
Google Data Studio is a free solution that is ideal for schools already on the Google Data Studio platform.
With Data Studio, we can import our data using a SQL query, and then represent it in the dashboard.
For this visualization - we are going to import our data twice, but with a different structure.
First we will import all our attendance data into a very big list. In data studio, we will get the following fields:
We will use the following SQL queries to do so. They are provided for reference and interest.
/*First we will define the fields we want to take*/ select type, date, attendance_status, comment, c.name as class_name, c.program as class_program, c.grade as class_grade, c.teacher as class_teacher, subject, y.grade as yg_grade, y.program as yg_program, y.teacher as yg_teacher, s.name as student_name, s.email as student_email, s.enrollment_status from facts as f /* We make two joins, one with the class table, and one with the year groups table - we can use our type field to distinguish between our different facts */ left join classes as c on c.id = f.location and f.type='class' left join yeargroups as y on y.id = f.location and f.type='yeargroup' /*Finally we join with the student table to get student information*/ join students as s on s.id = f.student_id;
Secondly, we will write another SQL query to import just the discrepancies between class and homeroom (year group) attendance data. Note that we have ignored the “attendance not taken” and “late” entries as these are probably not reflective of a process error within the school - namely that teachers had incorrectly taken attendance during class.
/* First we will define the fields we want to take */ select a.name as class_name, a.grade as class_grade, a.program as class_program, a.teacher as class_teacher, a.attendance_status as class_attendance_status, a.date as date, y.teacher as yeargroup_teacher, f.attendance_status as yeargroup_attendance_status, s.name as student_name from facts as f /* We will join our fact table with all the year group details */ inner join yeargroups as y on y.id = f.location and f.type='yeargroup' /* We will do a subquery like above, except we do it with the class attendance data */ inner join ( select * from classes as c inner join facts as f on c.id=f.location where f.type='class') /*We join our first query to our subquery where students and dates are the same - note that we have to convert the date to remove the timestamp - we only want the dd-mm-yyyy representation of the date*/ as a on a.student_id=f.student_id and to_char(a.date :: DATE, 'dd-mm-yyyy')= to_char(f.date :: DATE, 'dd-mm-yyyy') /* Get the student information */ join students as s on s.id = f.student_id /* Filter to only mismatching attendance statuses */ where f.type='yeargroup' and a.attendance_status != f.attendance_status /* Take out teachers who did not take attendance, and lates as these are not process error issues */ and a.attendance_status != 'attendance not taken' and a.attendance_status != 'late' and f.attendance_status != 'attendance not taken' and f.attendance_status != 'late';
Once we have imported the data, we can build the visualizations.
This gives us a detailed overview of the general attendance statistics - we can easily filter by different categories, including subject, grade or year group by clicking on the appropriate columns.
This shows us which teachers have not taken the attendance - note we can filter down by date to be more specific. If we click on a specific teacher, we can get the exact students and dates that were missed.
If I were building this for a school, I would separate the teachers into their own dimension - doing so would allow me to include including class and homeroom data on the same chart and contrast them by teacher.
I would also try to include links that go directly to the ManageBac attendance pages.
This page shows us discrepancies between class and homeroom attendance - mostly looking for teachers who are not accurately completing the attendance.
You can visit the dashboards here, note that the data is procedurally generated, and so it is not going to be entirely realistic, but here are some things to consider:
- Can you see any correlation between attendance and enrollment status?
- It seems like if we filter by withdrawn students, they have much greater levels of absences
- What information would be useful if you were to use these dashboards? Would it be better to store teacher emails as well as names?
- If we could export spreadsheets of teacher emails, we could quickly mail merge and email notifications
- What other insights could you start to draw from this data?
- Do you think these types of dashboards would be valuable to staff at your school?
Pushing the Envelope
I’ve worked hard to keep this introduction as simple as possible, but were I actually developing this to be used in a school, I would add more dimensions to my original schema.
An easy way to identify new and potential dimensions to add is to look at your existing dimensions and see if there is anything that repeats between them (in our case, both classes and year groups have teacher, grade and program fields, therefore, we might create two or even three new dimensions, one for teachers and one for grade/program, then add these columns to our fact table).
Finally, I would consider how to make this data more immediately actionable. In order to do so, some things I would also consider adding are:
- Teacher emails so I can easily generate a list of contacts or follow-ups
- Creating custom fields that generate links to the class and year group attendance pages, so I can quickly jump into ManageBac and see the issue first hand
- Special permissions so divisions, administrators and heads as well as teachers can get tailored dashboards that only show the data relevant to them