Professional Experience Project

Developing an e-Resource Usage Dashboard for Texas Libraries

Kate Reagor, University of Texas at Austin - School of Information


Introduction

Data collection is extremely important to all government funded entities, and libraries are no exception. To justify their continued funding, library staff must collect and maintain data on material circulations, program attendance, computer use, and more. But these days libraries are increasingly expected to provide electronic content as well as physical: reference and journal databases, resources for career development, lanaguage learning, genealogy, early childhood literacy, and legal assistance. The use of these resources also needs to be reported, and yet the expertise required to collect and understand data showing how online resources are being used is often beyond the staff capabilities of smaller libraries - public and academic.

Though Texas libraries might license electronic resources on their own, the majority get their core collection through membership in the TexShare Databases Program, a program of the Texas State Library and Archives Commission. TexShare provides 600+ Texas libraries with access to 68 electronic resources from seven different providers. But gathering usage reports for every TexShare resource involves visiting a dozen different administrative platforms, many of which use different metrics and standards for reporting use. And while TexShare program staff do collect usage data for all libraries and resources in the program, the amount of time required to process and standardize the data to the point where it would be useful to an individual library has long been a barrier to providing this data directly to their members.

Project Goals

To assist TexShare staff in aiding Texas libraries who are unable to collect their own e-Resource usage data, I undertook the following tasks:

  • Design a series of data workflows to speed the process of cleaning and standardizing TexShare e-Resource usage reports
  • Create a relational database for storing the cleaned usage data
  • Survey TexShare member libraries to assess their needs around the acquisition and display of e-Resource usage data
  • Use the survey feedback to develop an interactive dashboard where libraries can view and download their usage data

Disclosure

While I undertook my Professional Experience Project in partnership with the Texas State Library and Archives Commission, I am also employed by that agency and work with the TexShare programs. All tasks performed for this project were outside the scope of my normal work duties.


The Planning Stage

Project Charter

After my initial project proposal was approved, I worked with my field supervisor to develop a formal project charter. The charter details key aspects of the project, including:

  • Project Team
  • Stakeholders
  • Project Scope Statement
  • Risks, Constraints, and Dependencies
  • Communication Strategies

Included with the charter was a detailed timeline showing all tasks and deliverables, task dependencies, and when I expected each to be completed. These tasks were broken out into two distinct phases: the Data Processing phase (database and workflow creation) and the Data Visualization phase (conducting a needs assessment survey and creating an online dashboard).

See the full project charter and timeline

Project Tools & Applications

As a state agency, staff at the Texas State Library and Archives Commission are often restricted in the software available to them. I also had to consider existing staff capabilities around coding and programming languages when choosing how to manage the data cleanup workflows. The workflows needed to be in a format that could conceivably be understood and adopted by current or incoming staff members. I communicated with my field supervisor and agency IT staff and selected the following applications:

  • Database Program: Microsoft Access
  • Data Workflows: OpenRefine
  • Member Survey: SurveyGizmo
  • Data Visualization: Tableau

Data Cleanup and Workflows

Selecting and Assessing the Data

Out of the dozen-plus vendor-supplied usage data reports currently collected by TexShare staff, six were selected for initial inclusion in this project. Because e-Resource usage reports are in the process of adopting a new reporting standard, COUNTER 5, priority for inclusion was given to the four reports already provided in COUNTER 5 format: Credo Reference, all EBSCO resources, three Gale Cengage resources (Gale In Context: Science, Gale in Context: Opposing Viewpoints, and Gale OneFile: Health and Medicine), and the ProQuest SciTech Collection. Two additional reports in non-standard formats were also included owing to their popularity with the project's target libraries: HeritageQuest Online, and LearningExpress Library.

Before the data from these reports could be processed, a full assessment had to be performed to determine what format each report came in, what transformations would be needed to ready the data for import into the database, and how non-COUNTER 5 data could be converted into an approximation of the COUNTER 5 metrics. This information was later formalized into a report describing all data included in the TexShare Usage Reporting Dashboard, and what transformations were made to standardize the reporting formats and metrics.

TexShare Usage Report Documentation

Developing the Data Workflows

The data workflows for cleaning and standardizing the reports were created in OpenRefine. Using OpenRefine, I created a series of transformations for each vendor's monthly usage report that would normalize their layouts and ready them for import into the database. Once complete, these transformations could be saved as a JSON file, and then applied to all subsequent monthly reports from the same vendor.

a JSON script describing a series of transformations to a data set

Next I needed to standardize the library names and identifiers, which can vary across reports. The TexShare Databases program already uses a set of unique ID numbers to identify each member library and their branch locations. Each e-resource vendor also uses their own unique IDs to designate individual accounts. Using these sets of IDs, I created lookup tables for each report, allowing me to match the TexShare ID number for each library to their own usage numbers within the report. Once imported into the relational database, these IDs would be used to join that data to the table containing the member libraries' names and information. Additionally, because libraries sometimes have multiple vendor accounts spread across their branch locations, I set up the lookup tables to identify branch accounts and make sure their usage numbers were included in the institution's total usage.

a spreadsheet connecting library vendor IDs to the TexShare ID

The data workflows could now connect to those lookup tables and automatically match the vendor IDs with the TexShare IDs. To make sure no institutions were missed, I created additional workflows for each report to check and identify any vendor IDs for which there was no match in the lookup tables. Any unmatched IDs could then be identified, and those institutions added to the lookup table.

Some usage reports were already standardized and required relatively little work to get them into the proper configuration. Others presented a larger challenge, including one report that arrives as four different spreadsheets, each with different data types and column headers. But I was ultimately able to develop a single workflow for each of the six reports, transforming files that looked like this:

two spreadsheets displaying different usage metrics for the same resource

...into a format ready for import into the database.

a single spreadsheet showing clean and normalized usage data

Final Workflow Test

After processing all available reports going back to September 2019 (the start of the current state fiscal year), I set the data workflows aside to work on other areas of the project. But when the new reports for June 2020 started to arrive in early July, I ran a test to see how long processing and importing a single month's worth of usage data using the new workflows would take.

I processed six usage reports across nine files, collectively containing upwards of 30,000 rows of data, ran ID checks on each report, updated two of the vendor ID lookup files with new library information, and uploaded the completed files into the database. Work that would previously have taken hours was completed within the span of 30 minutes.


Designing the Database

Because the data workflows and ID lookup tables are handled externally in OpenRefine, the database design ended up being relatively simple. It is composed of two tables: one storing information on the TexShare member libraries, and one for the e-Resource usage data. The tables are joined using the TexShare program's unique organization IDs.

a database ER Diagram with a table called Organization and another called Usage

The agency's IT does not currently support use of MySQL database servers by non-technical staff, so the database was created using Microsoft Access. The library data was pulled from the program's customer management software and imported into the Organization table, and the cleaned and standardized usage data was imported into the Usage table. The data was now ready to be connected to a visualization application and made available to TexShare member libraries.


Needs Assessment Survey

Before tackling the data visualization of library usage data, I wanted to gather feedback from Texas libraries about what they would want from such a display. To this end, I developed a survey asking libraries about their needs around usage data collection and display for their electronic resources: specifically, what their priorities would be if TexShare staff were to compile and supply usage data directly to libraries.

After an initial run by a smaller focus group of library staff, the survey launched on July 13, 2020 and stayed open through July 25. Staff from 396 libraries were invited to participate. Because collecting usage data is a particular challenge for libraries with limited staff and/or technical expertise, target libraries were limited to:

  • Small Public Libraries (serving a population under 25,000)
  • Medium Public Libraries (serving a population between 25,000 and 100,00)
  • Two- and Four-Year Academic institutions with fewer than 5,000 undergraduate FTEs (Full-Time Equivalent)

Surveyed libraries were further limited to those who showed some patron use in at least two of the four primary vendor reports included in the project during State Fiscal Year 2019.

Of the 396 library staff invited to participate, 154 completed the survey. I took the resulting survey data and created a report displaying the results in Tableau.

two pages showing graphs displaying survey result data

View the full TexShare Usage Data Needs Assessment Survey Report

Usage Dashboard

While waiting for the survey data to come in, I began work on visualizing the e-resource usage data stored in the database. Working in Tableau, I started out focusing on creating a customizable report formatted as a printable PDF. However, as the incoming survey responses indicated a great deal more interest in being able to view and download usage data through an online dashboard, I switched my focus.

I designed the dashboard in Tableau Desktop and made it available online through Tableau Public. The main landing page allows any user to select their library and indicate the date range for which they would like to see their usage data displayed.

a dashboard displaying library usage data

At a glance, the dashboard displays information on how the library's patrons have been using their e-resources. Navigation buttons at the bottom take them to different report views, where they can download their usage data in the format of their choice. The only planned element that didn't make it into the dashboard was the basic printable PDF Usage Report. I determined that a completed report would need to contain information from outside the scope of this project (such as program fee data and cost savings) and it would therefore need to wait for future development.

View the TexShare Databases Usage Data Dashboard

Suggestions for Future Development

If the TexShare program chooses to officially adopt the Usage Dashboard as a method for providing usage data to its members, there will certainly be room for ongoing updates and development. My recommendations for project updates going forward include:

  • Conduct a round of user testing for the prototype dashboard, possibly starting with the same libraries who responded to the needs assessment survey. Gather feedback on the functionality of the dashboard and use that to refine it before opening it up for broader use.
  • Develop workflows to process and add additional TexShare e-resource usage reports to the Dashboard. Decisions on priority order for inclusion should consider the total use for the resources, the difficulty of converting the existing metrics to the COUNTER 5 format, and which resources are most desired by TexShare member libraries (this information being obtained through additional surveys).
  • Process and add usage data from previous fiscal years. This data is already available as a clean dataset but would need to be converted to the COUNTER 5 format. Inclusion of past usage data emerged as a priority in the needs assessment survey, and many library staff expressed interest in seeing comparisons of their current year's data to that of previous years.
  • Complete the basic printable PDF usage report, possibly including additional program data such as member fees and cost savings comparisons. Though more surveyed library staff indicated that an online dashboard would be their preferred method for viewing usage data, the majority of those from small public libraries expressed an interest in a pre-made printable report that they could share with outside stakeholders. An additional version of the report with more customization options could then be developed after.
  • Other requests from libraries included making usage data available in its original non-COUNTER format. This might be of particular interest for resources that report very specific types of use, such as Chilton Library's list of car manuals accessed by make and model, and LearningExpress Library's count of tests, tutorials, and computer courses taken by library patrons. Some also expressed interest in seeing data from the Job & Career Accelerator center from LearningExpress Library. Though this usage is reported to TexShare staff, it is also included in the total numbers for LearningExpress. Inclusion of this data would require some consideration, so as not to artificially inflate the library's total usage numbers.

I do hope to see this usage project implemented and continued as part of the TexShare program's support to its members. If I had to present a single takeaway from the member feedback in the needs assessment survey, it would be that there is a genuine need from Texas libraries for greater assistance in both collecting and understanding their e-resource usage information. If the Texas State Library were able to find ways to take a more active role in helping libraries manage their data, it could prove invaluable to them.