Becoming a Library Data Wrangler to Reckon With

There are a lot of good, strong reasons for librarians to work with data. In fact, we already work with data, both big and small, on a daily basis. But you may also have noticed even more data-related tasks creeping into your workday, and maybe even a subtle shift from information management to data management in some parts of your job.

While we may have access to ever-growing amounts of the stuff, it’s not always so easy to know how to find the meaning or the best way to tell a story with the data we have.

Library services need to be user-driven, and data can help us do a better job of this. We’re also often required to demonstrate the value of our services, which involves diving into the depths of our user and resource data both quantitatively and qualitatively.

Why Learn Data Skills?

There’s a case for data skills no matter what specialism or area of information science you’re in.

Library Journal included Data Analysis as one of their ‘Top Skills for Tomorrow’s Librarians’ for 2016.

The LSE Social Sciences Impact blog also made a strong case that ‘Public libraries play a central role in providing access to data and ensuring the freedom of digital knowledge.’

Data Scientist Training for Librarians (DST4L) argue that, as research levels intensify, there are clear benefits to library staff also learning how processes and services could be better streamlined and simplified. Walking a mile in researchers’ shoes, so to speak.

And, last year, UK Lib Chat held an interesting Twitter discussion on data in the library, including how best to present library usage stats to stakeholders and other real-world examples.

Despite this, data literacy is not yet a core part of most library science courses or staff training provision.

So, for those who are interested in learning more or getting more from their data, we’ve put together a list of some online resources for sharpening your data science skills that have really struck a chord with us.

If we’ve missed any that you really rate, let us know in the comments.

Data Science for Beginners

To start at the very beginning, we have Data Science for Beginners, a series of five short videos presented by Brandon Rohrer, senior data scientist at Microsoft. The videos are part of Microsoft Azure Machine Learning, which we didn’t expect to be particularly beginner-friendly (bias alert!), so we were pleasantly surprised.

The five questions data science answers are brilliant and explained really clearly with examples and metaphors galore.

School of Data

The School of Data are a global network of data literacy practitioners, both organisations and individuals, including several chapters of Open Knowledge.

They have a series of free online courses (Data Fundamentals, A Gentle Introduction to Data Cleaning, Introduction into Exploring Data, A Gentle Introduction into Extracting Data, A Gentle Introduction to Mapping and more) but don’t be afraid to delve into their Data Expeditions.

The expeditions are a great way to learn how to tell a good (and accurate) story with your data in a hands-on way using real data. We had a lot of fun with the Data Explorer missions when they were available via the Peer 2 Peer University (P2PU), so we can highly recommend these for a comprehensive introduction to data wrangling.

The Data Scientist’s Toolbox

For those who prefer a MOOC-style approach, this is an introductory course created by Johns Hopkins University and available on the Coursera platform. It combines the conceptual parts of data science with more practical skills (version control, markdown, git, GitHub, R, and RStudio). It’s available for £22 but it’s also free to audit (meaning it’s free to complete the course without receiving a certificate or having access to graded items). If you’re quick, you might still be able to join the latest cohort that started on the 1st of August but otherwise, the course is run regularly and you can sign up to be notified.

Open Source Data Science Masters

If you’re ready to go next level, check out The Open Source Data Science Masters. It’s more of a crowd-sourced curriculum, signposting MOOCs and other materials that have been highly rated by those who’ve used them. It goes into a lot of detail so even if you don’t want to tackle the entire curriculum right now, there are some great, targeted resources listed for cherry-picking.


Databrarians

If you feel you’ve found your calling and are ready to pursue the life of a data librarian, then we’d be pretty remiss not to mention Databrarians, who collate a lot of information for ‘aspiring data librarians’. They also have a great list of ‘tips and tools’ to help you get started and an impressive list of Data Librarian Webinars.

Or Take a Project-Based Approach

These five resources are great if you’re ready to hunker down and study data science in a structured kind of way, but if you’re more interested in pursuing a particular type of data project, there are more specialist resources available.

If you’re interested in visualisation, check out the ‘Getting started with visualization after getting started with visualization’ guide on Flowing Data.

If you’re more on the digital humanities side of things, here are some exercises drawn from the CHASE Arts and Humanities workshop on Information Visualisation that you can tackle yourself.

If you’ve got archive or other collections data you want to work with, the Programming Historian lessons are a great way to go.

Finally, if you’re feeling ready to start applying this knowledge to library data, Owen Stephens has got you covered.

There are lots of different aspects and specialisms within the role of data scientist, so we’ve kept the focus on general training that is useful both for those looking to improve their evidence-based decision-making skills and for those who are just curious and looking to dip their toes into the water.

Once you’ve got the foundations down, it’s always good to have a project to apply your new skills to. We recommend starting with a small project that will help you or your colleagues at work, whether it’s on collections and how they’re used or to learn more about the broader community you work in.
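
To give a flavour of what a small first project might look like, here’s a minimal Python sketch that totals loans per collection. The sample data, the column names (collection, loans) and the loans_by_collection function are all made up for illustration; the export from your own library system will almost certainly look different.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for an exported loans file. The column
# names "collection" and "loans" are assumptions made for illustration.
SAMPLE_CSV = """collection,loans
Fiction,120
Local History,15
Fiction,80
Children's,150
Local History,5
"""


def loans_by_collection(csv_text):
    """Return (collection, total loans) pairs, highest total first."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["collection"]] += int(row["loans"])
    return totals.most_common()


if __name__ == "__main__":
    for collection, total in loans_by_collection(SAMPLE_CSV):
        print(f"{collection}: {total}")
```

Swapping the sample text for the contents of a real file is the obvious next step, and even a simple ranked list like this can be enough to start a conversation with colleagues.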

Even if you’re not quite ready to dig into your data, you can still make it open and available for others. (Thanks to @librarieshacked for the reminder about this.)

And if you’re looking for some extra inspiration, check out some of the beautiful and meaningful examples on Information is Beautiful.