Intro – Alexa, what is Alexa?
Alexa is the (relatively) new voice-controlled AI assistant developed by Amazon. It (she?) is the voice service that powers the Echo and Echo Dot devices. You can also roll your own Alexa device using a Raspberry Pi.
The Echo and Echo Dot are Amazon's 360-degree Bluetooth speakers that use far-field voice recognition.
There are other voice-controlled assistants out there. Google Home is Google's equivalent to the Echo devices, but it's not yet available outside the US, and Microsoft have just announced a Cortana-controlled Home Hub app. So far, though, Amazon's Alexa seems to be leading the way in terms of integrations and access for third-party developers.
The rollout has been gradual, but there are already plans to expand the skills and voice-command functionality for Alexa devices quite a lot, and hopefully more of the integrations will be made available outside the US soon too.
What can it do?
After a week or two with the Echo Dot, a few different library use cases have come to mind.
Though the device was clearly developed to streamline online shopping and play music, it's interesting to explore how this kind of voice-based navigation might work for information services.
Alexa and other AI assistants that use a conversational interface have great potential for accessibility, and for changing how users interact with library services.
Echo devices use some pretty advanced natural language processing (NLP), so you can use quite casual, conversational language to interact with them (and rumour has it the Google Home devices have the Echo beat in this area). Sometimes my accent causes confusion, though, so it's not all smooth sailing.
As well as answering questions, Alexa can read audiobooks (though only through Audible) and news updates that can be customised from a growing list of providers, including various newspapers and BBC channels. You can also integrate Alexa with a calendar, which is useful for scheduling, listing upcoming events and room bookings.
There are also various integrations with smart home devices, such as Smart Plugs and Nest Thermostats.
The wake word can’t be fully customised at the moment, so you’re stuck conversing with a device called either Alexa, Echo or Amazon.
Asking 'Alexa, what's going on?' will give you the time, the weather and your calendar appointments for the day.
You can request information from Wikipedia and get definitions for words. There are also conversions, equations, weather updates and other fact-based services.
The third-party integrations (the Alexa equivalent of apps) are known as 'skills' and include a bunch of different information sources, smart home integrations and utilities, with more being added every day.
In terms of third-party skills, the UK list is still quite sparse. Annoyingly, there's also currently no IFTTT integration available in the UK, though apparently this will be available soon (Update: as of mid-December 2016, Alexa now supports IFTTT commands in the UK. Woot).
The skills available range from quiz and fact apps like the UK Driving Theory Quiz, to local train and bus info, to my current favourite, Short Bedtime Story, which provides a personalised bedtime story (presumably for younger users, but still fun to play with).
There are also Easter eggs galore and a respectable number of dad jokes. Alexa knows both the meaning of life and the Three Laws of Robotics. Which is reassuring. Kind of.
Another potential use case for Alexa devices in the library is to provide news or featured resource updates relevant to the space or library service.
For example, it's possible to develop custom updates based on RSS feeds using the Flash Briefing format. Flash briefings are a daily news alert feature: you hear the latest updates or headlines from major news sources. This has interesting potential as a daily update about particular library resources, such as new books or journal articles in a given field.
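To give a sense of what's involved: a Flash Briefing skill just points at a JSON (or RSS) feed that you host and update, so a daily resource alert could be as simple as a script regenerating a file like the sketch below each morning. The field names come from the Flash Briefing Skill API; the feed content and URL here are invented for illustration.

    {
      "uid": "urn:uid:library-new-resources:2016-12-19",
      "updateDate": "2016-12-19T09:00:00.0Z",
      "titleText": "New at the library",
      "mainText": "Today's featured resource is a new e-book collection on local history.",
      "redirectionUrl": "https://library.example.org/new-resources"
    }

Alexa reads out the title and main text, and the redirection URL gives users somewhere to follow up in the Alexa app.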
If you have multiple devices, Alexa uses ESP (Echo Spatial Perception) to detect and respond from the nearest device, which could mean providing subject-specific updates from different parts of the library.
The Flash Briefing is perhaps the easiest way to create a skill, as it doesn't require custom coding, but it's also the least flexible: there's no real interaction with the user beyond the list of items and navigating back and forth through it. With a custom fact-based skill, on the other hand, you can configure different 'Intents' and 'Utterances'.
Intents represent the actions that fulfil a user's spoken request. They can take arguments (called Slots) that add parameters or criteria to a request.
Utterances are the words that people use to make requests. An utterance list for a skill needs to be flexible enough to capture different ways people might ask the same question – this is where reference interview training comes in handy.
So, to keep with the featured or new resource example, users could ask for today's resource, for a recommended resource, or about a particular subject; a sketch of how that might map onto a skill follows.
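As a rough sketch (the intent name, slot name and slot type here are hypothetical, not taken from Amazon's samples), the intent schema for such a skill might look like:

    {
      "intents": [
        {
          "intent": "GetResourceIntent",
          "slots": [
            { "name": "Subject", "type": "LIST_OF_SUBJECTS" }
          ]
        },
        { "intent": "AMAZON.HelpIntent" }
      ]
    }

paired with a sample utterances list that tries to anticipate the different phrasings:

    GetResourceIntent what's today's resource
    GetResourceIntent recommend a resource
    GetResourceIntent what's new in {Subject}
    GetResourceIntent is there anything new about {Subject}

Here LIST_OF_SUBJECTS would be a custom slot type listing the subject areas the skill knows about.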
Developing a custom skill
Amazon has released the Alexa Skills Kit to make it easier for people to develop their own skills for Alexa-powered devices, though you do need a bit of development know-how for this. A lot of the base examples use Node.js, but there are examples around in other languages, like Python, too.
There are a few different skill structures you can use as a starting point, and the documentation around these is pretty thorough and up-to-date.
The process to create a custom skill involves designing the voice user interface, building and hosting the code and then submitting the skill for approval. Luckily, the Amazon Developer documentation for this is pretty good.
If you're interested in developing a trivia or quiz-type skill, there's a thorough walkthrough available. A fact skill can be used to list facts on a particular topic, but it could also be configured to offer a book or library resource of the day, as a more conversational version of the Flash Briefing example above.
Most examples and documentation for Alexa involve setting up skills using AWS Lambda, an Amazon Web Services compute service. Alternatively, you can host skills on your own server, though Amazon's Alexa service will only communicate with an endpoint over HTTPS. This means it is possible to run an Alexa skill on a private network, but there are some additional complications in setting that up.
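To make that concrete, here's a minimal sketch in Python of what the Lambda function behind the hypothetical resource-of-the-day skill above might look like. The request and response envelopes follow the Alexa Skills Kit custom skill format, but the intent name, slot and resource data are all made up for illustration.

    # Hypothetical resource-of-the-day data; a real skill might pull
    # this from the catalogue or an RSS feed instead.
    RESOURCES = {
        "history": "A new local history e-book collection is available this week.",
    }
    DEFAULT = "Today's featured resource is our new e-journal collection."

    def build_response(text):
        # Wrap plain text in the response envelope the Alexa service expects.
        return {
            "version": "1.0",
            "response": {
                "outputSpeech": {"type": "PlainText", "text": text},
                "shouldEndSession": True,
            },
        }

    def lambda_handler(event, context):
        request = event["request"]
        if request["type"] == "LaunchRequest":
            return build_response("Ask me for today's featured resource.")
        if (request["type"] == "IntentRequest"
                and request["intent"]["name"] == "GetResourceIntent"):
            # The optional Subject slot narrows the request to a subject area.
            slot = request["intent"].get("slots", {}).get("Subject", {})
            subject = (slot.get("value") or "").lower()
            return build_response(RESOURCES.get(subject, DEFAULT))
        return build_response("Sorry, I didn't catch that.")

In a real deployment you'd also want to verify the application ID on incoming requests and handle the built-in help, stop and cancel intents.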
Privacy
What about the obvious privacy concerns that a listening device presents? Various reports have confirmed that while waiting for the wake word, Alexa can detect words but doesn't recognise or record them; only once it detects the wake word (such as 'Alexa' or 'Echo') does it begin streaming audio to the cloud. This is similar to other devices like smart TVs and the voice navigation already available on phones. Wired have a really good article about this, and about how both Amazon Alexa and Google Now devices manage data, that's definitely worth checking out.
You can also manually delete recorded interactions from the Alexa app, though there's no way of doing this in bulk.
But obviously there are still concerns with introducing an 'always listening' device into your library, so that's something to take into consideration (and, at the very least, its presence should always be clearly signposted).
What's the verdict?
There are still plenty of limitations to this kind of consumer-focused device, but voice activation and integration with smart devices and other services is an interesting development, with lots of potential for improving how users interact with collections and other library and museum services.
Amazon's Alexa is just one of the services now making the most of machine learning for better voice-controlled interfaces, so we'll be looking at other devices as they become available. If you're not quite ready for your own Echo device, you can try Alexa out on the Echo simulator site at http://echosim.io/