Accessing API Data In Power BI - Part 1
🧩 Accessing API Data In Power BI - Part 1
A step-by-step guide to getting data from web APIs.
🧭 Overview
Power BI is great for analyzing information from data sources such as SQL Server databases where the information is held together in a single, easily accessed, secure source. However, it can also be great for analyzing data information from web based systems which often have more complex security configurations and limitations on how much data can be accessed in a single query.
This series of articles will give step-by-step instructions about how to access web based information, gradually building on more and more complex scenarios until you have a complete picture of how to manage data from web based services.
This guide will assume you have access to
Concepts
The first concept to grasp is, what is an API?
API stands for Application Programming Interface. As the name suggests, they are designed to provide a standard interface to a the underlying data within a service so that other applications can use that data. They are sometimes referred to interchangeably with the term Web Service.
In general, there are two different types of web API:
- SOAP - Simple Object Access Protocol - A strongly defined, powerful, complex interface protocol REST - Representational State Transfer - An architectural style loosely defining how to use existing protocols such as HTTP to manage data
As these descriptions imply, SOAP is complex, rigid and very powerful but is also often a bit painful to implement. REST, on the other hand, is generally much simpler, piggy-backing off the functionality that holds the entire world wide web together. For these examples we'll use REST as it's going to be much more common for our data analytics needs.
The second concept is that to access the service and manipulate the data your request needs to be checked to see if you have permission to access that data. This authentication can be handled in different ways, sometimes managed by the service you are accessing, sometimes assisted by other service.
Very often this process will involve more than one stage, such as:
- The service is configured so that some kind of "account" has access to one or more part of the service
- This account may be represented by a username and password or with some kind of encrypted "token" unique to your account
- When accessing the service you are often asked to log in using your username and password or token
- This login service will check you are a valid account and then will issue a short lived token specific to your current session
- When asking for data you provide this session token, ensuring you have access to only the data you are allowed to see at that moment in time
- Often those short-lived tokens will only give you access to the data for, for example, an hour so you need to keep asking for a new token
- Asking for a new token invalidates the previous one, ensuring tokens can't be shared or compromised
The third concept that is specific to REST is that "resources" are accessed via the HTTPS protocol just like you browser accesses web pages and, like most websites, the address used is associated with some kind of structure, in most cases the underlying database tables. For example:
- A web site called My Drinks may be accessible through https://www.mydrinks.com
- It has a page responsible for ordering drinks at https://www.mydrinks.com/drinksorders.html
- The API web service probably has a similar structure such as https://api.mydrinks.com
- If you want to manage data about drinks orders you would probably access https://api.mydrinks.com/drinksorders
The next concept leading from from this is how we manage specific items of data which is often performed using paths or parameters, for example:
- Getting drinks order number 1 may be as simple as https://api.mydrinks.com/drinksorders/1
- Getting drinks order number 1 for Bob might look like https://api.mydrinks.com/drinksorders?OrderId=1&Customer=Bob
Now onto another REST concept called verbs. Your web browser will usually get data from a service and display it. The verb it uses is called GET. However there are actually a set of verbs to use:
- GET - Gets data
- POST - Sends data to a service (usually to perform some kind of INSERT statement on the database)
- PUT - Sends data to a service (usually to perform some kind of UPDATE statement on the database)
- PATCH - Sends partial data to a service (usually some kind of key to the record and a single new value some kind of UPDATE statement on the database)
- DELETE - You get the hang of it
For now, let's also add one final concept. What happens when we need to send or receive more data than you can put into the address? Here we make use of two other features supported by HTTP:
- Headers
- Body
Headers are like parameters but are generally hidden from the user / browser / query because they deal with technical or security information. They handle the technical stuff which allows the "client" to talk to the "server" using the correct settings.
The body then contains any other data in any format that you have used the headers to agree to. It could be text data, it could be a file contents, a video, an email, or it could be structured data such as XML or JSON.
That will do for now. More details will follow.
Part 1 - Accessing An API That Requires Authentication
Step 1 - Authenticating Using A Power BI Function
Our first step is to authenticate with our web service. This will give us a short lived access token for our session which we can use to run our other queries. But how do we get this token and how do we use it to "log in" to the other API areas? The answer is to use a feature called a Power BI Function, a type of query designed to send results back to other parts of Power BI rather than to visualizations.
In Power BI Desktop select the Home menu and then choose Get data and then Blank query to open the Power Query Editor.
In Power Query Editor select the Home menu and then choose Advanced Editor
Inside the query editor window enter text like this:
let
url = "https://your-api.com/login"
in
url
Swap the address to the the login or authorization address for your API.
You can now name this query (such as SetLoginURL) and run this query and see the results, just like any other Power BI query.
When we are happy this is returning the correct address we are going to change this query into something we can re-use from other queries - a function. Let's edit our query:
let
GetLoginURL = () =>
let
url = "https://your-api.com/login"
in
url
in
GetLoginURL
This time we can see our original query is wrapped in an extra let which tells PowerQuery this is a function, in this case called GetLoginURL. We'd better renameme our query to match. Now we can click an Invoke button to run our function and see what it does.
Notice how the list of queries on the left side of the screen shows your function and its invoked results. The results of a function exist in their own right alongside the definition. You can rename the invocation results too so let's call it GetLoginURLResults.
You now have the fundamental building blocks of this solution at your fingertips. We are now going to create a second function called GetLoginBody in the same way.
let
GetLoginBody = () =>
let
body = "{ ""username"": ""x"", ""password"": ""y"" }"
in
body
in
GetLoginBody
Replace the address with the body specified in your API documentation and don't forget to put in the correct login details.
Now we are going to build a third query function which is going to rely on the results of the previous two functions so go back to your query editor and create another new blank query as follows:
let
GetAPIToken = () =>
let
Source = Json.Document(
Web.Contents(
GetLoginURLResults,
[
Content = Text.ToBinary(GetLoginBodyResults),
Headers = [
#"Content-Type" = "application/json"
]
]
)
),
token = Source[access_token]
in
token
in
GetAPIToken
Make sure we rename our query to match the code so we call it GetAPIToken and then we invoke it. This time we get a different response:
This may seem a bit counter-intuitive but we are going to click the Edit Credentials button and then we are going to choose the Anonymous option.
This works because we are telling Power BI "don't try to send any login credentials of your own because we are handling this for you in our functions.
We now have a set of functions which will easily allow us to manage our login address, our credentials and to get our API token for our other queries.
Step 2 - Getting Data Using Our Token
Now we have our building blocks for handling API tokens we are going to use them to run data queries. Create another new blank query like this:
let
token = GetAPITokenResults,
Source = Json.Document
(
Web.Contents
(
"https://your-api.com/data-endpoint",
[
Headers = [
#"Authorization" = "Bearer " & token,
#"Content-Type" = "application/json"
]
]
)
)
in
Source
All we need to do is swap our address above for our actual API query address. So, what is this doing?
- First it gets the results our GetAPIToken function using the GetAPITokenResults and stores the token in a variable called (imaginatively) token
- Next it tells Power Query that we are handling some kind of JSON response so it uses the Json.Document function to assing the results to the Source output
- Inside that it uses a function called Web.Contents to handle the process of calling our API, passing in our authorization token and also telling the service we'd like JSON as an output
Here's the results.
Step 3 - Manipulate the results
The problem we have here is that our data has been returned as a big block of data. We could simply expand this data in the query, but this could result in a very complex query with multiple responsibilities - get an access token, get the data, format the data etc - so a better way to handle this is to create a new query which references our raw data.
Right mouse click on our Get query and select Reference.
Now our second query is based on whatever our first query does. It gives the same results as the first query, even if we edit our first query. We can now renmame our second query to something meaningful to the users and then apply our formatting rules to make the data presentable.
Questions, Notes and Issues
Step 2 Token Variable Usage
If you were observant you may have noticed one of the queries gets the token from a function and puts it into a variable instead of using it directly.
let
token = GetAPITokenResults,
Source = Json.Document
(
Web.Contents
(
"https://your-api.com/data-endpoint",
[
Headers = [
#"Authorization" = "Bearer " & token,
#"Content-Type" = "application/json"
]
]
)
)
in
Source
Why can't we do this instead?
let
Source = Json.Document
(
Web.Contents
(
"https://your-api.com/data-endpoint",
[
Headers = [
#"Authorization" = "Bearer " & GetAPITokenResults,
#"Content-Type" = "application/json"
]
]
)
)
in
Source
Well the answer is this seems to be considered a security risk. Power BI will give an error saying it violates its firewall rules. It seems to consider direct execution of one web based query to influence another web based query one step too far when it comes to trusting external data. Fair enough.
Applying Query Changes
One issue you may encounter is that applying all of those queries at once results in Powe BI running your API data query before it has executed your GetAPIToken functions for the first time. To make matters worse, the failure will then block the other queries from executing, meaning you can never apply your changes!
There is a simple but hacky workaround for this.
First, don't answer the question to Apply or Discard changes. You can't apply and you don't want to lose them so this is a stupid question. We need to get back into the Power Query Editor without losing anything.
One way to do this is to go to the Power BI Home menu, click Get data and then select Blank Query. This will re-open the query editor without losing any changes (you will need to delete your new Query1 but that's OK).
Now right mouse click on any queries that are causing errors and select deselect Enable load from the menu. You can now apply your changes, allowing you to do things like saving your work. You can gradually re-enable loading query by query, apply and save until everything is working.
Referencing Is All Too Much For Power Query
Sadly sometimes using the reference approach just becomes too much for Power Query and Power BI. It seems it just can't work out what order to run the functions and queries in. You may find it's insistent on running the queries in the wrong order.
If that's the only option is to roll your data get query and your data format queries up into a single, massive query instead of using Reference. Sad, but Power BI just doesn't seem to grasp dependencies at times.
Comments
Post a Comment