Async tasks are an intermediate backend engineering problem. Most public-facing APIs need to return an answer in under a second for a good user experience, yet many real-world tasks take longer than that or need to run on a schedule. The usual solution is to move the work into a task that runs asynchronously. In Django, one way to accomplish this is by writing a management command.
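As a rough sketch (the app and command names here are made up), a management command is just a Python module with a Command class inside an app's management/commands/ directory:

```python
# myapp/management/commands/do_slow_work.py  (hypothetical app and command names)
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Runs work that is too slow for a request/response cycle."

    def handle(self, *args, **options):
        # Long-running or scheduled work goes here, outside any web request.
        self.stdout.write("done")
```

It can then be run out of band, for example from cron or a task runner: python manage.py do_slow_work.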
Real APIs are typically paginated. Pagination means that only a portion of the matching data is sent back at a time; returning every result at once could overwhelm the requesting program and bog down the service returning the data. Since most real-world APIs are paginated, knowing how to traverse and consume paginated responses is worth mastering.
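As a rough sketch of the core loop, assuming an API whose responses carry a data list and a next URL (both assumptions for illustration), a generator can walk the pages one at a time:

```python
import requests


def iter_results(start_url, params=None):
    """Yield items one page at a time by following the API's next links."""
    session = requests.Session()
    url = start_url
    while url:
        response = session.get(url, params=params, timeout=30)
        response.raise_for_status()
        payload = response.json()
        # Hand back this page's items without accumulating every page in memory.
        yield from payload["data"]
        # Assumed paging scheme: a `next` key holding the next page's URL (or null).
        url, params = payload.get("next"), None
```

Callers just iterate over the generator, e.g. for item in iter_results("https://example.com/api/posts/"), and stay oblivious to the paging.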
The following is an example of a Django management command that queries a blog API, traverses the pagination to fetch all of the posts, processes the data, and stores it in the database. Be sure to read the notes at the end of the post.
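Here is a minimal sketch of such a command; the app layout, the BlogPost model and its fields, the BLOG_API_TOKEN setting, and the next-URL paging scheme are all assumptions for illustration, and the endpoint matches the jq example further down.

```python
# app/management/commands/import_blog_posts.py  (hypothetical paths and names)
import logging

import requests

from django.conf import settings
from django.core.management.base import BaseCommand

from app.models import BlogPost  # assumed project model

# The same logger name is shared by all management commands (see the notes below).
logger = logging.getLogger("app.management.commands")

API_URL = "https://blog.com/v3/data/search/"


class Command(BaseCommand):
    help = "Fetch every matching post from the blog API and store it locally."

    def add_arguments(self, parser):
        # Optional argument so the behavior can be changed at runtime.
        parser.add_argument("--query", default="science")

    def iter_posts(self, session, query):
        """Yield posts one page at a time so only one page is in memory."""
        url = API_URL
        # BLOG_API_TOKEN is an assumed settings value.
        params = {"q": query, "token": settings.BLOG_API_TOKEN}
        while url:
            response = session.get(url, params=params, timeout=30)
            response.raise_for_status()
            payload = response.json()
            yield from payload["data"]
            # Assumed paging scheme: a `next` key holding the next page's URL.
            url, params = payload.get("next"), None

    def handle(self, *args, **options):
        session = requests.Session()
        stored = 0
        for item in self.iter_posts(session, options["query"]):
            try:
                post = BlogPost(
                    external_id=item["id"],
                    title=item["title"],
                    body=item["body"],
                )
                post.save()
                stored += 1
            except Exception:
                # Log and keep going; one bad item should not stop the whole job.
                logger.warning("Could not store post %r", item.get("id"), exc_info=True)
        logger.info("Stored %s posts", stored)
```

You could then run it by hand or from cron, e.g. python manage.py import_blog_posts --query science.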
And below is the logging configuration that sends the management command’s logs to
stdout in development.
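A sketch of a matching LOGGING setting for development; the handler and formatter names are arbitrary, and the logger name matches the one used in the command above:

```python
# development settings (hypothetical module)
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {"format": "%(asctime)s %(levelname)s %(name)s %(message)s"},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "stream": "ext://sys.stdout",  # StreamHandler defaults to stderr
            "formatter": "simple",
        },
    },
    "loggers": {
        # The shared logger used by all of the management commands.
        "app.management.commands": {
            "handlers": ["console"],
            "level": "INFO",
        },
    },
}
```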
Things to note:
Our imports are separated by type - stdlib, third-party libraries, Django classes, and project-specific classes.
We show how to specify optional arguments so we can change the behavior at runtime if needed.
We name the logger app.management.commands. If you use __name__, as suggested in the Django docs, you end up with a separate logger for every management command, which seems silly. Instead we use the same logger for all management commands.
We use a requests Session and process each page's worth of results with generators. This keeps the RAM required to roughly what a single page needs; a solution without generators would consume more memory because it would keep every item around until all of the pages were processed.
We wrap our data-storing calls (model.save()) in a try/except. In the except clause we log the error and continue. If we let errors propagate, the job would stop even if only one item had a problem. You should have your logging system alert you when the job logs a warning.
Protip: you can pipe JSON responses to jq to quickly understand the data. For example, running $ curl -s 'https://blog.com/v3/data/search/?q=science&token=abc123' | jq '.data[0] | keys' will query an API and send the JSON response to jq. The jq filter here pulls out the first result in the data array and lists the keys on that object. I did several sanity checks on the command line with this combo while developing this Django task. Read the jq site for more of what is possible.
If you need help solving your business problems with software, read how to hire me.