rtoot

Collecting and Analyzing Mastodon Data

David Schoch

2022-11-09

Interact with the mastodon API from R.
Get started by reading vignette("rtoot").

Installation

To get the current released version from CRAN:

install.packages("rtoot")

You can install the development version of rtoot from GitHub:

devtools::install_github("schochastics/rtoot")

Authenticate

First you should set up your own credentials (see also vignette("auth"))

auth_setup()

The mastodon API allows different access levels. Setting up a token with your own account grants you the most access.

Instances

In contrast to twitter, mastodon is not a single instance, but a federation of different servers. You sign up at a specific server (say “mastodon.social”) but can still communicate with others from other servers (say “fosstodon.org”). The existence of different instances makes API calls more complex. For example, some calls can only be made within your own instance (e.g get_timeline_home()), others can access all instances but you need to specify the instance as a parameter (e.g. get_timeline_public()).

A list of active instances can be obtained with get_fedi_instances(). The results are sorted by number of users.

General information about an instance can be obtained with get_instance_general()

get_instance_general(instance = "mastodon.social")

get_instance_activity() shows the activity for the last three months and get_instance_trends() the trending hashtags of the week.

get_instance_activity(instance = "mastodon.social")
get_instance_trends(instance = "mastodon.social")

Get toots

To get the most recent toots of a specific instance use get_timeline_public()

get_timeline_public(instance = "mastodon.social")

To get the most recent toots containing a specific hashtag use get_timeline_hashtag()

get_timeline_hashtag(hashtag = "rstats", instance = "mastodon.social")

The function get_timeline_home() allows you to get the most recent toots from your own timeline.

get_timeline_home()

Get accounts

rtoot exposes several account level endpoints. Most require the account id instead of the username as an input. There is, to our knowledge, no straightforward way of obtaining the account id. With the package you can get the id via search_accounts().

search_accounts("schochastics")

(Future versions will allow to use the username and user id interchangeably)

Using the id, you can get the followers and following users with get_account_followers() and get_account_following() and statuses with get_account_statuses().

id <- "109302436954721982"
get_account_followers(id)
get_account_following(id)
get_account_statuses(id)

Posting statuses

You can post toots with:

post_toot(status = "my first rtoot #rstats")

It can also include media and alt_text.

post_toot(status = "my first rtoot #rstats", media="path/to/media", 
          alt_text = "description of media")

You can mark the toot as sensitive by setting sensitive = TRUE and add a spoiler text with spoiler_text.

(Be aware that excessive automated posting is frowned upon (or even against the ToS) in many instances. Make sure to check the ToS of your instance and be mindful when using this function.)

Streaming

rtoot allows to stream statuses from three different streams. To get any public status on any instance use stream_timeline_public()

stream_timeline_public(timeout = 30,file_name = "public.json")

the timeout parameter is the time in seconds data should be streamed (set to Inf for indefinite streaming). If just the local timeline is needed, use local=TRUE and set an instance (or use your own provided by the token).

stream_timeline_hashtag() streams all statuses containing a specific hashtag

stream_timeline_hashtag("rstats", timeout = 30, file_name = "rstats_public.json")

The statuses are directly written to file as json. The function parse_stream() can be used to read in and convert a json to a data frame.

Pagination

All relevant functions in the package support pagination of results if the limit parameter is larger than the default page size (which is 40 in most cases). In this case, you may get more results than requested since the pages are always fetched as a whole. If you for example request 70 records, you will get 80 back, given that many records exist.