kaniini's blog!

A lot of people make assumptions about my position on whether JSON-LD is actually good. The reality is that my view is more nuanced than that: there are great uses for JSON-LD, but it's not appropriate for the way it is used in ActivityPub.

What is JSON-LD anyway?

JSON-LD stands for JSON Linked Data. Linked Data is a “Big Data” technique which involves creating large graphs of interlinked pieces of data, intended to help enrich data sets with more semantic context (this is known as graph coloring), as well as additional data linked by URI (hence why it's called linked data). The Linked Data concept can be extremely powerful for data analysis when used in the appropriate context. A good example of where linked data is useful is healthcare.gov, where they use it to help compare performance and value versus cost of US health insurance plans.

ActivityPub and JSON-LD

Another example where JSON-LD is ostensibly used is ActivityPub. ActivityPub inherits its JSON-LD dependency from ActivityStreams 2.0, which is a data format that enjoys wide use outside of the ActivityPub ecosystem: for example, Twitter, Instagram, Facebook and Tumblr all use variations of ActivityStreams 2.0 objects in various places inside their APIs.

These services find the JSON-LD concept useful because their advertising customers can leverage JSON-LD to optimize their advertising campaigns (at Facebook, the Open Graph concept they frequently pitch to advertisers is built in part on top of JSON-LD).

But does JSON-LD provide any value in a social networking environment which does not have advertising? In my opinion, not really: it's just an artifact of the “if you're not the customer, you're the product” nature of the proprietary social networking services. As previously stated, the primary advantage of JSON-LD and the linked data philosophy in general is data enrichment, and data enrichment is largely useful to two groups: advertisers and intelligence (public or private).

Since the federated social networking services don't have advertising, that just leaves intelligence.

Private intelligence and social networking, how data enrichment can impact your credit score

There are various kinds of private intelligence firms out there which collect information about you, me, and everyone else. You've probably heard of some of them, and some of the products they sell: companies like Experian, InfoCheckUSA and Equifax sell various products like FICO credit scores and background reports which determine everything from whether or not you can rent or buy a car or house to whether or not you can get a job.

But did you know these companies crawl your use of the proprietary social networking services? There are companies like FriendlyScore which sell credit-related data based on how you utilize social networking services. Those “social” credit scores are directly enabled by technology such as JSON-LD and ActivityStreams 2.0.

Public intelligence and social networking, how data enrichment can get you killed

We've all heard about Predator drones and drone strikes in the news. In the past decade, drone strikes have been used to attack countless targets. But how do our public intelligence agencies determine who is a target? It's very similar to how the private intelligence agencies determine whether you should own a house or have a job: they use big data methods to analyze all of the metadata they collected.

If you write a post on a social networking service and attach GPS data to it, they can use that information to determine a general pattern of when and where you are, and then feed it into a machine learning algorithm to determine when and where you will likely be in the future. They can also use this metadata analysis to prove certain assertions about your identity to a level of certainty which determines if you become a target, even if you're not really the same person they are trying to find.

Conclusion: safety is more important than data enrichment

These techniques that are used both in the public and private sector are what the press tend to refer to as “Big Data” techniques. JSON-LD is a “Big Data” technology that can be leveraged in these ways. But at the same time, we can leverage some “Big Data” techniques in such a way that JSON-LD parsers will automatically do what we want them to do.

In my opinion, it is a critical obligation of federated social networking service developers to ensure that handling of data is done in the most secure way possible, built on proven fundamentals. I view the inclusion of JSON-LD in the ActivityPub and ActivityStreams 2.0 standards to be harmful toward that obligation.

Pleroma and JSON-LD

As you may know, there are two mainstream ActivityPub servers that are in wide use: Mastodon and Pleroma. Mastodon uses JSON-LD and Pleroma does not. But they are able to interoperate just fine despite this. This is largely because Pleroma provides JSON-LD attributes in the messages it generates without actively using them itself.
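To make that last point a bit more concrete, here is a minimal Python sketch of a message that carries a JSON-LD @context but is read as ordinary JSON. It is an illustration only and not Pleroma's actual code (Pleroma is written in Elixir): the example.social URLs are made up and the summarize helper is a hypothetical name.

```python
import json

# A made-up Create activity. The "@context" key is emitted so that JSON-LD
# aware peers can process the message, but nothing below ever expands it.
ACTIVITY_JSON = """
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://example.social/activities/1",
  "type": "Create",
  "actor": "https://example.social/users/alice",
  "object": {
    "id": "https://example.social/objects/1",
    "type": "Note",
    "content": "Hello, fediverse!"
  }
}
"""


def summarize(raw: str) -> dict:
    """Read an activity as plain JSON, ignoring "@context" entirely."""
    activity = json.loads(raw)
    obj = activity["object"]
    return {
        "type": activity["type"],
        "actor": activity["actor"],
        "object_type": obj["type"],
        "content": obj["content"],
    }


if __name__ == "__main__":
    print(summarize(ACTIVITY_JSON))
```

A JSON-LD consumer on the other end can still expand the same message using the @context, which is why both styles of implementation can exchange it.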

Handling ActivityPub in a world without JSON-LD

The origin of the Transmogrifier name

Instead of a JSON-LD parser, Pleroma has a module called Transmogrifier that translates between real ActivityPub and our internal ActivityPub representation. The use of AP constructs in our internal representation is the origin of the statement that Pleroma uses ActivityPub internally, and to an extent it is a very truthful statement: our internal representation and object graph are directly derived from an earlier ActivityPub draft. It's not quite the same, though, and there have been a few bugs where things were not translated correctly, which resulted in leaks and other problems.

Besides the Transmogrifier, we have two functions which fetch new pieces into the graphs we build: Object.normalize() and Activity.normalize(). This could be considered to be a similar approach to JSON-LD except that it's explicit instead of implicit. The explicit fetching of new graph pieces is a security feature: it allows us to validate that we actually trust what we're fetching before we do it. This helps us to prevent various “fake direction” attacks which can be used for spoofing.
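The idea of explicit, validated fetching can be sketched roughly as follows. This is a simplified Python illustration, not the real Object.normalize(): the normalize and same_origin names, the injected fetch callable, and the trust policy itself (only resolve references on the same origin as the actor) are all assumptions chosen to keep the example small.

```python
from typing import Callable, Optional, Union
from urllib.parse import urlparse

# Hypothetical fetcher type: takes an object ID, returns the object or None.
Fetcher = Callable[[str], Optional[dict]]


def same_origin(a: str, b: str) -> bool:
    """True when two URLs share scheme and host (including port)."""
    ua, ub = urlparse(a), urlparse(b)
    return (ua.scheme, ua.netloc) == (ub.scheme, ub.netloc)


def normalize(ref: Union[str, dict], actor_id: str, fetch: Fetcher) -> Optional[dict]:
    """Resolve an object reference, fetching only when the reference is trusted.

    `ref` is either an embedded object (dict) or an object ID (str).
    """
    object_id = ref.get("id") if isinstance(ref, dict) else ref
    if not isinstance(object_id, str):
        return None
    # The trust check happens before any network access is attempted.
    if not same_origin(object_id, actor_id):
        return None
    if isinstance(ref, dict):
        return ref
    fetched = fetch(object_id)
    # Reject a response whose ID does not match what we asked for.
    if fetched is None or fetched.get("id") != object_id:
        return None
    return fetched


if __name__ == "__main__":
    # An in-memory dict stands in for the remote server.
    store = {"https://a.example/objects/1": {"id": "https://a.example/objects/1", "type": "Note"}}
    actor = "https://a.example/users/alice"
    print(normalize("https://a.example/objects/1", actor, store.get))
    print(normalize("https://evil.example/objects/1", actor, store.get))
```

The point of the sketch is the ordering: the decision to trust a reference is made explicitly and before the fetch, rather than being a side effect of a parser expanding the graph.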

LitePub and JSON-LD

LitePub is a recent initiative that was started between Pleroma and a few other ActivityPub implementations to slim down the ActivityPub standard into something that is minimalist and secure. While LitePub itself does not require JSON-LD, LitePub implementations follow some JSON-LD like behaviors where it makes sense, and LitePub provides a @context which allows JSON-LD parsers to transparently parse LitePub messages.

Leveraging Linked Data for Object Capability Enforcement

The main principle LitePub is built on is leveraging the linked data paradigm to perform object capability enforcement. This can work either explicitly (as is done in Pleroma) or implicitly (as is done in Mastodon when parsing a LitePub activity).

We do this by treating every Object ID in LitePub as a capability URI. When processing messages that reference a capability URI, we check to make sure the capability URI is still valid by re-fetching the object. If fetching the object fails, then the capability URI is no longer valid. This prevents zombie activities.
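The re-fetch check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not LitePub reference code: capability_is_valid and accept_activity are hypothetical names, and the fetch callable stands in for whatever HTTP client an implementation would actually use.

```python
from typing import Callable, Optional

# Hypothetical fetcher type: takes an object ID, returns the object or None.
Fetcher = Callable[[str], Optional[dict]]


def capability_is_valid(object_id: str, fetch: Fetcher) -> bool:
    """Re-fetch the object; the capability URI is valid only if that succeeds."""
    try:
        obj = fetch(object_id)
    except Exception:
        # Any fetch failure is treated as "capability no longer valid".
        return False
    return obj is not None and obj.get("id") == object_id


def accept_activity(activity: dict, fetch: Fetcher) -> bool:
    """Accept an activity only if the object it references still exists."""
    ref = activity.get("object")
    object_id = ref.get("id") if isinstance(ref, dict) else ref
    if not isinstance(object_id, str):
        return False
    return capability_is_valid(object_id, fetch)


if __name__ == "__main__":
    # An in-memory dict stands in for the origin server.
    remote = {"https://a.example/objects/1": {"id": "https://a.example/objects/1", "type": "Note"}}
    boost = {"type": "Announce", "object": "https://a.example/objects/1"}
    print(accept_activity(boost, remote.get))  # object exists on the origin
    del remote["https://a.example/objects/1"]  # the author deletes the post
    print(accept_activity(boost, remote.get))  # the same boost is now refused
```

Once the origin stops serving the object, any later activity that references it is refused, so a deleted post cannot be revived by replaying an old boost.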

A note on Zombie Activities

There are two primary ways of securing ActivityPub implementations with digital signatures: LDSig (Linked Data Signatures) and the construction built on HTTP Signatures. These can be referred to as inline signatures and transient signatures, respectively.

The problem with inline signatures is that they are valid forever. LDSig signatures have no expiration and have no revocation method. Because of this, if an Object is deleted, it can come back to life. The solution created by the LDSig advocates is to use Tombstone objects for all deletions, but that creates a potential metadata leak that proves a post once existed which harms plausible deniability.
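The zombie problem can be shown with a small Python simulation. To stay self-contained it uses an HMAC from the standard library as a stand-in for a signature; real LDSig uses asymmetric keys and canonicalizes the document before signing, so treat everything here (SECRET, sign, verify, demo, the a.example URL) as hypothetical scaffolding rather than the actual scheme.

```python
import hashlib
import hmac
import json

# Stand-in for the author's signing key. This is an assumption made for the
# sake of a runnable example: real LDSig uses a public/private key pair.
SECRET = b"stand-in for the author's signing key"


def sign(obj: dict) -> str:
    """Produce an inline 'signature' over the object with no expiry."""
    payload = json.dumps(obj, sort_keys=True).encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def verify(obj: dict, signature: str) -> bool:
    """Check the inline signature; nothing here can know about deletion."""
    return hmac.compare_digest(sign(obj), signature)


def demo() -> tuple:
    note = {"id": "https://a.example/objects/1", "type": "Note", "content": "oops"}
    signature = sign(note)

    origin = {note["id"]: note}           # what the origin server hosts
    saved_copy = (dict(note), signature)  # a third party keeps a copy

    del origin[note["id"]]                # the author deletes the post

    replayed_obj, replayed_sig = saved_copy
    still_verifies = verify(replayed_obj, replayed_sig)
    still_hosted = note["id"] in origin
    return still_verifies, still_hosted


if __name__ == "__main__":
    print(demo())
```

The saved copy still verifies even though the origin no longer hosts the object, because the signature check depends only on the bytes and the key, never on whether the object currently exists.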

The LitePub approach on the other hand is to treat all objects as capability URIs. This means when an object is deleted, future attempts to access the capability URI fail and thus the object cannot come back to life through boosting or other means.

Conclusion

Hopefully this clarifies my views on JSON-LD and its applications in the fediverse. Feel free to ask me questions if you have any.

This is my new blog which replaces the old Jekyll-based one. Long-form content which is best appreciated in blog format will be published here. I decided to use Write Freely as a test and ultimately it does seem to fit my requirements the best, so I am going to stick with using it.

If you're using Pleroma then this will appear as a nicely rendered Article. If you're using some other fediverse software, your mileage may vary.

Stay tuned for a few blog posts about various things, such as:

  • ActivityPub, why JSON-LD is harmful in a server-to-server context
  • More detailed discussion of various security postures in Fediverse development
  • Posts about LitePub, the specification which intends to bring ActivityPub back to something simple and robust with good security properties

Ok, bye.