The Decoder, September 30, 2024

Court win lets LAION scrape copyrighted images for AI data

The Hamburg Regional Court sided with LAION in case number 310 O 227/23, allowing the non-profit’s collection of a copyrighted image for its LAION-5B dataset under Section 60d of German copyright law. The ruling does not settle whether for-profit companies can do the same, or whether collected data may be used to train AI systems.

A Hamburg court has given LAION an important legal win in a dispute over copyrighted images and AI training data. The decision supports the non-profit organization’s ability to collect and process a copyrighted image for a freely available research dataset, but it leaves some of the largest questions in AI copyright unresolved.

The case matters because it focuses on a basic step behind modern AI development: gathering data from online sources, matching images with text, and making those pairs available for analysis. The ruling clarifies one situation involving a non-profit research dataset, while avoiding a broader answer for commercial AI companies.

What the Hamburg court decided

The dispute was between a photographer and LAION. The Hamburg Regional Court sided with LAION in case number 310 O 227/23.

LAION had taken an image from a photo agency's website, connected it with a description, and added the URL and description to its freely available "LAION-5B" dataset. That dataset contains 5.85 billion image-text pairs and is used for AI training research.

The photographer sued LAION for copyright infringement. The court agreed that downloading and processing the image counted as a copyright-relevant reproduction. That point is important: the court did not say that no copying occurred.

Instead, the court found that LAION’s action was justified under Section 60d of German copyright law. The source describes Section 60d as permitting text and data mining for non-commercial scientific research.

Why LAION’s non-commercial role mattered

The court looked at LAION’s specific conduct rather than treating the organization’s structure as the central issue. According to the ruling described in the source, LAION released the dataset freely for research, so it was not pursuing commercial goals in that activity.

The fact that companies also use the dataset did not change the court’s conclusion. In other words, the court focused on what LAION did when it collected and released the dataset, not on every possible later user of the data.

That distinction is central to the decision. It gives research groups a clearer path for collecting AI training data when their work fits the non-commercial scientific research framework described in Section 60d. But it does not automatically give the same protection to companies collecting data for their own commercial AI systems.

Heidrich Rechtsanwälte, the law firm representing LAION, described the ruling as follows: "The ruling of the Hamburg Regional Court creates an important basis for the legally compliant use of publicly accessible data in the context of scientific research. It confirms that the association can continue to make a significant contribution to the promotion of open-source initiatives in the future, which promotes AI development in Germany in particular."

The Section 44b question remains open

The court did not have to decide whether LAION could also rely on Section 44b. The source describes Section 44b as a broader text and data mining exception that allows copying legally accessible works for automated analysis of digital works to extract information about patterns, trends, and correlations.

That exception comes with limits. Copies must be deleted when they are no longer needed for mining. Rights holders can also reserve these uses, but for online works that reservation must be made in machine-readable form.
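To make "machine-readable" concrete, here is a minimal sketch of how a crawler might check one common machine-readable signal, a site's robots.txt file, before copying an image. This is an illustration under stated assumptions, not a statement of what the court or Section 44b requires: the source does not say which formats count as a valid reservation, and the crawler name, the example URL, and the sample rules are all hypothetical.

```python
from urllib.robotparser import RobotFileParser


def crawl_disallowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules disallow this user agent from fetching the URL.

    Assumption: robots.txt is used here only as an example of a
    machine-readable signal. Whether it amounts to a legally valid
    reservation is not something the source settles.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical rules: one named crawler is kept out of /images/,
# while every other agent is allowed everywhere.
SAMPLE_ROBOTS = """\
User-agent: example-research-crawler
Disallow: /images/

User-agent: *
Allow: /
"""

PHOTO_URL = "https://agency.example/images/photo.jpg"  # hypothetical URL

if __name__ == "__main__":
    for agent in ("example-research-crawler", "other-bot"):
        status = "disallowed" if crawl_disallowed(SAMPLE_ROBOTS, agent, PHOTO_URL) else "allowed"
        print(f"{agent}: {status}")
```

A notice written only in a site's human-readable terms of use would not be visible to a check like this, which is why the form of the reservation matters in practice.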

In this case, the court doubted that the photo agency's website had a machine-readable notice restricting use. Still, because the case was resolved under Section 60d, the larger Section 44b issue was not fully decided.

That leaves a practical uncertainty for rights holders and AI developers. A rights holder may want to restrict text and data mining, but the source indicates that the restriction must be machine-readable for online works. At the same time, AI data collectors still lack a definitive answer on how far Section 44b reaches in comparable disputes.

What this means for AI training data

The ruling shows that research groups can collect AI training data under the circumstances addressed by the Hamburg court. It is especially relevant to non-commercial scientific research and freely available datasets like "LAION-5B".

But the decision is narrower than the broader AI copyright debate. The source makes clear that the ruling is about collecting data, not about actually using that data to train AI systems.

That distinction matters because companies like OpenAI have done both, according to the source: they have taken copyrighted online data without permission and used it to train their systems. The Hamburg ruling does not answer whether that broader conduct is allowed.

The ruling also does not resolve whether for-profit companies can rely on the same reasoning. LAION’s non-commercial research activity was central to the court’s analysis, so commercial AI developers remain in a different and less certain position.

The next stage may come on appeal

The source says the photographer is likely to appeal to a higher court because of the importance of the case. If that happens, the legal questions around AI data scraping, copyright, and text and data mining could receive further review.

Several lawsuits are still pending on the same general issue. The source identifies the case between the New York Times and OpenAI as probably the most high-profile example.

For now, the Hamburg ruling gives LAION and similar research efforts a meaningful legal basis in one specific context. It confirms that collecting publicly accessible data for non-commercial scientific research can be treated differently from commercial scraping and model training.

The result is not a final map for AI copyright. It is a narrower decision with immediate importance for research datasets, while the harder questions about companies, training use, and rights-holder restrictions remain unsettled.