On the new OpenMDW open-source AI license
A step in the right direction, but it's not a Panacea. Linux's new license starts simple.
Disclaimer, this is license vibes and not legal advice, but it should still be useful.
One of the long standing todo items for open-source AI is better licenses. There’s a new license from the Linux Foundation, OpenMDW, that you should check out. Below are my unstructured thoughts — TLDR is that it seems like a solid, simple starting point. A stripped down MIT/Apache 2.0 license for AI models.
Mostly, these new AI licenses should apply to models as we’ve had solid data licenses for a bit. The most useful data licenses in my opinion tend to be a combination of the CC-BY line from Creative Commons, which are designed to “give everyone from individual creators to large institutions a standardized way to grant the public permission to use their creative work under copyright law.” These are best for new, manually crafted data like prompts or new manually edited completions.
On the other side is ODC-By from OpenDataCommons that makes it so one can compile multiple licenses together in a curated dataset and pass responsibility to the user to follow each subcomponent as follows requisite licenses and terms of service. Their summary is:
You are free:
To share: To copy, distribute and use the database.
To create: To produce works from the database.
To adapt: To modify, transform and build upon the database.
As long as you:
Attribute: You must attribute any public use of the database, or works produced from the database, in the manner specified in the license. For any use or redistribution of the database, or works produced from it, you must make clear to others the license of the database and keep intact any notices on the original database.
This is used a lot when one is confused about how to license model outputs, which don’t really work under copyright (when the legal cases are in purgatory).
So, data is in a reasonable place even if there’s much more work to be done there. Best practices have continued to evolve, but they seem okay.
Models, on the other hand, are a bit messy. Remember that we recently had the Open Source Initiative propose a model license. I covered it here (and is related to all my past writing on the topic of open-source AI):
The most common permissive licenses of model weights are software licenses like MIT and Apache 2.0. For example, Apache 2.0 for software tells the user to mark where files were modified from — how does that apply to model weights?
Where should the distinction go? I’m not an expert, but there are other ambiguities. For example, lots of open-source AI licenses are looked to for guidance on model outputs — either applying restrictions or stating no ownership. This puts us back in the territory of the above discussion on data, so it’ll be hard to have a license that works for both model weights, input data, and output data.
Thus, we got a new entrant from the Linux foundation — the OpenMDW License. There’s a basic license website, license details (over time), and version 1.
The issue with this license at first glance may be that the model data gets tied to the license:
As used in this agreement, "Model Materials" means the materials provided to you under this agreement, consisting of: (1) one or more machine learning models (including architecture and parameters); and (2) all related artifacts (including associated data, documentation and software) that are provided to you hereunder.
So, for Ai2’s purposes with OLMo, it would be unlikely it gets used. It makes no claims to outputs, which I think is the right way to do it. Reading the FAQ, they make it a bit clearer that’s only the user of the license also releases the data with the OpenMDW too.
Under OpenMDW‑1.0, Model Materials include:
Machine‑learning models (architecture and parameters); and
All related artifacts (including associated data, documentation and software) that are provided under OpenMDW-1.0.
This license is quite short, I’ve copied the full text below (re-formatted by AI if slightly different):
OpenMDW License Agreement, version 1.0 (OpenMDW-1.0)
By exercising rights granted to you under this agreement, you accept and agree to its terms.
As used in this agreement, “Model Materials” means the materials provided to you under this agreement, consisting of: (1) one or more machine learning models (including architecture and parameters); and (2) all related artifacts (including associated data, documentation, and software) that are provided to you hereunder.
Subject to your compliance with this agreement, permission is hereby granted, free of charge, to deal in the Model Materials without restriction, including under all copyright, patent, database, and trade secret rights included or embodied therein.
If you distribute any portion of the Model Materials, you shall retain in your distribution (1) a copy of this agreement, and (2) all copyright notices and other notices of origin included in the Model Materials that are applicable to your distribution.
If you file, maintain, or voluntarily participate in a lawsuit against any person or entity asserting that the Model Materials directly or indirectly infringe any patent, then all rights and grants made to you hereunder are terminated, unless that lawsuit was in response to a corresponding lawsuit first brought against you.
This agreement does not impose any restrictions or obligations with respect to any use, modification, or sharing of any outputs generated by using the Model Materials.
THE MODEL MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE, NONINFRINGEMENT, ACCURACY, OR THE ABSENCE OF LATENT OR OTHER DEFECTS OR ERRORS, WHETHER OR NOT DISCOVERABLE, ALL TO THE GREATEST EXTENT PERMISSIBLE UNDER APPLICABLE LAW.
YOU ARE SOLELY RESPONSIBLE FOR (1) CLEARING RIGHTS OF OTHER PERSONS THAT MAY APPLY TO THE MODEL MATERIALS OR ANY USE THEREOF, INCLUDING WITHOUT LIMITATION ANY PERSON'S COPYRIGHTS OR OTHER RIGHTS INCLUDED OR EMBODIED IN THE MODEL MATERIALS; (2) OBTAINING ANY NECESSARY CONSENTS, PERMISSIONS OR OTHER RIGHTS. REQUIRED FOR ANY USE OF THE MODEL MATERIALS; OR (3) PERFORMING ANY DUE DILIGENCE OR UNDERTAKING ANY OTHER INVESTIGATIONS INTO THE MODEL MATERIALS OR ANYTHING INCORPORATED OR EMBODIED THEREIN.
IN NO EVENT SHALL THE PROVIDERS OF THE MODEL MATERIALS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE MODEL MATERIALS, THE USE THEREOF OR OTHER DEALINGS THEREIN.