| P001 |
Metadata files (codemeta.json, setup.py, pom.xml etc...) |
Version does not correspond to the version used in the latest release |
When the version declared in metadata does not match the software’s most recent release, users and automated systems may assume the metadata is outdated or unreliable. This inconsistency undermines reproducibility and makes software indexing services or repositories like Zenodo and Software Heritage less accurate. |
Ensure the version in your metadata matches the latest official release. Keeping these synchronized avoids confusion for users and improves reproducibility. |
1.0.0 vs v1.0.0 |
High |
| P002 |
LICENSE |
Copyright section taken as template without modification |
Leaving the copyright section unedited signals neglect of the project's legal documentation. It may retain placeholder values or incorrect author details, which creates legal ambiguity about ownership and distribution rights. |
Update the copyright section with accurate names, organizations, and the current year. Personalizing this section ensures clarity and legal accuracy. |
LICENSE in the repository has Copyright (C) |
High |
| P003 |
Metadata files (codemeta.json, setup.py, pom.xml etc...) |
There are two or more authors but only one field for them ("author": "Thomas Boch and Matthieu Baumann") instead of a list |
Listing multiple authors in a single string breaks the metadata structure. It prevents proper parsing by services like ORCID or citation managers, leading to incomplete author records. |
You should separate multiple authors into a structured list. This allows tools and citation systems to correctly identify and credit each contributor. |
"author": "Name1 LastName1 and Name2 LastName2" |
Medium |
| P004 |
codemeta.json |
README property pointing to their homepage/wiki instead of README file |
When the README field references a homepage or wiki instead of the actual README file, documentation may not be indexed or found by metadata harvesters. This weakens the project's self-documentation. |
Update the README property so it points directly to your actual README file instead of your homepage. This helps ensure users and automated tools can access your project documentation easily. |
"readme": "https://example.homepage.io" |
Low |
| P005 |
codemeta.json |
referencePublication refers to software archive instead of paper |
The referencePublication field is intended to point to the scholarly paper or publication that formally describes the software. When it incorrectly refers to a software archive (e.g., Zenodo, GitHub release, or other repository entry) instead of the paper, it can lead to misrepresentation of the source, loss of citation accuracy, and reduced discoverability of the associated research publication. |
Ensure that the referencePublication field points to the scholarly paper describing the software, not to a software archive or repository entry. |
"referencePublication": "https://doi.org/XX.XXX/zenodo.XXXXXX" |
High |
| P006 |
Metadata files (codemeta.json, setup.py, pom.xml etc...) |
License pointing to a local file instead of stating the name. For example: "license": "LICENSE.MD" |
Referencing a local file path (e.g., LICENSE.md) instead of using a recognized license name or identifier reduces machine-readability. Automated tools cannot infer licensing terms accurately. |
You need to replace local file paths with a recognized SPDX license identifier, such as MIT or GPL-3.0-only, ideally in URL form (e.g., https://spdx.org/licenses/MIT). This ensures your license can be correctly detected by automated tools. |
"license": "LICENSE.MD" |
Medium |
| P007 |
CITATION.cff |
CITATION.cff does not have referencePublication even though it’s referenced in codemeta.json |
A missing reference publication field limits the connection between the software and its academic reference. It weakens citation tracking and discoverability in scholarly contexts. |
Add the related publication to your CITATION.cff as a DOI or citation entry, for example under preferred-citation or references, since the Citation File Format has no referencePublication key. This will help link your work to its scholarly references. |
In codemeta.json, we see that "referencePublication": "https://arxiv.org/html/XXX.XXXXXX" and CITATION.cff does not have the field |
Low |
| P008 |
Metadata files (codemeta.json, setup.py, pom.xml etc...) |
softwareRequirements points to an invalid page |
Broken or incorrect links can hinder transparency and traceability of software dependencies. Users and automated systems would not be able to retrieve required resources. |
Verify and update any dependency links to ensure they lead to valid and accessible pages. |
"softwareRequirements": "https://example.documentation.org" returns 404 |
High |
| P009 |
Metadata files (codemeta.json, setup.py, pom.xml etc...) |
codeRepository points to the project homepage |
Mislabeling a project’s homepage as its code repository misleads automated harvesters and confuses users. |
You need to update the codeRepository field to point directly to your repository’s source code instead of a homepage. Accurate links improve traceability and user access. |
"codeRepository": "https://example.documentation.org" instead of "https://github.com/user/repos" |
High |
| P010 |
LICENSE |
The file contains only a copyright notice and no actual license text |
A LICENSE file missing the actual license text prevents users from knowing the terms of use, redistribution, or modification. |
You need to include the complete text of a recognized license such as MIT, Apache 2.0, or GPL. A full license clarifies rights and usage conditions for others. |
LICENSE file would have only YEAR XXXX COPYRIGHT XXXXX with no license specification (MIT, GNU license, Apache etc...) |
High |
| P011 |
codemeta.json |
issueTracker violates the expected URL format |
Invalid issue tracker URLs hinder user engagement and automated link resolution. |
You need to correct the issue tracker URL so it follows a valid format, such as https://github.com/user/repo/issues. Proper links help users engage with your development process. |
"issueTracker": "n/ https://example.issues.com" |
Medium |
| P012 |
codemeta.json |
downloadUrl is outdated |
Outdated download URLs lead users to old releases or missing artifacts, breaking reproducibility. |
You need to update the downloadUrl field to point to your latest release or current distribution source. Outdated links can mislead users or cause failed installations. |
"downloadUrl": "https://git/example-repo/repo/-/archive/3.8.0/repo-3.8.0.tar.gz" when the latest release is "4.0.0" |
High |
| P013 |
Metadata files (codemeta.json, setup.py, pom.xml etc...) |
License does not have the specific version |
Specifying a license without including its version introduces ambiguity regarding the exact terms of use. For example, "GPL" could refer to any of its variants (GPL-2.0, GPL-3.0, etc.), which differ in their legal terms and permitted uses. This creates uncertainty for users and contributors about their compliance obligations. |
You should declare the specific version of the license using a recognized SPDX identifier. For example, use "GPL-3.0-only" or "GPL-2.0-or-later" instead of simply "GPL". |
Licenses like GNU would be mentioned but without a version, like "license": |
High |
| P014 |
codemeta.json |
Uses bare DOIs in the identifier field instead of the full https://doi.org/ URL |
A bare DOI (e.g., 10.5281/zenodo.12345) is not a resolvable link by itself. Automated metadata harvesters and citation tools rely on complete URLs to link records across repositories and digital archives. |
You should use the full DOI URL form in your metadata (e.g., https://doi.org/XX.XXXX/zenodo.XXXX). |
"identifier": "XX.XXXX/zenodo.XXXXXXX" instead of a URL like "https://doi.org/XX.XXXX/zenodo.XXXXXXX" |
Low |
| P015 |
codemeta.json |
contIntegration link returns 404 |
Continuous Integration (CI) links often point to automated testing or build systems (like GitHub Actions, Travis CI, or GitLab CI). Having an invalid link can make automated systems that check software quality flag the repository as inactive. |
You need to update the outdated URLs to point to the current CI platform, or remove the property if no active CI is in place. Periodically test all external links, especially those related to CI or build status. |
"contIntegration": "https://example.contIntg.org" returns 404 |
Medium |
| P016 |
Metadata files (codemeta.json, setup.py, pom.xml etc...) |
codeRepository does not point to the same repository |
Mismatched codeRepository links confuse users and metadata aggregators by referencing the wrong repository. Such inconsistencies damage credibility and hinder automation workflows. |
Make sure that the codeRepository URL in your metadata exactly matches the repository hosting your source code. |
"codeRepository": "https://github.com/example2_repo/" while the original repository would be "https://github.com/example1_repo" |
Medium |
| P017 |
codemeta.json |
version does not match the package’s |
When the version recorded in codemeta.json differs from the version specified in package manifests (e.g., setup.py or package.json), it introduces discrepancies that confuse users and systems tracking software evolution. It can lead to incorrect citation, packaging, or deployment of outdated versions. |
You need to synchronize all version references across metadata and build configuration files. |
In codemeta.json, "version" is 1.0.0, while in another metadata file (for example setup.py) "version" is 0.9.0 |
Medium |
| P018 |
codemeta.json |
Identifier uses raw SWHIDs without their resolvable URL |
Software Heritage Identifiers (SWHIDs) are persistent identifiers that reference archived software artifacts. However, using raw identifiers without their full resolvable URL (e.g., swh:1:dir:abcd...) prevents external systems from accessing the archive. This reduces transparency, hinders verification, and limits long-term reproducibility. |
Always use the full resolvable SWHID URL (e.g., https://archive.softwareheritage.org/swh:1:dir:abcd.../). This ensures that both humans and machines can access the archived software snapshot directly. |
"identifier": "swh:1:dir:7d7edd5890f7687663c121abe1c3818a16f1cdb2" |
|
| P019 |
Metadata files (codemeta.json, setup.py, pom.xml etc...) |
Inconsistent author count across metadata sources |
Different metadata files within the same repository report a different number of authors. This leads to confusion about who should be credited and which file is the source of truth. |
Ensure that the author list is synchronized across all metadata files. Using a single source of truth or automated synchronization tools can prevent this discrepancy. |
codemeta.json has 3 authors and setup.py has 1 author. |
High |
| W001 |
Metadata files (codemeta.json, setup.py, pom.xml etc...) |
softwareRequirements do not have versions |
Dependencies without version constraints can lead to instability, as future versions of those libraries may introduce breaking changes. This hinders reproducibility and automation for package installers or research reproducibility platforms. |
Add version numbers to your dependencies. This provides stability for users and allows reproducibility across different environments. |
"softwareRequirements": ["NumPy","Pandas"] |
- |
| W002 |
codemeta.json |
dateModified is outdated |
The dateModified value is older than the most recent modification date of the linked repositories, so the metadata no longer reflects the current state of the software. |
Make sure the dateModified field reflects the most recent modification date among all linked repositories. Update it to match the latest repository date. |
"dateModified": "2024-11-29" vs Repository last modified "2025-12-01" |
- |
| W003 |
Metadata files (codemeta.json, setup.py, pom.xml etc...) |
Inconsistent use of licenses in metadata files |
When the repository includes multiple licensing terms but metadata lists only one, it creates confusion about reuse permissions. Automated systems or users may apply the wrong license. |
List all applicable licenses if your repository includes more than one, or match the prioritized license to the license field in your metadata files. This avoids confusion about terms of use and ensures full transparency. |
If you have two licenses, "license": ["https://spdx.org/licenses/GPL-3.0-or-later", "https://spdx.org/licenses/BSD-3-Clause"] |
- |
| W004 |
codemeta.json |
programmingLanguages do not have versions |
Programming languages keep updating and version differences can cause compatibility or performance issues. Missing language versions limit reproducibility. |
Include version numbers for each programming language used. Defining these helps ensure reproducibility and compatibility across systems. |
In codemeta.json, for example, we would find "programmingLanguages": ["Python"] instead of "Python3" |
- |
| W005 |
Metadata files (codemeta.json, setup.py, pom.xml etc...) |
softwareRequirements lists more than one requirement, but they are written as a single string |
Writing multiple dependencies in one string instead of a structured list reduces readability and breaks compatibility with parsers. |
Rewrite your dependencies as a proper list, with each item separated and preferably with their versions. This makes them easier to parse for metadata systems. |
"softwareRequirements": ["Python, Pandas"] |
- |
| W006 |
codemeta.json |
Identifier is a name instead of a valid unique identifier, even though an identifier exists (e.g., in a badge) |
Using a simple name rather than a globally unique identifier (like DOI or SWHID) makes the project less discoverable and interoperable across metadata registries. |
You should replace the plain name in your identifier field with a persistent identifier, such as a DOI or SWHID, to improve discoverability and interoperability. |
"identifier": "name" when DOI badge like "https://doi.org/XX.XXXX/zenodo.XXXXXXX" is available in README for example |
- |
| W007 |
Metadata files (codemeta.json, setup.py, pom.xml etc...) |
Identifier is empty |
Missing identifiers reduce the visibility and traceability of the software in digital repositories. |
You should add a unique identifier, like a DOI, repository URL, or SWHID, to ensure your software can be cited and referenced correctly. If you do not have one, delete the empty identifier field. |
"identifier": "" |
- |
| W008 |
Metadata files (codemeta.json, setup.py, pom.xml etc...) |
givenName in the author field has more than one name |
Incorrect data typing (a list instead of a string) breaks schema compliance and complicates contributor parsing. |
Ensure givenName is a single string per person. This ensures that every author is properly credited and can be extracted automatically. |
"author": [{"@type": "Person", "givenName": ["name1", "name2"], "familyName": ...}] |
- |
| W009 |
codemeta.json |
developmentStatus is a URL instead of a string |
Some fields expect predefined strings, not URLs. Using incorrect data types can cause schema validation errors. |
You need to replace URLs in the developmentStatus field with descriptive text values, such as “active,” “beta,” or “stable.” This maintains schema compliance and clarity. |
"developmentStatus": "https://www.example.org/lifecycle/#superseded" instead of "developmentStatus": "Suspended" |
- |
| W010 |
Metadata files (codemeta.json, setup.py, pom.xml etc...) |
codeRepository uses Git remote-style shorthand instead of a full URL |
When the codeRepository field uses Git remote shorthand (e.g., git@github.com:user/repo.git) instead of a complete HTTPS URL, web-based tools and metadata harvesters cannot resolve it. |
You should replace the remote-style syntax with a full web-accessible URL (e.g., https://github.com/user/repo). |
"codeRepository": "github.com:cicwi/PyCorrectedEmissionCT.git" |
- |
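
Several of the entries above (P003, P006, P014, P018, W005, W007, W010) describe patterns that can be spotted by inspecting a parsed codemeta.json. The following is a minimal sketch of such checks, not the method of any particular tool: the function name `find_pitfalls`, the regular expressions, and the heuristics (for example, detecting multiple authors by the word "and") are assumptions made for illustration only.

```python
import re

# Hypothetical patterns chosen for this sketch; the catalogue does not prescribe them.
BARE_DOI = re.compile(r"^10\.\d{4,9}/\S+$")
RAW_SWHID = re.compile(r"^swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}")
LOCAL_LICENSE = re.compile(r"^(\./)?(LICEN[SC]E|COPYING)(\.\w+)?$", re.IGNORECASE)
GIT_SHORTHAND = re.compile(r"^(\w+@)?[\w.-]+:(?!//)[\w./-]+$")


def find_pitfalls(meta: dict) -> list[str]:
    """Return the catalogue codes that a parsed codemeta.json dict appears to trigger."""
    found = []

    # P003: several authors packed into one string instead of a list.
    author = meta.get("author")
    if isinstance(author, str) and " and " in author:
        found.append("P003")

    # P006: license is a local file name rather than an SPDX identifier or URL.
    license_value = meta.get("license")
    if isinstance(license_value, str) and LOCAL_LICENSE.match(license_value):
        found.append("P006")

    # W007 / P014 / P018: empty identifier, bare DOI, or raw SWHID.
    identifier = meta.get("identifier")
    if identifier == "":
        found.append("W007")
    elif isinstance(identifier, str):
        if BARE_DOI.match(identifier):
            found.append("P014")
        elif RAW_SWHID.match(identifier):
            found.append("P018")

    # W005: a single requirement string that holds a comma-separated list.
    requirements = meta.get("softwareRequirements", [])
    if isinstance(requirements, str):
        requirements = [requirements]
    if any(isinstance(r, str) and "," in r for r in requirements):
        found.append("W005")

    # W010: codeRepository written as Git remote shorthand (host:path, no scheme).
    repo = meta.get("codeRepository")
    if isinstance(repo, str) and GIT_SHORTHAND.match(repo):
        found.append("W010")

    return found


# Usage with the sample values from the table above.
example = {
    "author": "Name1 LastName1 and Name2 LastName2",
    "license": "LICENSE.MD",
    "identifier": "swh:1:dir:7d7edd5890f7687663c121abe1c3818a16f1cdb2",
    "softwareRequirements": ["Python, Pandas"],
    "codeRepository": "github.com:cicwi/PyCorrectedEmissionCT.git",
}
print(find_pitfalls(example))  # ['P003', 'P006', 'P018', 'W005', 'W010']
```

Checks that need network access or a second file (such as the 404 links in P008 and P015, or the version comparisons in P001 and P017) are deliberately left out of this sketch, since they cannot be decided from a single JSON document.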