User Guide
==========
Subpackages
-----------
The package is organized into three subpackages, each addressing different levels of semantic web expertise
and data modeling requirements:
1. `InterfaceMDS`: A library of functions that enables direct interaction with MDS-Onto, allowing users
to query, extend, and integrate ontology-driven metadata into their datasets and analytical pipelines.
2. `QBWorkflow`: A comprehensive FAIRification workflow designed for users familiar with the RDF Data
Cube vocabulary. This workflow supports the creation of richly structured, multidimensional datasets
that adhere to linked data best practices and can be easily queried, combined, and analyzed.
3. `RDFTableConversion`: A streamlined FAIRification workflow for users who prefer a lighter approach
that does not require RDF Data Cube. Instead, it leverages a JSON-LD template populated with
standard JSON objects derived from table columns. This approach enables users to transform tabular
datasets into linked data while maintaining control over metadata content and structure.
By offering both advanced and simplified pathways for converting data into semantically rich, machine-readable
formats, FAIRLinked lowers the barrier to adopting FAIR principles in the materials science community.
Its modular design allows researchers to choose the workflow that best matches their technical expertise,
data complexity, and intended use cases, thereby promoting greater data discoverability, interoperability, and
reuse.
.. image:: https://raw.githubusercontent.com/cwru-sdle/FAIRLinked/main/figs/fig1-fairlinked.png
# InterfaceMDS Subpackage
.. image:: https://raw.githubusercontent.com/cwru-sdle/FAIRLinked/main/figs/InterfaceMDSGitHub.png
.. code-block:: python
import FAIRLinked.InterfaceMDS
Functions in Interface MDS allow users to interact with MDS-Onto and search for terms relevant to their domains.
This includes loading MDS-Onto into an RDFLib Graph, view domains and subdomains, term search, and add new ontology
terms to a local copy.
## Load latest version of MDS-Onto
.. code-block:: python
import FAIRLinked.InterfaceMDS.load_mds_ontology
from FAIRLinked.InterfaceMDS.load_mds_ontology import load_mds_ontology_graph
mds_graph = load_mds_ontology_graph()
## View domains/subdomains in MDS-Onto
Terms in MDS-Onto are categorized under domains and subdomains, groupings related to topic areas currently being researched at SDLE and collaborators.
More information about domains and subdomains can be found at `here `_
.. code-block:: python
import FAIRLinked.InterfaceMDS.domain_subdomain_viewer
from FAIRLinked.InterfaceMDS.domain_subdomain_viewer import domain_subdomain_viewer
domain_subdomain_viewer()
## View domains/subdomains tree in MDS-Onto
.. code-block:: python
import FAIRLinked.InterfaceMDS.domain_subdomain_viewer
from FAIRLinked.InterfaceMDS.domain_subdomain_viewer import domain_subdomain_directory
domain_subdomain_directory()
Generate an actual file directory with sub-ontologies tagged by domain/subdomain:
.. code-block:: python
import FAIRLinked.InterfaceMDS.load_mds_ontology
from FAIRLinked.InterfaceMDS.load_mds_ontology import load_mds_ontology_graph
from FAIRLinked.InterfaceMDS.domain_subdomain_viewer import domain_subdomain_directory
mds_graph = load_mds_ontology_graph()
domain_subdomain_directory(onto_graph=mds_graph, output_dir="path/to/output")
## Search for ontology terms
.. code-block:: python
from FAIRLinked.InterfaceMDS.rdf_subject_extractor import extract_subject_details, fuzzy_filter_subjects_strict
from FAIRLinked.InterfaceMDS.load_mds_ontology import load_mds_ontology_graph
mds_graph = load_mds_ontology_graph()
onto_dataframe = extract_subject_details(mds_graph)
search_results = fuzzy_filter_subjects_strict(df=onto_dataframe, keywords=["Detector"])
print(search_results)
## Find Domain, Subdomain, and Study Stages
.. code-block:: python
from FAIRLinked.InterfaceMDS.term_search_general import term_search_general
term_search_general(query_term="Chem-Rxn", search_types=["SubDomain"])
Save results to Turtle:
.. code-block:: python
term_search_general(query_term="Chem-Rxn", search_types=["SubDomain"],
ttl_extr=True, ttl_path="path/to/output.ttl")
## Add a new term to Ontology
.. code-block:: python
from FAIRLinked.InterfaceMDS.add_ontology_term import add_term_to_ontology
add_term_to_ontology("path/to/mds-onto/file.ttl")
# RDF Table Conversion Subpackage
.. image:: https://raw.githubusercontent.com/cwru-sdle/FAIRLinked/main/figs/fig2-fairlinked.png
.. code-block:: python
import FAIRLinked.RDFTableConversion
Functions in this subpackage allow you to:
* generate a JSON-LD metadata template from a CSV with MDS-compliant terms,
* generate JSON-LDs filled with data and MDS semantic relationships,
* convert a directory of JSON-LDs back into tabular format.
## Generate a JSON-LD template from CSV
.. code-block:: python
from rdflib import Graph
from FAIRLinked.RDFTableConversion.csv_to_jsonld_mapper import jsonld_template_generator
mds_graph = Graph()
mds_graph.parse("path/to/ontology/file")
jsonld_template_generator(csv_path="path/to/data.csv",
ontology_graph=mds_graph,
output_path="path/to/output/template.jsonld",
matched_log_path="path/to/output/matched.log",
unmatched_log_path="path/to/output/unmatched.log",
skip_prompts=False)
## Create JSON-LDs from CSVs
.. code-block:: python
import json
from FAIRLinked.RDFTableConversion.csv_to_jsonld_template_filler import extract_data_from_csv
with open("path/to/metadata/template.jsonld", "r") as f:
metadata_template = json.load(f)
extract_data_from_csv(metadata_template=metadata_template,
csv_file="path/to/data.csv",
row_key_cols=["sample_id"],
id_cols=["sample_id", "measurement_id"],
orcid="0000-0000-0000-0000",
output_folder="path/to/output/json-lds")
.. note::
Please make sure to follow the proper formatting guidelines for the input CSV file.
* Each column name should be the "common" or alternative name for this object
* The following three rows should be reserved for the **type**, **units**, and **study stage** in that order
* If values for these are not available, the space should be left blank
* Data for each sample can then begin on the 5th row
Please see the following images for reference:
.. image:: https://raw.githubusercontent.com/cwru-sdle/FAIRLinked/main/resources/images/fulltable.png
:alt: Full Table
:align: center
Minimum Viable Data:
.. image:: https://raw.githubusercontent.com/cwru-sdle/FAIRLinked/main/resources/images/mintable.png
:alt: Sparse Table
:align: center
During the template generating process, the user may be prompted for data for different columns. When no units are detected, the user will be prompted for the type of unit, and then given a list of valid units to choose from.
.. image:: https://raw.githubusercontent.com/cwru-sdle/FAIRLinked/main/resources/images/kind.png
:alt: Kind
:align: center
.. image:: https://raw.githubusercontent.com/cwru-sdle/FAIRLinked/main/resources/images/unit.png
:alt: Unit
:align: center
When no study stage is detected, the user will similarly be given a list of study stages to choose from.
.. image:: https://raw.githubusercontent.com/cwru-sdle/FAIRLinked/main/resources/images/studystage.png
:alt: Study Stage
:align: center
## Create JSON-LDs with relationships
In the example below, "relationship_label" should be the rdfs:label value, the full IRI, or the CURIE associated with the property.
.. code-block:: python
import json
from FAIRLinked.InterfaceMDS.load_mds_ontology import load_mds_ontology_graph
from FAIRLinked.RDFTableConversion.csv_to_jsonld_template_filler import extract_data_from_csv
mds_graph = load_mds_ontology_graph()
with open("path/to/metadata/template.jsonld", "r") as f:
metadata_template = json.load(f)
prop_col_pair_dict = {
"relationship_label": [("column_1", "column_2")]
}
extract_data_from_csv(metadata_template=metadata_template,
csv_file="path/to/data.csv",
row_key_cols=["column_1", "column_3"],
id_cols=["column_1", "column_2"],
orcid="0000-0000-0000-0000",
output_folder="path/to/output/json-lds",
prop_column_pair_dict=prop_col_pair_dict,
ontology_graph=mds_graph)
## Convert JSON-LD directory back to CSV
.. code-block:: python
from FAIRLinked.RDFTableConversion.jsonld_batch_converter import jsonld_directory_to_csv
jsonld_directory_to_csv(input_dir="path/to/json-lds",
output_basename="dataset",
output_dir="path/to/output")
# RDF DataCube Workflow
.. code-block:: python
from FAIRLinked.QBWorkflow.rdf_data_cube_workflow import rdf_data_cube_workflow_start
rdf_data_cube_workflow_start()
The RDF DataCube workflow turns tabular data into a format compliant with the `RDF Data Cube vocabulary `_.
.. image:: https://raw.githubusercontent.com/cwru-sdle/FAIRLinked/main/FAIRLinkedv0.2.png