RDFTableConversion.MDS_DF User Guide
MatDatSciDf
The MatDatSciDf class is a semantic wrapper for Pandas DataFrames. It ensures that data is structurally valid, ontologically mapped, and attributed to a verified researcher (ORCID) before transformation into Linked Data (RDF).
Core Architecture
An instance of MatDatSciDf manages three synchronized components:
1. Measurement Data: A cleaned Pandas DataFrame.
2. Metadata Graph: An RDFLib Graph and JSON-LD template synchronized via the metadata_obj.
3. Semantic Relations: A mapping of inter-column links via the data_relations manager.
Initialization & Metadata Ingestion
You can initialize the firewall with a standard DataFrame. If your CSV includes the optional 3-row header (Type, Unit, Study Stage), the tracker can ingest them automatically.
import pandas as pd
from FAIRLinked import MatDatSciDf
df = pd.read_csv("experimental_data.csv")
# Initialize with researcher identity
mds_df = MatDatSciDf(
df=df,
orcid="0000-0001-2345-6789",
df_name="PMMA_Indentation_Study",
metadata_rows=True # Isolates the first 3 rows as semantic headers
)
mds_df.view_metadata()
Validation and Relations
Before export, use the firewall to audit alignment and define internal links.
# 1. Audit alignment between data and definitions
mds_df.validate_metadata()
# 2. Link columns (e.g., connect Hardness to a specific Sample)
relations = {
"is about": [("Hardness (GPa)", "Sample_ID")],
"mds:measuredBy": [("Hardness (GPa)", "Vickers_Indenter")]
}
mds_df.add_relations(relations)
Serialization (Export/Import)
# Bulk Export: Aggregate all rows into one master JSON-LD
mds_df.serialize_bulk(output_path="outputs/dataset.jsonld", license="MIT")
# Reconstruct: Restore a MatDatSciDf object from a directory of RDF files
reconstructed = MatDatSciDf.from_rdf_dir(input_dir="records/", orcid="0000-0001-2345-6789")
Method |
Purpose |
|---|---|
|
Maps columns to ontology terms (fuzzy-match or header-parse). |
|
Audits alignment between DataFrame and JSON-LD template. |
|
Connects columns together via semantic predicates. |
|
Converts the entire dataset into a master JSON-LD file. |
|
Saves “Semantic CSVs” or Parquet/Arrow files. |
Analysis Provenance (Tracker & Group)
The Analysis Tracking system provides a transparent “paper trail” by capturing function arguments, return values, and OS-level file system events.
AnalysisTracker: Atomic Auditing
The AnalysisTracker monitors a specific analysis event, generating a unique UUID and identifying the agent via ORCID.
from FAIRLinked import AnalysisTracker
tracker = AnalysisTracker(proj_name="Hardness_Fit", home_path="./results")
@tracker.track
def calculate_modulus(load, depth):
return (load / depth) * 0.75
# The function now logs all I/O and active file handles automatically
calculate_modulus(10.5, 0.02)
AnalysisGroup: Batch Orchestration
For parameter sweeps or iterative processing, AnalysisGroup aggregates multiple runs into a unified dataset.
from FAIRLinked import AnalysisGroup
group = AnalysisGroup(proj_name="Temperature_Sweep", home_path="./batch_data")
# Run multiple tracked iterations
for t in [300, 400, 500]:
group.run_and_track(my_simulation_func, temp=t)
AnalysisGroup also allows using the same AnalysisTracker instance to track a workflow.
from FAIRLinked import AnalysisGroup
from FAIRLinked import AnalysisTracker
group = AnalysisGroup(proj_name="Temperature_Sweep", home_path="./batch_data")
# Run multiple tracked iterations
for t in [300, 400, 500]:
tracker = AnalysisTracker(proj_name=f'Temperature_Sweep_{t}', home_path="./batch_data")
group.run_and_track(my_simulation_func, temp=t, tracker=tracker)
group.run_and_track(my_simulation_func_2, temp=t, tracker=tracker)
Batch Tracking with Decorators
from FAIRLinked import AnalysisGroup
# 1. Initialize the Group
group = AnalysisGroup(proj_name="Temperature_Sweep", home_path="./batch_data")
# 2. Use the @group.track decorator
# Each call to this function will now trigger a new AnalysisTracker internally.
@group.track
def my_simulation_func(temp):
"""
Performs a simulation at a specific temperature.
Inputs and outputs are automatically audited as separate runs.
"""
result = temp * 0.0012
return {"lattice_parameter": result}
# 3. Run multiple tracked iterations
# Each iteration receives a unique analysis_id and standalone JSON-LD graph.
for t in [300, 400, 500]:
my_simulation_func(temp=t)
# 4. Aggregate Results
# Flatten all independent runs into a single master DataFrame.
master_df = group.create_group_arg_df()
—
Semantic Integration
A key feature of AnalysisGroup is its ability to transition results directly back into the Semantic Firewall.
# 1. Flatten all run data into one master table
master_df = group.create_group_arg_df()
# 2. Bridge to Semantic Firewall: Automatically generates a MatDatSciDf
mds_obj = group.create_MatDatSciDf()
# 3. Export a master provenance graph linking all runs
group.save_jsonld()
Method |
Purpose |
|---|---|
|
Decorator for automatic function I/O auditing. |
|
Executes code while capturing arguments and file handles. |
|
Concatenates batch data into a single master DataFrame. |
|
Converts batch results into a semantic-aware MDS object. |
|
Serializes the complete provenance graph. |
License and Compliance
Use the built-in SPDX utility to find valid licenses for your data serialization.
MatDatSciDf.search_license("Creative Commons")