The Open Semantic Framework (OSF) is an integrated software stack using semantic technologies for knowledge management. It has a layered architecture that combines existing open source software with additional open source components developed specifically to provide a complete semantic technology framework. OSF is currently at version 3.x with six years of continuous development. OSF is made available under the Apache 2 license.

The basic architecture of OSF is quite simple. It pivots around OSF's Web services, of which there are now nearly 30, providing a wealth of functionality. This intermediate OSF Web services layer may be accessed directly via API, from the command line, or with utilities like cURL, or it may be controlled and interacted with using standard content management systems (CMSs). The RESTful OSF Web services provide a uniform means to access best-of-breed data management and indexing engines. This design both 1) abstracts away the complexity of the individual engines and 2) enables combined capabilities, orchestrated by OSF, that are not available from the engines alone. Full CRUD (create, read, update, delete) access, under user permissions and security, is provided to all digital objects in the stack.
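As a rough illustration of direct access, the sketch below composes (but does not send) an HTTP GET request of the kind a client might issue to a RESTful read endpoint. The base URL, the `crud/read` path and the `uri` and `dataset` parameter names are assumptions made for the example; check the Web service documentation of your own OSF instance for the real values.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL, for illustration only.
BASE_URL = "http://localhost/ws"

def build_read_request(record_uri: str, dataset_uri: str) -> Request:
    """Compose (but do not send) a GET request for one record."""
    # Parameter names "uri" and "dataset" are assumed, not taken from OSF docs.
    query = urlencode({"uri": record_uri, "dataset": dataset_uri})
    return Request(
        f"{BASE_URL}/crud/read/?{query}",
        headers={"Accept": "application/json"},
        method="GET",
    )

req = build_read_request(
    "http://example.org/people/alice",
    "http://example.org/datasets/people",
)
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; the same request could equally be issued with cURL.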

[Figure: Simple OSF Stack]

The entire stack is premised on the RDF (Resource Description Framework) data model. RDF provides a ready means for integrating existing structured data assets in any format with semi-structured data like XML and HTML, and with unstructured documents or text. The OSF stack is supported by complete documentation, automated installation routines, comprehensive unit and end-to-end tests, and workflows and use case studies to ease adoption. Structured Dynamics (SD) and its partners provide experienced support and extension services.

The Role of Ontologies

OSF is made operational via ontologies that capture the domain or knowledge space, matched with internal ontologies that guide OSF operations and data display. This design approach is known as ODapps, for ontology-driven applications. Ontologies are, in essence, graph structures. Graphs are among the most ubiquitous models of both natural and human-made systems. They can be used to model many types of relations and process dynamics in physical, biological and social systems. Many problems of practical interest can be represented by a graph. Graphs are especially well suited to capturing and managing knowledge domains.

Ontologies are the graph frameworks for organizing information on the semantic Web and within semantic enterprises. They provide unique benefits in discovery, flexible access, and information integration due to their inherent connectedness; that is, their ability to represent conceptual relationships. Ontologies are essentially a series of connected statements. Each statement relates a "thing" to either another thing or to a value for an attribute. The object of one assertion can be the subject of another one. In this manner, the statements get connected together to form the ontology structure of a knowledge graph. Ontologies can be layered on top of existing information assets, which means they are an enhancement of, not a replacement for, prior investments. And ontologies may be developed and matured incrementally, which means their adoption can be cost-effective as benefits become evident.
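The way statements link into a graph can be sketched in a few lines of Python. The identifiers below (`ex:Alice`, `ex:worksFor` and so on) are invented for the example; the point is only that the object of one statement can serve as the subject of the next.

```python
from collections import defaultdict

# Each statement is (subject, predicate, object); all names are made up.
statements = [
    ("ex:Alice", "ex:worksFor", "ex:Acme"),
    ("ex:Acme", "ex:locatedIn", "ex:Iowa"),
    ("ex:Alice", "ex:name", "Alice"),  # object is a literal attribute value
]

# Index statements by subject so an object can be followed as the next subject.
by_subject = defaultdict(list)
for s, p, o in statements:
    by_subject[s].append((p, o))

def reachable(start):
    """Return every node reachable from `start` by following statements."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, obj in by_subject.get(node, []):
            if obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen

print(sorted(reachable("ex:Alice")))
```

Starting from `ex:Alice`, the traversal reaches `ex:Iowa` even though no single statement connects the two directly; that chain of connected statements is what the knowledge graph contributes.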

Mediating the natural semantic differences that arise between people, departments and other actors in the information space is done by employing best practices for ontology and vocabulary construction. Some of these practices involve capturing the various synonyms, acronyms, slang, jargon, and dismissive terminology that might be applied to concepts and things. This breadth of semantic characterization means the underlying concepts and entities can be better identified. This best practice plus explicit design also leads to inherent multi-linguality in the system.
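A minimal sketch of this practice, assuming a simple dictionary of preferred and alternative labels (the concept URIs and labels are hypothetical), shows how varied surface terms, including ones in other languages, can resolve to the same concept:

```python
# Hypothetical concept URIs with preferred and alternative labels.
concepts = {
    "ex:Automobile": {"pref": "automobile", "alt": ["car", "auto", "motorcar", "voiture"]},
    "ex:UnitedNations": {"pref": "United Nations", "alt": ["UN", "U.N.", "ONU"]},
}

def normalize(label: str) -> str:
    """Lowercase and drop punctuation and spaces so 'U.N.' and 'un' match."""
    return "".join(ch for ch in label.lower() if ch.isalnum())

# Build one lookup table from every known label to its concept.
label_index = {}
for uri, labels in concepts.items():
    for label in [labels["pref"], *labels["alt"]]:
        label_index[normalize(label)] = uri

def lookup(term: str):
    """Return the concept URI for a term, or None if the term is unknown."""
    return label_index.get(normalize(term))

print(lookup("u.n."), lookup("Voiture"), lookup("bicycle"))
```

A production vocabulary would also need to handle labels that are ambiguous between concepts, which this sketch ignores.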

In addition to this knowledge domain purpose, ontologies are also used for guiding how the various applications work within an OSF installation. These supplementary administrative ontologies guide, for example, how the user interfaces or widgets in the system should behave. OSF is unique in the way in which it combines domain and administrative ontologies to specify the scope and behavior of the system. Indeed, all layers in the Open Semantic Framework can remain relatively fixed while tailoring the instance to new domains solely via the ontologies employed. Ontologies are what give any given instantiation of OSF its unique focus and scope.

An Architecture Based on the Web

The overall philosophy in architecting the OSF stack is to provide a Web-based, scalable framework for integrating data and content from a variety of sources. OSF corresponds to what is known as a Web-oriented architecture (WOA). WOA has a number of features:

  • Data is generally exposed (and universally available) as linked data
  • SPARQL endpoints and APIs are generally RESTful in design
  • The overall architecture is modular, with inherent decentralized and distributed aspects
  • All display and visualization aspects are cross-browser ready and capable.

WOA builds on aspects of many of the largest properties on the Web, with proven scalability and extensibility. As used in OSF, these proven Web aspects are enhanced by adhering to open standards from the W3C (World Wide Web Consortium) in the areas of semantic technologies and vocabularies. This adherence to standards helps ensure that instances built with the Open Semantic Framework have a high degree of interoperability with other sites and capabilities on the Web.

OSF provides a standardized content storage and management environment that is Web-accessible, scalable and distributed. The content that can be hosted within OSF includes documents (unstructured data), metadata (semi-structured data), conventional database information (structured data) and multimedia metadata. While this content can (and does!) exist in multiple native formats in the wild, it is converted to the common RDF format that enables the development of standard ("canonical") tools and operations to act upon this content.

The OSF design and architecture is purposefully generic. The same set of tools and capabilities used in OSF can be applied to manage and understand information in any domain. What changes from domain to domain are the data structures (the ontologies, schema and entity reference lists) used by OSF. Differences between domains may also determine which components are included or not for a given instantiation.

The Content Management Layer

Though any mature content management system (CMS) may act as the presentation front-end to the Open Semantic Framework, Drupal is the standard option packaged with OSF. Drupal has a rich ecosystem of developers and support, plus thousands of modules that extend its functionality and an architecture well-suited to the requirements of OSF. The OSF for Drupal option leverages existing, well-known Drupal modules and Drupal itself in ways familiar to the broader Drupal community. OSF's integration with Drupal occurs via the standard plug-in modules of Drupal and "Drupal connectors". OSF Drupal modules are conventional Drupal modules written specifically to act as a management interface to the OSF. Drupal connectors are specific to OSF; they are Drupal libraries written specifically for OSF that extend current, popular Drupal APIs.

Most recently, OSF has also been integrated with simpler, alternative user interfaces such as Bootstrap. This has accompanied the broader use of Clojure in the Web services layer.

The Middleware (Web Services) Layer

The Open Semantic Framework stack is controlled or interacted with via its Web services at the middleware layer. Some of this interaction may occur via dedicated APIs, some programmatically.

clj-osf is a simple Clojure domain-specific language (DSL) used to query Open Semantic Framework (OSF) Web service endpoints. Each OSF Web service endpoint has its own Clojure function. A series of functions can be chained to generate an OSF query. The endpoint's function is used to generate a query, send it to the endpoint of an OSF Web Service instance, and get back a resultset. The resultset can then be manipulated by using the internal structEDN data structure.

The OSF Web Services PHP API is a library available to PHP developers to help them generate queries to any OSF Web service endpoint. Each endpoint has its own WebServiceQuery class in the API that is used to generate the query, send it to the appropriate endpoint, and get back a resultset. The resultset can then be manipulated by using the Resultset API. This same API can be used to transform the resultset into different formats. A similar API library is being extended for Clojure.

These APIs enable developers (or third-party apps such as Drupal) to call functions directly, which then issue the HTTP queries to the respective OSF Web Service endpoints. It is also via the middleware layer that security and external services may interact with the system. For security, it is possible to either use the native OSF service or invoke an external system.

Because of the central importance of the Web services layer, links to specific services and sandboxes for them are provided under the Web Services menu item.

The OSF Engines Layer

The functionality of the Web services layer is based on controlling and interacting with the underlying data engines in the OSF stack. Using the common RDF data model means that all Web services and actions against the data only need to be programmed via a single, "canonical" form. Simple converters convert external, native data formats to the RDF form at time of ingest; similar converters can translate the internal RDF form back into native forms for export (or use by external applications). This use of a "canonical" form leads to a simpler design at the core of the stack and a uniform basis to which tools or other work activities can be written. This leads to lower development and maintenance costs, and faster implementation.
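The converter idea can be sketched as a pair of functions: one that turns a native format (here CSV) into subject-predicate-object triples at ingest, and one that regroups triples into flat records for export. The namespace and the URI scheme are assumptions for the example, not OSF's actual converters.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace

def rows_to_triples(csv_text: str, id_column: str):
    """Turn each CSV row into (subject, predicate, object) triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = BASE + "record/" + row[id_column]
        for column, value in row.items():
            if column != id_column and value:  # skip the key and empty cells
                triples.append((subject, BASE + "prop/" + column, value))
    return triples

def triples_to_rows(triples, id_column: str):
    """Regroup triples by subject to rebuild flat records for export."""
    rows = {}
    for s, p, o in triples:
        record = rows.setdefault(s, {id_column: s.rsplit("/", 1)[-1]})
        record[p.rsplit("/", 1)[-1]] = o
    return list(rows.values())

source = "id,name,city\n1,Alice,Iowa City\n2,Bob,\n"
triples = rows_to_triples(source, "id")
print(len(triples), triples_to_rows(triples, "id"))
```

Everything between the two converters, such as search, CRUD and reasoning, only ever needs to understand the triple form, which is the simplification the canonical model buys.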

The fundamental unit of record aggregation upon which the OSF engines act is the "dataset". A dataset refers to a named grouping of records, best designed as similar in record types and intended access rights (though technically a dataset is any named grouping of records). All data objects (what is called in various settings instances, entities, kinds, types or classes) and their relations (properties, fields, attributes) and their annotations (metadata) are given Web identifiers in the form of URIs. This means any and all data within the OSF has a unique identifier, accessible using the HTTP protocol.

The OSF engines layer governs the indexing and management of all OSF content. The OSF engines are all open source. Documents are indexed by the Solr engine for full-text search, while information about their structural characteristics and metadata is stored in an RDF database, called a "triple store," provided by the Virtuoso engine. The schema aspects of the information (the ontologies) are separately managed and manipulated with their own application, the OWL API, a Java library for working with the W3C's OWL standard. At ingest time, the system automatically routes and indexes the content into its appropriate stores. Another engine, GATE, is available for semi-automatic assistance in tagging input information and other natural language processing (NLP) tasks.
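The routing step at ingest can be pictured with in-memory stand-ins for the two stores. This is a sketch under stated assumptions: the dictionary and list below merely play the roles of the full-text engine and the triple store, and the function name and record layout are hypothetical.

```python
# In-memory stand-ins; a real deployment would call the actual engines instead.
fulltext_index = {}  # record URI -> text body (stand-in for the full-text engine)
triple_store = []    # (subject, predicate, object) tuples (stand-in for the RDF store)

def ingest(record_uri: str, text: str, metadata: dict) -> None:
    """Route one record: body text to the index, metadata to the triple store."""
    if text:
        fulltext_index[record_uri] = text
    for predicate, value in metadata.items():
        triple_store.append((record_uri, predicate, value))

ingest(
    "http://example.org/doc/1",
    "Annual report on water quality.",
    {"dc:title": "Water Quality 2014", "dc:creator": "J. Smith"},
)
print(len(fulltext_index), len(triple_store))
```

Because both stores are keyed on the same record URI, a full-text hit can be joined back to its structured description.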

The OSF engines layer also includes the PHP/Java Bridge, an XML-based network protocol to connect to a Java virtual machine. The bridge gives us the capability to run Java-based engines efficiently within the stack. (Clojure provides similar linkages to Java.) For efficiency, Web service requests are cached with Memcached, an open source, high-performance, distributed memory object caching system. Memcached is a generic in-memory key-value store for small chunks of arbitrary data (strings, objects), well suited to OSF's API calls.
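The general cache-aside pattern that a key-value cache enables can be sketched as follows. A plain dictionary stands in for the Memcached client, and the key scheme (a hash of the endpoint plus its sorted parameters) is an assumption for the example rather than OSF's actual scheme.

```python
import hashlib
import json

cache = {}  # stand-in for a Memcached client
calls = 0   # counts how often the backend is actually hit

def cache_key(endpoint: str, params: dict) -> str:
    """Derive a short, stable key from the endpoint and its sorted parameters."""
    payload = json.dumps([endpoint, params], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def slow_backend(endpoint: str, params: dict) -> dict:
    """Pretend to do the expensive work of answering a Web service request."""
    global calls
    calls += 1
    return {"endpoint": endpoint, "params": params}

def cached_call(endpoint: str, params: dict) -> dict:
    """Serve from the cache when possible; otherwise compute and store."""
    key = cache_key(endpoint, params)
    if key not in cache:
        cache[key] = slow_backend(endpoint, params)
    return cache[key]

cached_call("search", {"q": "water", "page": 1})
cached_call("search", {"page": 1, "q": "water"})  # same request, different key order
print(calls)
```

Hashing keeps keys short and uniform, which matters because Memcached limits key length. A real deployment would also need an invalidation or expiry policy so that updates become visible.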

Special OSF Capabilities

The Open Semantic Framework comes packaged with a very powerful semantic search capability that provides faceting, inference, tuning and configurability. Via profiles, the search function can be contextually varied depending on different needs and purposes in different locations. Search results can also be tuned via weights and filters to achieve different desired results rankings. Creating these profiles is aided by an interactive query builder.
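To make the idea of profiles concrete, here is a small sketch in which a profile is just a set of per-field weights and required field values. The `SearchProfile` class, the field names and the scoring rule are hypothetical and far simpler than a real search engine's ranking.

```python
from dataclasses import dataclass, field

@dataclass
class SearchProfile:
    """Hypothetical profile: per-field boosts plus required field values."""
    weights: dict = field(default_factory=dict)
    filters: dict = field(default_factory=dict)

def rank(records, terms, profile):
    """Score records by weighted term matches, dropping any that fail a filter."""
    scored = []
    for rec in records:
        if any(rec.get(k) != v for k, v in profile.filters.items()):
            continue
        score = sum(
            weight
            for fld, weight in profile.weights.items()
            for term in terms
            if term.lower() in str(rec.get(fld, "")).lower()
        )
        if score > 0:
            scored.append((score, rec["id"]))
    # Highest score first; ties broken by id for a stable order.
    return [rid for _, rid in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]

records = [
    {"id": "a", "title": "River water quality", "body": "Sampling results", "type": "report"},
    {"id": "b", "title": "Annual summary", "body": "Water use rose", "type": "report"},
    {"id": "c", "title": "Water map", "body": "", "type": "map"},
]
title_first = SearchProfile(weights={"title": 3.0, "body": 1.0}, filters={"type": "report"})
body_first = SearchProfile(weights={"title": 1.0, "body": 3.0}, filters={"type": "report"})

print(rank(records, ["water"], title_first), rank(records, ["water"], body_first))
```

The same query over the same records yields a different ordering under each profile, which is the kind of contextual variation that profiles are meant to provide.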

The OSF Tagger is a powerful tool for tagging unstructured text or documents using the very same ontologies and vocabularies used throughout the system. This semi-automatic process enables text to be structurally characterized and integrated with other structured data. Various workflows and user screens support the tagging process.

Internal ontology management is provided via a dedicated tool, which can be invoked contextually as part of standard editing or review workflows. The ontology tool also can export and import ontologies managed by external tools, such as the open source Protégé ontology manager.

A variety of OSF Widgets are also packaged with the system. These widgets are Flex or JavaScript components that display or visualize information within the system. The component library includes maps, graphs, charts, annotators, graph browsers and visualizers, and the like. Depending on the logic described in the input schema and the nature of the data in results, the system will choose among the available widgets to provide the most useful visualization or display. The OSF widgets are provided as a library, and documentation is provided for how developers may extend the library with their own widgets. Check out the wiki descriptions of the OSF widgets and their APIs.
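The selection logic can be imagined along the following lines. The widget names, the attribute kinds and the precedence rules are all hypothetical; the sketch only illustrates choosing a display from what the schema declares and what the result data actually contains.

```python
# Hypothetical widget names and selection rules, for illustration only.
def choose_widget(schema: dict, records: list) -> str:
    """Pick a display widget from declared attribute types and the data present."""
    def has(kind):
        # An attribute counts only if the schema declares it and some record fills it.
        return any(
            schema.get(attr) == kind and rec.get(attr) is not None
            for rec in records
            for attr in rec
        )

    if has("geo"):
        return "map"
    if has("relation"):
        return "graph-browser"
    if has("number") and len(records) > 1:
        return "chart"
    return "table"

schema = {"name": "text", "location": "geo", "population": "number"}
print(choose_widget(schema, [{"name": "Iowa City", "location": (41.66, -91.53)}]))
print(choose_widget(schema, [{"name": "A", "population": 1}, {"name": "B", "population": 2}]))
print(choose_widget(schema, [{"name": "A"}]))
```

In an ontology-driven design, the `schema` mapping would be derived from the ontologies rather than hard-coded, so adding a new attribute type changes the display without changing application code.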

OSF Management Tools

A number of tools accompany the Open Semantic Framework to aid in managing and configuring the stack and the data and ontologies it uses. These tools are included as part of the standard OSF installs:

  • The OSF Tests Suites are a series of about 800 unit tests applied against various OSF Web Services. These tests can be applied automatically via script to check for inadvertent problems during development.
  • The OSF Datasets Management Tool (DMT) is a command-line tool used to manage datasets with an OSF Web Services network instance. It supports a range of dataset management operations and can handle a dataset of any size. If the dataset file is too big, the tool splits it into multiple slices and sends each slice to the OSF Web Services instance.
  • The OSF Permissions Management Tool (PMT) is a command-line tool used to manage access permissions on an OSF Web Services network instance. This tool is used to list, create and delete access permissions, groups and users.
  • The OSF Ontologies Management Tool (OMT) is a command-line tool used to manage the ontologies of an OSF Web Services network instance. It can be used to list the ontologies of an OSF Web Services instance, to create or import new ones, to delete existing ones, to generate underlying ontological structures, etc.
  • In addition to these management tools, there is an automatic OSF Installer script that is used to install and deploy an OSF stack on an Ubuntu server. It can also be used to install, upgrade and configure parts of the stack, or related external tools such as the Datasets Management Tool, the Ontologies Management Tool, the OSF Web Service-PHP-API, etc.
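The slicing behavior described for the Datasets Management Tool can be sketched generically. How the tool actually slices files is not described here, so the function below is an assumption: it batches already-serialized records by a byte budget and never splits a record across slices.

```python
from typing import Iterable, Iterator, List

def slices(records: Iterable[str], max_bytes: int) -> Iterator[List[str]]:
    """Group serialized records into batches no larger than max_bytes each.

    A single record larger than max_bytes is still emitted, alone in its batch.
    """
    batch, size = [], 0
    for record in records:
        record_size = len(record.encode("utf-8"))
        if batch and size + record_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(record)
        size += record_size
    if batch:
        yield batch

records = ["a" * 40, "b" * 40, "c" * 40, "d" * 100]
for i, batch in enumerate(slices(records, max_bytes=90)):
    print(i, [len(r) for r in batch])
```

Each yielded batch would then be sent to the Web Services instance as its own request, keeping any one request to a manageable size.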

OSF Documentation and Support

The OSF stack is supported by complete documentation, which is freely available for download or local reuse. The OSF documentation wiki contains about 500 technical articles and a further 1,000+ images. In addition, Structured Dynamics and its partners offer a variety of support and development services.