Knowledge Graph Embeddings as a Bridge between Symbolic and Subsymbolic AI
The Resurgence of Structure
The pendulum in AI is swinging back from purely statistical approaches towards the integration of structured knowledge. Large neural models that ingest vast datasets have posted impressive benchmark results. Yet their lack of transparency, tendency towards bias, and failure to capture common sense or causality reveal crippling limitations.
At the other end, knowledge graphs and ontologies precisely organize factual information and relationships as symbolic representations. But their symbolic nature makes statistical learning difficult to apply, and they struggle to handle uncertainty or abstraction.
The Future is Hybrid
Integrating structured knowledge graphs with distributed neural network representations offers a promising path to augmented intelligence. We get the flexible statistical power of neural networks, which predict, classify and generate based on patterns, combined with the formalized, curated knowledge of graphs that encode facts, logic and semantics.
Knowledge Graph Embeddings
Knowledge graph embeddings artfully straddle this intersection of structure and statistics. They mathematically translate the symbolic representation of entities and relations within a knowledge graph into a vector space. Each entity or relation is encoded as a dense vector such that geometric relationships between vectors mirror the real-world semantics between the corresponding symbols.
This numerical representation allows dynamically querying and manipulating the underlying semantics using mathematical operations over vectors. The vector space essentially acts as a differentiable mini-model of conceptual knowledge. At the same time, the discrete symbolic identifiers for each vector ground them firmly to real-world facts.
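To make this concrete, here is a minimal Python sketch of the translation-based (TransE-style) view, in which a fact (head, relation, tail) is plausible when head + relation lands near tail. The entities, the relation and all vector values below are invented for illustration; real embeddings are learned from the graph.

import numpy as np

# Toy, hand-picked vectors; in practice these are learned from the knowledge graph.
# TransE-style assumption: head + relation should land near tail for facts that hold.
entity = {
    "Paris":  np.array([0.9, 0.1, 0.0]),
    "France": np.array([1.0, 0.1, 0.5]),
    "Berlin": np.array([0.2, 0.8, 0.0]),
}
relation = {"capital_of": np.array([0.1, 0.0, 0.5])}

def plausibility(head, rel, tail):
    """Negative distance between head + relation and tail: closer to zero is more plausible."""
    return -np.linalg.norm(entity[head] + relation[rel] - entity[tail])

print(plausibility("Paris", "capital_of", "France"))   # near 0.0, a likely fact
print(plausibility("Berlin", "capital_of", "France"))  # clearly negative, an unlikely fact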
Supercharging Language Models
Knowledge graph embeddings open incredibly exciting possibilities for enhancing language models. They can provide an external memory bank of structured, factual information for models to consult. This equips them with critical missing context that reduces harmful hallucinations and biases.
More advanced approaches actively fuse dynamically retrieved knowledge graph vectors into the language model's own latent representations. This allows neural architectures to essentially learn logical reasoning, lifting them beyond pattern recognition into deeper comprehension.
The synergistic combination of symbolic knowledge graphs with connectionist models via embeddings looks set to push AI to new frontiers. Blending fluid neural creativity guided by structured imagination may get us closer than ever to more trustworthy intelligence.
Part 1 — Augmenting Neural Networks
- Challenges with massive pre-trained models
- Combining symbolic and subsymbolic representations
- Knowledge graph embeddings bridge the gap
- Encoding facts and relationships into continuous vectors
The Limits of Scale
The use of massive neural networks trained on vast datasets has driven recent AI advances. Transformer-based models such as GPT-3, Mistral and Gemini, some with over 100 billion parameters, have shown impressive language fluency. Commercial applications leverage these pre-trained foundation models by simply fine-tuning on target domain data.
However, fundamental issues around bias, toxicity and a tendency to hallucinate reveal gaps in their comprehension. Their lack of grounding, opacity around training data provenance and propensity to inherit prejudices make trust difficult. Their pattern-matching nature also struggles to perform logical reasoning, simulation and counterfactual analysis.
Connecting Symbols and Signals
Knowledge graphs provide a way to ground these statistical models in curated facts and relationships around real-world entities. Whether it is common sense reasoning, science, geography or medicine — specialized knowledge graphs model key facts, constraints and semantic connections in a transparent structured format.
If this factual symbolic representation also gets distilled into a mathematical vector space akin to neural network embeddings — it opens possibilities. We retain precision of knowledge, while gaining statistical manipulability. The knowledge graph embeddings inject external understanding that massive models lack — improving coherence, accuracy and trust.
Translating Facts into Continuous Semantics
Knowledge graph embedding techniques algorithmically translate discrete symbolic entities and relations into high-dimensional vector spaces. Each relation becomes a specific geometric transform, and entities occupy positions whose placement encodes their attributes and connections. Distance represents similarity.
Complex queries over the original knowledge graph reduce to inner products and distances between vector pairs. Logical and hierarchical constraints transform into simple geometric rules. Adding and subtracting vectors can uncover new knowledge. This mathematization of symbolic facts enables seamless fusion with neural systems.
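As an illustration of querying in this vector space, the sketch below answers a partial triple by ranking candidate tail entities under the same translation assumption used in the earlier sketch; the entities, relation and values are hypothetical.

import numpy as np

# Hypothetical 2-D embeddings for readability; real systems use hundreds of dimensions.
E = {"Marie_Curie": np.array([0.2, 0.9]),
     "Physics":     np.array([0.7, 1.0]),
     "Chemistry":   np.array([0.8, 0.95]),
     "Paris":       np.array([0.1, 0.1])}
R = {"field_of_work": np.array([0.55, 0.08])}

# Answer the query (Marie_Curie, field_of_work, ?) by ranking every candidate tail
# by how close head + relation lands to it.
query = E["Marie_Curie"] + R["field_of_work"]
ranked = sorted(E, key=lambda tail: np.linalg.norm(query - E[tail]))
print(ranked)  # plausible completions come first: ['Physics', 'Chemistry', ...]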
Fuelling Neural Creativity
Instead of treating language models as opaque black boxes, knowledge graph embeddings provide an interface to efficiently query relevant factual context to guide generation. Models dynamically retrieve the grounded vectors most compatible with the current generation state and use them as continuing stimuli. Think of it as a limitless external memory bank for tapping structured imagination.
Many technical challenges around neural-symbolic integration remain, but grounding neural creativity with structured knowledge opens doors to more reliable, versatile and transparent AI systems. Knowledge graph embeddings bridge the strengths of both paradigms.
Part 2 — Supercharging Search
- Limitations of keywords for meaning
- Encoding semantics into vector geometry
- Blazing fast similarity search
- KG embeddings power semantic retrievers
The Limits of Keywords
Most information retrieval relies on keywords and variants of TF-IDF matching. But this has inherent limitations: vocabulary mismatch, lack of typo tolerance, no notion of conceptual relatedness. Retrieval remains brittle without considering linguistic and real-world context.
Encoding Meaning into Math
Knowledge graph embeddings overcome this by encoding information elements like words, sentences, documents and queries as high-dimensional vectors. Their positions relative to each other in this coordinate space capture semantics: similarity, hierarchies, analogies.
Unlike opaque neural encodings, each vector remains tied to a specific entity or relation distilled from an underlying graph. This allows mathematization of meaning while retaining interpretability. Sophisticated KG embedding techniques like node2vec, RDF2Vec and ConvE compress symbolic graphs into versatile vector spaces that mirror conceptual closeness.
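The sketch below shows the core idea behind random-walk techniques such as node2vec in simplified form: unbiased walks over a toy graph instead of the full algorithm's biased walks, with a skip-gram model turning the walks into vectors. The graph and parameters are invented for illustration.

import random
import networkx as nx
from gensim.models import Word2Vec

# A tiny toy graph; real knowledge graphs have millions of typed edges.
g = nx.Graph()
g.add_edges_from([
    ("Paris", "France"), ("Berlin", "Germany"),
    ("France", "Europe"), ("Germany", "Europe"),
    ("Paris", "Eiffel_Tower"),
])

# Turn graph structure into "sentences" of node IDs via uniform random walks.
def random_walk(start, length=8):
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(list(g.neighbors(walk[-1]))))
    return walk

walks = [random_walk(n) for n in g.nodes for _ in range(50)]

# A skip-gram model embeds nodes so that those sharing walk contexts end up close together.
model = Word2Vec(walks, vector_size=32, window=3, min_count=1, sg=1, epochs=20)
print(model.wv.most_similar("Paris", topn=3))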
Blazingly Fast Similarity Search
This vectorization enables seamless integration with highly optimized vector similarity search libraries like FAISS, Annoy. Billions of embeddings can be indexed for interactive low-latency queries based on semantic relevance rather than just matching keywords. Any input gets converted to a vector lookup key in real-time.
Instead of needing complex ranking functions, proximity between the query vector and result vectors provides natural, reliable relevance. Typos and terminology variations have little effect. Related peripheral concepts also get retrieved, thanks to dense representations capturing the richness of relationships.
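A minimal sketch of that workflow with FAISS, using random vectors as stand-ins for knowledge graph embeddings (the dimension and index choice here are arbitrary):

import numpy as np
import faiss  # pip install faiss-cpu

d = 128                                                  # embedding dimension
vectors = np.random.rand(100_000, d).astype("float32")   # stand-ins for KG embeddings
faiss.normalize_L2(vectors)                              # unit length, so inner product equals cosine similarity

index = faiss.IndexFlatIP(d)   # exact inner-product index; IVF/HNSW variants scale further
index.add(vectors)

query = np.random.rand(1, d).astype("float32")           # the incoming query, already embedded
faiss.normalize_L2(query)
scores, ids = index.search(query, 10)                    # scores and ids of the 10 closest items
print(ids[0], scores[0])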
Augmenting Neural Rankers
Knowledge graph powered vector search systems greatly augment pure text retrievers. They supply the structural knowledge neural networks lack while benefiting from their statistical generalizability. Combining the two as an ensemble covers each one's weaknesses, improving retrieval recall and relevance.
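One simple way to realise such an ensemble, sketched below with made-up scores, is to normalise each retriever's output and blend the two with a tunable weight.

# Blend a keyword retriever's scores with a vector retriever's scores.
def blend_scores(keyword_scores, vector_scores, alpha=0.5):
    """Both arguments map document id -> raw score; alpha weights the keyword side."""
    def normalise(scores):
        lo, hi = min(scores.values()), max(scores.values())
        return {d: (s - lo) / ((hi - lo) or 1.0) for d, s in scores.items()}
    kw, vec = normalise(keyword_scores), normalise(vector_scores)
    docs = kw.keys() | vec.keys()
    return {d: alpha * kw.get(d, 0.0) + (1 - alpha) * vec.get(d, 0.0) for d in docs}

keyword_scores = {"doc1": 12.3, "doc2": 8.1, "doc3": 0.5}    # e.g. BM25 output
vector_scores  = {"doc2": 0.91, "doc3": 0.88, "doc4": 0.40}  # e.g. cosine similarities
ranked = sorted(blend_scores(keyword_scores, vector_scores).items(), key=lambda kv: -kv[1])
print(ranked)  # documents favoured by either signal rise to the top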
Whether for search or generative AI, distilling symbolic knowledge graphs to fuse with subsymbolic systems via vector embeddings seems poised to fulfil the promises of hybrid AI. Math fuels understanding guided by meaning.
Part 3 — Boosting Language Models
- Hallucinations from poor context
- Injecting structured knowledge
- Grounding generation with retrieval
- Knowledge as the missing piece for reasoning
The Risk of Hallucination
The ability of large language models to generate coherent, human-like text from prompts makes them enticing. Yet their propensity to “hallucinate” (make up facts, stray from context and produce toxic outputs) makes reliability challenging. They struggle to judiciously determine what knowledge needs grounding without external guidance.
Grounding with Structured Knowledge
Augmenting models with structured knowledge graphs helps mitigate this because they encode curated factual information. Rather than expect models to magically determine relevance, explicit semantic retrieval from knowledge graphs provides missing factual context to guide responses.
Scalable semantic search acts as a cueing mechanism between model and external knowledge store. For any generation state, related knowledge graph vectors inject factual reinforcement signals — reducing chances of drifting from reality.
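Here is a small sketch of that cueing loop. The three-dimensional "topic" vectors and the facts below are invented stand-ins for real knowledge graph embeddings; in a real system the question vector would come from the same embedding model that produced the fact vectors.

import numpy as np

# Invented 3-D vectors standing in for knowledge graph embeddings of curated facts.
facts = {
    "Marie Curie won Nobel Prizes in Physics (1903) and Chemistry (1911).": np.array([0.9, 0.0, 0.8]),
    "The Eiffel Tower is located in Paris, France.": np.array([0.0, 0.9, 0.1]),
    "Water boils at 100 degrees Celsius at sea level.": np.array([0.8, 0.1, 0.0]),
}

def grounded_prompt(question, question_vec, k=2):
    """Retrieve the k facts whose vectors are closest to the question vector
    and prepend them to the prompt as grounding context."""
    texts = list(facts)
    sims = [float(np.dot(facts[t], question_vec)) for t in texts]
    top = sorted(range(len(texts)), key=lambda i: -sims[i])[:k]
    context = "\n".join(texts[i] for i in top)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

question_vec = np.array([0.7, 0.1, 0.9])   # hand-set vector for a question about a scientist
print(grounded_prompt("Which prizes did Marie Curie receive?", question_vec))
# The assembled prompt is then handed to any language model to generate a grounded answer.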
Making Room for Reasoning
Another benefit arises from compositionality. Through their training objectives, knowledge graph embeddings learn to model logical constraints and relation types as meaningful operations in the vector space.
The embeddings acquire specific interpretations: ordering, symmetry, hierarchy. When fused, this structure gets distilled into the language model's own latent space, allowing it to inherit an implicit comprehension of structural and logical semantics.
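A bare-bones sketch of one such fusion step, assuming a hypothetical 768-dimensional hidden state, a 128-dimensional knowledge graph vector and a learned projection matrix (all randomly initialised here); this is an illustrative scheme, not a specific published architecture.

import numpy as np

rng = np.random.default_rng(0)
d_lm, d_kg = 768, 128                          # hypothetical hidden-state and KG embedding sizes

hidden_state = rng.standard_normal(d_lm)       # the language model's current latent representation
kg_vector = rng.standard_normal(d_kg)          # a retrieved knowledge graph embedding
W = rng.standard_normal((d_lm, d_kg)) * 0.01   # stand-in for a learned projection matrix

# Project the KG vector into the model's latent space and add it as a residual
# signal carrying the structural semantics.
fused = hidden_state + W @ kg_vector
print(fused.shape)                             # (768,) -- same shape, now knowledge-infused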
Beyond Pattern Recognition
The compositionality and interpretability of knowledge graph embeddings thus provide the missing ingredient for language models to move beyond mere pattern recognition into deeper reasoning, interpretation and judgement. Grounding open-ended neural generation with structured imagination paves the path toward more reliable intelligence.
Much innovation is still needed in caching strategies, efficient indexes and seamless fusion techniques, but the case for coupling structured knowledge with neural representation learning is compelling. Blending their complementary strengths is poised to unlock versatile, trustworthy and transparent language AI.
Part 4 — The Path Ahead
Beyond Text-Only Knowledge
While knowledge graphs currently focus on text-based factual knowledge, the future offers rich opportunities for multi-modal knowledge aggregation. Structured embeddings for images, video, audio and speech integrated into a unifying vector space could enable seamless transfer of inference across modalities.
Diverse Reasoning
The compositionality of knowledge graph embeddings opens doors to capture diverse reasoning semantics:
Logical Reasoning: Representing logical predicates as geometric constraints allows modeling notions like symmetry, inversion, contradiction, negation, hierarchy. This equips models with interpretability and causality.
Probabilistic Reasoning: Rather than deterministic point vectors, embedding knowledge elements as probability density functions enables natively capturing uncertainty while supporting probabilistic logic operations.
Temporal Reasoning: Introducing time marker nodes with temporal relation types allows projection of entity embeddings to future states by analyzing traversal paths. This facilitates prediction and simulation.
Causal Reasoning: Adding causal links between entity nodes gives opportunity to disambiguate correlation vs causation during training for improved counterfactual inferences.
Analogical Reasoning: Explicit analogical proportions can be encoded into the space by tying entity difference vectors. Assessing new analogies then involves measuring divergence from these geometric analogical patterns, as sketched in the example after this list.
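As a small illustration of the analogical case, the sketch below scores candidate analogies by how far their difference vectors diverge from a reference pair; all vectors are invented for the example.

import numpy as np

# Toy 2-D embeddings; real KG embeddings are learned, high-dimensional vectors.
E = {
    "Paris":  np.array([1.0, 0.2]), "France":  np.array([1.2, 0.9]),
    "Berlin": np.array([0.3, 0.1]), "Germany": np.array([0.5, 0.8]),
    "Tokyo":  np.array([2.0, 0.3]), "Japan":   np.array([2.2, 1.0]),
    "Sushi":  np.array([3.0, 3.0]),
}

def analogy_divergence(a, b, c, d):
    """How far the pair (c, d) deviates from the difference vector tying (a, b);
    small values mean the analogy a : b :: c : d holds geometrically."""
    return float(np.linalg.norm((E[b] - E[a]) - (E[d] - E[c])))

print(analogy_divergence("Paris", "France", "Berlin", "Germany"))  # near zero: the analogy holds
print(analogy_divergence("Paris", "France", "Tokyo", "Japan"))     # near zero: also holds
print(analogy_divergence("Paris", "France", "Tokyo", "Sushi"))     # large: a poor analogy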
The Broader Vision
Knowledge graph embeddings that capture diverse reasoning patterns provide the scaffolding for more grounded intelligence — filling gaps in common sense and judgement. Combined with open-ended neural creativity guided by structured imagination, we inch closer to versatile, trustworthy and transparent AI systems — integrating symbolic ethos with subsymbolic ingenuity at scale.
A Simple Example with Neo4j
Here is a short walkthrough of working with knowledge graph embeddings in Neo4j: loading pre-computed vectors, indexing them and running a similarity search.
1. Load Movie Plot Embeddings
First, we load pre-computed movie plot embedding vectors from a CSV into Neo4j using LOAD CSV:
LOAD CSV WITH HEADERS FROM "https://embeddings.csv" AS row
MATCH (m:Movie {movieId: row.movieId})
// CSV fields arrive as strings, so parse the embedding into a list of floats
// (this assumes the embedding column is a ";"-delimited list of numbers)
SET m.embedding = [x IN split(row.embedding, ";") | toFloat(x)]
This sets an embedding property on each Movie node containing the plot vector for that movie.
2. Create Vector Index
Next, we create a vector index on the embeddings to enable similarity search:
CALL db.index.vector.createNodeIndex(
  "moviePlotIndex", "Movie", "embedding", 512, "cosine")
This indexes the 512-dimensional embedding vectors under cosine similarity to power semantic queries.
3. Similarity Search
We can now find movies with similar plot vectors using the index:
MATCH (m1:Movie {title:"Citizen Kane"})
// fetch the 6 nearest plot vectors; the movie matches itself, so filter it out below
CALL db.index.vector.queryNodes("moviePlotIndex", 6, m1.embedding)
YIELD node, score
WHERE node <> m1
RETURN node.title, score ORDER BY score DESC
This searches for the most similar plots to “Citizen Kane” based on the vector index.