Geometric Information Engineering: Structural Conservatism, Manifold Constraints, & Knowledge Architectures
The translation layer between AI and information is geometry. A physicist once told me that machine learning’s biggest barrier is the tools we’re forced to use. We’re trying to understand the world using the math of lines and rectangles. But the world is curved, and at some point soon, we must confront that and adapt our methods accordingly. In the last 5 years, AI has embraced more complex geometries.
Now we must use geometry to unite what are seen as disparate fields: Information Engineering and AI Engineering. Through the lens of geometry, AI and information are just different representations of the same system(s). Traditional databases hold information in a simple geometry. That’s the BI paradigm. In the information paradigm, knowledge graphs enable more complex geometries, and there is information built into the structure.
This article is not what I would consider a novel work. I’m simply saying things that have already been stated somewhere else. The article is extremely derivative and almost a pure research review. The authors of the most recent DeepSeek paper begged for it in their conclusion. I know why they didn’t extend their own idea to follow this line of thought…it’s not 100% finished, and it’s really challenging to implement.
Still, it’s a valid structural direction and provable at a small scale. Features like Google or OpenAI’s Deep Research might already leverage similar mechanisms to the ones I’ll describe. This article isn’t an addition to the body of knowledge by any definition. I haven’t seen anything else explain this direction in AI in a single overview, so I am creating the summary and pointing out obvious connections.
This is also not how research is typically written. I’m going to tell you where all the gaps, assumptions, and challenges are. I’ll tell you where additional work is required and try my best to provide a starting point to get it done. This is how engineers wish research were written. It’s OK to be early and provide a partial solution framework. Just be transparent about it.
Open Issues - Warnings Before Claims
You’ll quickly notice the first elephant in the room: dynamism. Managing dynamism is a huge challenge for both information and AI engineering. That’s probably why the DeepSeek team didn’t advance in this direction. This approach scales the frequency of pre-training unless you structure information around stability. As the information structure scales, it will add overhead to inference. As the rate of change scales, everything gets more computationally expensive. We have a lot of work to do here.
The next elephant is one I call out frequently. The accuracy of the information structure is assumed, but I don’t present a validation mechanism as part of this article. That’s another gap. If the information has issues, those will cause cascading failures in model training and complex, multi-step inference. Any information structure without validation mechanisms provides insufficient guarantees to be useful. We have a lot of work to do here as well.
Not all of the claims in the mHC paper have been verified. There are open questions, such as whether DeepSeek’s application of Sinkhorn-Knopp will hold as a universal projection. DeepSeek’s claims are supported by the team’s internal experiments, but they have not been heavily reviewed or reproduced. I expect issues to be discovered with the authors’ claims. I’m violating my own 3-week rule because I know their approach is directionally sound. I am aware of the gaps, and DeepSeek’s team should be more transparent about them.
I’ll discuss “preserving semantics” and support it, but I can’t give experimental proof or mathematical guarantees as this approach scales. There are many edge cases, and I haven’t experimentally ruled them out. This framework should reduce hallucinations and improve reliability, but there are costs, limits, and trade-offs. I’m not trying to solve the problem of AGI or fix foundational models to be 100% accurate. I am also not providing experimental support because it would be misleading, as it is in most research. I present this as a framework only.
There is information in the geometry of information structures, and knowledge graphs represent a topology of semantic space. My assessments are supportable, but I am taking liberties with the language to give a metaphorical understanding to a nontechnical audience. I do this throughout so a broader audience can take away a better understanding of what I’m explaining. Not everyone reading this will rush out to implement it. If you see a metaphor, treat it as such.
I’m writing this to get the ball rolling, but I am a flawed messenger. I have NDAs that prevent me from going into greater depth in some areas, and I don’t have the resources to show you how to implement this at the scale DeepSeek is building on. The gaps of proving and supporting scale are huge, and I want to emphasize them before I go any further. This might be one of those dead ends that work for local AI and smaller domain-specific models, but fail to support anything beyond that, until we make another leap in computational power.
I’m giving this to the community, and I want you to run with it. Break it. Optimize it. Rearchitect it. Iterate in production vs. debating the theory. At a high level, we’re seeing macro-level geometric structure being leveraged in a new way to control the direction of training and inference. In that way, applying Knowledge Graphs as a geometric structure is just a variation on a theme. You could replace it with any topological constraint and potentially see significant gains. This should spark your thinking in new directions.
AI Usage
I used AI to simplify and clarify language throughout the article for readability, especially for nontechnical audiences. I simplified multiple sections to be shorter and more concise without losing depth (no weaving). I also used AI to find additional sources for further exploration and to find original sources instead of reviews or derivative works. All AI passages have been reviewed and heavily edited by humans, so mistakes are still all my fault.
Introduction
For the past decade, the dominant deep learning paradigm has been one of unconstrained scaling, increasing parameter counts, widening residual streams, and diversifying connectivity patterns to maximize expressivity. However, as architectures expand into Hyper-Connections (HC), this unconstrained plasticity has begun to compromise the stability guarantees that underpinned deep learning. Manifold-Constrained Hyper-Connections (mHC) is a return to structural conservatism, a design philosophy that prioritizes the preservation of signal identity and information boundaries over unbounded flexibility.
The principles of structural conservatism in mHC are not solely relevant to neural architecture. They are in alignment with Information Engineering and Knowledge Graph (KG) Engineering. I’ll hypothesize that the mathematical constraints applied in mHC, specifically the projection of connection matrices onto the Birkhoff polytope, can be amplified by the semantic topology of KGs. By integrating the geometric constraints of mHC with the semantic schemata of KGs, I explain a potential direction to build a unified framework: Geometric Information Engineering.
In this framework, KGs do more than serve as external databases for RAG. They act as geometric scaffolds that define the ‘safe manifolds’ (safe isn’t a mathematical guarantee in the context of this article) within which neural networks must operate. I’ll explore how KGs can provide the information topology necessary to guide information routing in Mixture-of-Experts (MoE) architectures, effectively transforming the opaque black box of neural mixing into a semantically coherent, structurally conservative process.
I’ll pull from signal propagation, manifold geometry, and information systems theory to explain one direction of travel toward foundation models that are more mathematically stable and semantically grounded than our current models.
Unconstrained Plasticity & The Return To Conservatism
The Erosion Of The Identity Mapping Property
How did we get here? To understand the necessity of structural conservatism, we must deconstruct the mechanism of failure in modern high-capacity networks. The foundational success of Deep Residual Networks (ResNets) was built on the identity mapping property. This architecture ensures that the gradient can decompose into a term that propagates directly through the skip connection, plus a term flowing through the residual block. In ideal conditions, this allows signals to propagate ‘losslessly’ (in this article, I won’t assume lossless strictly holds to my framework, especially at scale) across hundreds or thousands of layers, acting as a conservation mechanism for information.
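To make the identity mapping property concrete, here is the standard decomposition from the Identity Mappings in Deep Residual Networks paper (linked in the references), written in LaTeX; x_l is the feature at layer l, F is the residual block, and L is a deeper layer:

```latex
% Residual update: each layer adds a learned perturbation to the carried signal.
x_{l+1} = x_l + F(x_l, W_l)
\quad\Longrightarrow\quad
x_L = x_l + \sum_{i=l}^{L-1} F(x_i, W_i)

% Backpropagation: the gradient splits into a direct (identity) term and a residual term.
\frac{\partial \mathcal{L}}{\partial x_l}
  = \frac{\partial \mathcal{L}}{\partial x_L}
    \left( 1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} F(x_i, W_i) \right)
```

The leading 1 is the identity path: gradients reach layer l without being forced through a long product of Jacobians, which is the conservation mechanism described above.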
Recent architectural innovations, exemplified by HC, sought to extend this paradigm by expanding the width of the residual stream and introducing complex, learnable connectivity patterns. In HC, the input is expanded into a multi-stream feature matrix, and a learnable matrix mixes these streams. While this design significantly increases topological complexity and expressivity without increasing FLOPs, it fundamentally compromises the identity mapping.
The composite mapping across multiple layers fails to preserve the global mean or norm of the features. In unconstrained HC networks, the Amax Gain Magnitude, a measure of signal amplification, can peak at values exceeding 3000. This represents a catastrophic divergence from the stable, unity-gain propagation required for effective training.
This can be understood as a failure of structural conservatism. In systems engineering, conservatism refers to design margins and constraints that prevent a system from entering unstable states, even under extreme load. By removing the constraint of the identity map and replacing it with an unconstrained linear map, HC architectures sacrifice the integrity of the signal carrier for the sake of short-term expressive gain. The result is training instability, loss spikes, and a fundamental inability to scale to the depths required for next-generation foundation models.
Defining Structural Conservatism In Neural Architectures
In the context of deep learning, structural conservatism is the architectural enforcement of invariant properties that preserve the statistical and information integrity of the signal during propagation. It is the digital version of conservation laws in physics. Just as a physical system must conserve energy and momentum to remain stable, a neural network must conserve the mean and bound the norm of its feature representations to remain trainable.
This conservatism is focused on preserving traceability and identity. In a structurally conservative network, the output at a deeper layer L is a predictable, bounded transformation of the input at an earlier layer l. The information is transformed, but its essential magnitude and existence are ‘conserved’ (mostly). This mirrors the principle of homeostasis in biological systems, where internal conditions are regulated within narrow limits despite external fluctuations.
The return to structural conservatism in mHC is a correction of the architectural drift that occurred when researchers prioritized plasticity (the ability to learn any function) over stability (the ability to learn reliably). Enterprise and consumer use cases require stability. The goal of AGI or ASI requires near-infinite plasticity. We need a more pragmatic, applied framework, or large foundational models will have little utility in the real world.
The Manifold-Constrained Solution
The solution proposed by mHC is to project the residual connection space onto a specific geometric manifold: the Birkhoff polytope, which consists of the set of doubly stochastic matrices. A doubly stochastic matrix is a square matrix $P$ where $P_{ij} \ge 0$ and both the rows and columns sum to 1, i.e., $\sum_j P_{ij} = 1$ and $\sum_i P_{ij} = 1$.
In the DeepSeek paper, this constraint is enforced via the Sinkhorn-Knopp algorithm, an iterative scaling procedure that projects any non-negative matrix (guaranteeing this constraint is an open issue) onto the Birkhoff polytope. This projection restores structural conservatism through two rigorous mechanisms:
Norm Preservation: The spectral norm of any doubly stochastic matrix is bounded by 1. The mixing operator is non-expansive, which can reduce amplification across repeated mixing steps. In dynamical systems terms, the residual stream becomes a Lyapunov-stable system where perturbations do not grow exponentially with depth (this isn’t a rigorous guarantee).
Convex Combination: The mixing operation $y_i = \sum_j P_{ij} x_j$ becomes a convex combination of the input features. This conserves the feature mean across the parallel streams. If the input streams have a certain average activation, the mixed output will maintain that same average across streams (again, with partial guarantees).
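As a minimal numerical sketch of the projection (the textbook Sinkhorn-Knopp iteration, not DeepSeek’s implementation), assuming NumPy: alternately normalizing rows and columns drives an arbitrary non-negative matrix toward the Birkhoff polytope, and the result is (approximately) non-expansive.

```python
import numpy as np

def sinkhorn_knopp(A, iters=50, eps=1e-9):
    """Approximately project a non-negative square matrix onto the
    Birkhoff polytope by alternately normalizing rows and columns."""
    P = A.astype(float) + eps                  # strict positivity keeps the iteration well-defined
    for _ in range(iters):
        P = P / P.sum(axis=1, keepdims=True)   # rows sum to 1
        P = P / P.sum(axis=0, keepdims=True)   # columns sum to 1
    return P

rng = np.random.default_rng(0)
A = rng.random((4, 4))                          # arbitrary non-negative mixing weights
P = sinkhorn_knopp(A)

print(P.sum(axis=1))                            # ~[1, 1, 1, 1]  (row sums)
print(P.sum(axis=0))                            # ~[1, 1, 1, 1]  (column sums)
print(np.linalg.norm(P, 2) <= 1 + 1e-6)         # spectral norm bounded by 1 (non-expansive)
```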
This mathematical formalization of conservatism provides the stable foundation to build more complex, ‘semantically aware’ (using the term aware loosely) or information-aware structures using KGs. I cover this to point out the gaps in the DeepSeek paper that extend to my framework as well. They didn’t go as far as making these claims, likely because they’d have to reveal the gaps.
Information Engineering: The Architecture Of Integrity
To fully appreciate the implications of mHC, we must view it through the lens of Information Engineering (IE). IE is concerned with the generation, distribution, analysis, and management of information in systems. Central to IE are the concepts of data integrity, schema consistency, and traceability, concepts that map well onto the problems of signal propagation in deep networks.
The Identity Map Pattern & Data Integrity
In software architecture and database management, the Identity Map is a fundamental design pattern used to ensure data integrity. Its primary purpose is to guarantee that every unique database record is loaded into memory only once per session. If a system were to load the same user record into two different objects, modifications to one would not be reflected in the other, leading to a split-brain inconsistency.
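As a minimal sketch of the pattern (class and function names here are hypothetical, not any specific ORM’s API):

```python
class IdentityMap:
    """Per-session cache that guarantees one in-memory object per database record."""
    def __init__(self, load_fn):
        self._load_fn = load_fn      # e.g. a function that queries the database by primary key
        self._cache = {}

    def get(self, record_id):
        # Return the already-loaded object if we have it; otherwise load once and remember it.
        if record_id not in self._cache:
            self._cache[record_id] = self._load_fn(record_id)
        return self._cache[record_id]

# Two lookups for the same key return the *same* object, so an edit made through
# one reference is visible through the other -- no split-brain copies.
users = IdentityMap(load_fn=lambda uid: {"id": uid, "name": "unknown"})
assert users.get(42) is users.get(42)
```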
This is conceptually similar to the Identity Mapping in ResNets.
In Software: The Identity Map ensures that the truth of the entity or its state in the database is preserved across multiple processing modules.
In ResNets: The residual connection ensures that the ‘truth’ (again, not with mathematical guarantees at the information level) of the signal or the features learned by previous layers is preserved and available to deeper layers.
When mHC enforces doubly stochastic constraints, it is essentially implementing a Distributed Identity Map. In a multi-stream architecture, the entity or signal is split across multiple channels. Without constraints, these channels diverge, which represents a form of data corruption. The Birkhoff polytope constraint ensures that the sum of these parts closely reconstructs the whole. It allows for the identity of the signal to be maintained across the distributed transaction of the layer mixing.
I cover this to show that even though we don’t specifically treat it as such, we do a significant amount of IE as part of feature engineering and model training. The model itself manages information flows or streams, but the mechanisms are often opaque and not well managed.
Traceability & Lossless Transmission
A core tenet of modern Information Engineering is traceability, the ability to track the lineage and transformation of data elements throughout a complex lifecycle. As the linked references show, there are multiple approaches being considered to support this tenet. In cross-enterprise data flows, or supply chain management, structural conservatism is the requirement for persistent identity mapping between original data sources and derivative models. If a transformation is opaque, irreversible, or lossy, traceability is broken, and the system loses auditability.
This concept parallels the ideal of ‘lossless’ (more accurately, minimizing the loss) transmission in communication theory and residual networks.
Classical Communications: Shannon’s theory focuses on the lossless transmission of bits.
Residual Networks: Skip connections facilitate the ‘lossless’ flow of gradients during backpropagation, mitigating the information loss caused by vanishing gradients.
Semantic Communication: The emerging field of Semantic Communication extends this to the ‘lossless’ (again, more accurately, the minimization of loss) transmission of meaning. In SemCom, the goal is to ensure that the semantic content received is identical to the content sent, even if the raw data is compressed or altered.
mHC acts as a structure-preserving channel for this transmission. By constraining the transformation matrix to be doubly stochastic, mHC aims to ensure that the information flow is a permutation-like operation. It rearranges and mixes information but preserves much of the total semantics. This makes the network’s internal operations more traceable and less prone to hallucinating signal magnitude.
Common Data Models As Manifold Constraints
In the integration of Computer-Aided Design (CAD) and Computer-Aided Engineering (CAE), a Common Data Model (CDM) acts as a central repository that maintains associative dependencies between diverse models.
The Problem: CAD models (geometry) and CAE models (physics simulation) often diverge. A change in geometry might break the physics mesh, leading to simulation failure.
The Solution: The CDM imposes a schema consistency that prevents the geometry from entering states incompatible with the physics analysis.
This offers a powerful analogy for neural network design.
The Neural Manifold Is The CDM: The latent space of the neural network is the common data model where features reside.
The Constraint Is The Schema: The Birkhoff polytope constraint in mHC acts as the schema validation. It defines the valid shape of the connectivity matrix. Any gradient update that pushes the matrix off this manifold is projected back via Sinkhorn-Knopp, just as a CDM would reject a geometric operation that violates topological validity.
This constraint-based design is the essence of structural conservatism in engineering: the system is architected to minimize invalid or unstable states, regardless of the local optimization objective. It prioritizes global consistency over local optimization speed. We are using macro-level architecture to align micro-level behaviors.
Knowledge Graphs As Geometric Manifolds
The key insight to take away is that KGs are not just data structures; metaphorically, they define the intrinsic topology of the semantic space. In a Geometric Information Engineering framework, the KG provides the map, and the mHC architecture provides the vehicle that adheres to the road.
Embedding Knowledge Into Geometry
Traditional Knowledge Graph Embedding (KGE) methods map entities to vectors in Euclidean space. However, Euclidean geometry is often mismatched with the structure of complex knowledge.
Hierarchies & Hyperbolic Space: Real-world knowledge is often hierarchical. In Euclidean space, volume grows polynomially with radius, but the number of nodes in a tree grows exponentially with depth. This forces distortion. Hyperbolic manifolds are quasi-isometric to trees, which makes them behaviorally similar enough to be useful. Embedding hierarchical KGs into hyperbolic space preserves their structure with low distortion (a small distance-function sketch follows this list).
Cycles & Spherical Space: Many directional relationships are cyclic or periodic: seasonal events, biological cycles, etc. Spherical manifolds are ideal for representing these patterns as complexity and relationship dimensionality increase (strictly, cyclic variables map onto circular/torus manifolds, but the reference to a sphere is intentional).
Product Manifolds: Complex KGs, like UMLS or Wikidata, contain both hierarchies and cycles. Advanced KGE methods utilize product manifolds to capture these diverse topologies simultaneously.
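Here is the promised distance-function sketch. It is just the standard Poincaré-ball distance, not any particular KGE method, and the toy points are hypothetical:

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincare ball.

    Distances blow up near the boundary, which gives hyperbolic space the
    exponentially growing 'room' needed to embed trees with low distortion."""
    diff = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * diff / (denom + eps))

root  = np.array([0.0, 0.0])    # a hypothetical root concept near the origin
child = np.array([0.6, 0.0])    # a child partway toward the boundary
leaf  = np.array([0.95, 0.0])   # a deep descendant near the boundary

print(poincare_distance(root, child))   # moderate distance
print(poincare_distance(child, leaf))   # much larger, despite a smaller Euclidean gap
```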
KGs As Manifold Constraints
A KG can function as a structural constraint on the neural manifold. Instead of allowing the neural network to learn an arbitrary latent space, we can force the latent space to conform to the topology of the KG.
Manifold Regularization: Techniques like Manifold Regularization explicitly penalize neural representations that violate the proximity relationships defined in the graph. If node A and node B are connected in the KG, their neural embeddings must be close on the manifold.
Ricci Curvature Regularization: Recent work leverages Ricci curvature, a geometric measure of how a manifold deviates from being flat, to regularize Graph Neural Networks. By coupling the loss with the local curvature of the graph, methods like RicciKGE allow the entity embeddings to co-evolve with the underlying manifold geometry. This enables the neural manifold to act as a dynamic surface that reflects the shape of the knowledge.
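A minimal sketch of the first idea above, Manifold Regularization, assuming NumPy and a hypothetical list of KG edges; the penalty is small only when embeddings of KG-connected entities stay close:

```python
import numpy as np

def manifold_regularizer(Z, edges):
    """Sum of squared embedding distances over KG edges.

    Z     : (num_entities, dim) array of entity embeddings
    edges : list of (i, j) index pairs that are connected in the knowledge graph
    Added to the task loss (task_loss + lambda * penalty), this term pulls
    connected entities toward each other on the learned manifold."""
    return sum(float(np.sum((Z[i] - Z[j]) ** 2)) for i, j in edges)

Z = np.random.default_rng(0).normal(size=(4, 8))   # four toy entities, 8-dim embeddings
kg_edges = [(0, 1), (1, 2)]                        # hypothetical KG adjacency
print(manifold_regularizer(Z, kg_edges))
```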
The ‘Safe’ Manifold Hypothesis
In high-stakes domains like robotics or medical diagnosis, the concept of a safe manifold becomes desirable. A safe manifold is a subspace of the neural state space that corresponds to valid, safe, and semantically consistent states.
KG-Defined Boundaries: A KG can define the boundaries of this safe manifold. For example, in a medical KG, the relation (Drug A, interacts_with, Drug B) implies a constraint: the system should never recommend both drugs simultaneously.
Constraint Projection: A neural planner can be constrained to project its trajectory onto the safe manifold defined by the KG. Just as mHC projects gradients onto the Birkhoff polytope to ensure numerical stability, a Knowledge-Constrained Network projects decisions onto the KG-manifold to align with semantic validity.
This establishes a direct overlap: mHC provides the mechanism for projection-based stability, while the KG provides the target manifold (metaphorically, the semantic topology) onto which the projection occurs.
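Returning to the medical example, the simplest version of this projection is a hard feasibility check against the KG before anything is emitted. This is a toy sketch with hypothetical drug and relation names, not a clinical system:

```python
# Hypothetical KG fragment: unordered pairs related by interacts_with.
INTERACTS_WITH = {frozenset({"drug_a", "drug_b"}), frozenset({"drug_c", "drug_d"})}

def project_to_safe_set(recommended):
    """Drop later recommendations that conflict with earlier ones, so the returned
    set never contains an interacts_with pair (it stays on the 'safe manifold')."""
    safe = []
    for drug in recommended:
        if any(frozenset({drug, kept}) in INTERACTS_WITH for kept in safe):
            continue  # violates a KG constraint; project it out
        safe.append(drug)
    return safe

print(project_to_safe_set(["drug_a", "drug_b", "drug_c"]))  # ['drug_a', 'drug_c']
```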
Geometric Information Engineering
I propose the term Geometric Information Engineering (we need something better) to describe the synthesis of mHC’s structural conservatism with the semantic topology of KGs. This convergence offers a solution to one of the most persistent challenges in ‘large’ (again, I can’t support scale as a certainty) models: the lack of interpretable routing and reasoning.
Semantic Routing In MoE
MoE architectures are the primary beneficiaries of mHC. In an MoE, a router network decides which expert sub-network processes which token. Currently, routers are often black boxes trained via simple softmax gating, leading to load imbalances and semantic collapse (similar tokens routed to disparate experts).
The Doubly Stochastic Bridge
By applying the doubly stochastic constraint via Sinkhorn-Knopp to the routing matrix, we can fundamentally transform this process.
Load Balancing: A doubly stochastic routing matrix makes it much more likely that every token is assigned to an expert (row sum = 1) and that every expert receives an equal amount of work (column sum = 1). This significantly improves computational conservatism; experts are less likely to be overloaded or starved.
KG-Guided Initialization: Instead of initializing the router randomly, we can inform it with a Semantic Topology derived from a KG. We can mask the routing matrix such that tokens corresponding to specific concepts, like medical terms, are only allowed to route to experts specialized in that domain.
Mechanism: This can be implemented as a Masked Sinkhorn operation, $P = \mathrm{Sinkhorn}(M \odot A)$, where $M$ is a binary or weighted mask derived from the KG ontology and $A$ is the raw non-negative routing matrix (refer to the mHC paper for fuller coverage of Sinkhorn-Knopp). This guides the router to find the optimal balanced assignment that respects the semantic constraints.
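A minimal sketch of the masked routing idea (my own toy construction reusing the Sinkhorn iteration sketched earlier, not DeepSeek’s routing code): disallowed token-expert pairs are zeroed before normalization, so the balanced assignment is found only inside the semantically allowed region.

```python
import numpy as np

def masked_sinkhorn(logits, mask, iters=50, eps=1e-9):
    """Balanced, KG-constrained token-to-expert routing weights.

    logits : (tokens, experts) raw routing scores
    mask   : (tokens, experts) 1 where the KG ontology allows the pairing, 0 otherwise"""
    P = np.exp(logits) * mask + eps
    for _ in range(iters):
        P = P / P.sum(axis=1, keepdims=True)   # each token's weights sum to 1
        P = P / P.sum(axis=0, keepdims=True)   # each expert's load sums to 1
    return P

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 4))               # 4 tokens, 4 experts (toy, square case)
mask = np.array([[1, 1, 0, 0],                 # e.g. medical tokens -> medical experts only
                 [1, 1, 0, 0],
                 [0, 0, 1, 1],
                 [0, 0, 1, 1]])
P = masked_sinkhorn(logits, mask)
print(np.round(P, 2))                          # routing mass stays inside the allowed blocks
```

In practice the mask has to leave at least one expert reachable per token (and at least one token per expert), or the normalization cannot balance; that feasibility check is part of the engineering work flagged in the open issues.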
Graph-Constrained Reasoning (GCR)
At the inference level, the principles of structural conservatism apply to the generation process itself. The GCR framework integrates a KG directly into the LLM decoding loop.
The KG-Trie: GCR constructs a prefix tree (Trie) of all valid reasoning paths in the KG.
Constrained Decoding: During token generation, the model’s output distribution is masked. It is only allowed to generate tokens that correspond to valid edges or nodes in the KG-Trie.
Result: This reduces hallucinations by enforcing foundational semantic consistency (it’s a valid open point to question whether or not this scales beyond foundational semantics). The model is less likely to wander off the manifold of valid knowledge. It is structurally constrained to lose less of the truth defined by the KG.
This is the macroscopic equivalent of mHC.
mHC: Constrains internal features to the Birkhoff polytope to preserve signal norms.
GCR: Constrains output tokens to the KG-Trie to preserve semantic validity.
Both are applications of Geometric Information Engineering: using topological constraints to enforce lines of learning (but not guarantee them).
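A toy sketch of GCR-style constrained decoding (the real GCR implementation works at the token level inside the LLM’s decoder; here the ‘tokens’ are whole KG entities and relations, and the path list stands in for the KG-Trie):

```python
import math

# Hypothetical KG paths, each a sequence of entity/relation "tokens".
KG_PATHS = [
    ["aspirin", "treats", "headache"],
    ["aspirin", "interacts_with", "warfarin"],
]

def allowed_next(prefix):
    """Set of tokens that extend `prefix` along some valid KG path (a flat trie lookup)."""
    return {p[len(prefix)] for p in KG_PATHS
            if len(p) > len(prefix) and p[:len(prefix)] == prefix}

def constrained_step(scores, prefix):
    """Mask the model's raw scores so off-graph continuations get -inf, then pick the best."""
    legal = allowed_next(prefix)
    masked = {tok: (s if tok in legal else -math.inf) for tok, s in scores.items()}
    return max(masked, key=masked.get)

# The unconstrained favorite ("cures") is off-graph, so the constrained step picks "treats".
scores = {"cures": 2.0, "treats": 1.5, "interacts_with": 0.4}
print(constrained_step(scores, ["aspirin"]))   # treats
```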
Semantic-Geometric Consistency
The ultimate goal is Semantic-Geometric Consistency. This principle asserts that the geometric proximity of two points in the neural latent space should strictly correspond to their semantic similarity in the KG.
Current Failure: In standard LLMs, rare concepts often have unstable embeddings that drift far from their semantic neighbors, leading to generation failures.
The Fix: By enforcing manifold constraints (mHC) and regularizing with KGs (Ricci curvature, hyperbolic embeddings), we can guide the neural space to be quasi-isometric to the knowledge space. This makes it much more likely that the model’s ‘mental map’ of concepts mirrors the KG of the domain.
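One way to make ‘quasi-isometric to the knowledge space’ operational is to compare KG distances with embedding distances and measure how far their ratio strays from a constant. This is a rough diagnostic I’m sketching, not a metric from the cited work:

```python
import numpy as np

# Hypothetical graph distances (e.g. shortest-path hops in the KG) and learned embeddings.
graph_dist = {("a", "b"): 1, ("a", "c"): 2, ("b", "c"): 1}
embeddings = {"a": np.array([0.0, 0.0]),
              "b": np.array([1.0, 0.0]),
              "c": np.array([2.1, 0.0])}

def distortion(graph_dist, embeddings):
    """Spread of the embedding-distance / graph-distance ratio over all listed pairs.

    A ratio that is nearly constant across pairs is the informal sense in which the
    embedding 'mirrors' the KG; a large spread means some regions are stretched or crushed."""
    ratios = []
    for (u, v), d_g in graph_dist.items():
        d_e = np.linalg.norm(embeddings[u] - embeddings[v])
        ratios.append(d_e / d_g)
    return max(ratios) / min(ratios)

print(distortion(graph_dist, embeddings))   # ~1.1 here; 1.0 would be a perfect isometry (up to scale)
```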
Practical Applications & Potential Case Studies
The theoretical framework of Geometric Information Engineering has potential applications across several fields.
Robotics: Manifold-Constrained Planning
In robotics, the concept of planning on a manifold is well-established. Robots must navigate configurations that satisfy kinematic constraints (joints cannot hyperextend) and environmental constraints (do not hit walls).
Neural Planners: New learning-based planners implicitly learn this constraint manifold. By projecting the robot’s state onto the manifold defined by a Robot Knowledge Graph, we ensure that generated actions are not just physically possible but semantically valid.
Example: A pick-up cup action is constrained by the semantic relation (Cup, is_on, Table). The planner only explores trajectories on the manifold slice consistent with this relation.
Bioinformatics: Structural Conservatism In Protein Design
In bioinformatics, structural conservatism refers to the biological fact that protein 3D structures are more conserved than their amino acid sequences.
GraphBind: Models like GraphBind use hierarchical Graph Neural Networks to predict nucleic acid binding sites by explicitly capturing this structural conservatism.
Application Of mHC: Applying mHC to protein language models could stabilize the learning of long-range residue interactions. By treating the contact map as a doubly stochastic matrix, the model is much more likely to respect the physical constraint that a residue can only have a limited number of contacts, potentially limiting hallucinated bonds. (This is a stretch requiring a lot more work and intermediate steps. I’m presenting it to get AI engineers thinking about where this would fit into a larger pipeline and solution stack, but also to call out that there’s more work required from an engineering standpoint.)
Semantic Communication: The Low-Loss Meaning Channel
In next-generation wireless networks (6G), Semantic Communication aims to transmit meaning rather than raw data to save bandwidth.
The Challenge: Ensuring that the meaning decoded is identical to the meaning encoded (semantically lossless).
The Solution: KGs serve as the shared codebook. The transmitter encodes data into a semantic symbol or a node in the KG. The receiver uses the shared KG to decode it. mHC architectures in the encoder/decoder ensure that the semantic signal does not significantly degrade during the neural processing steps. The doubly stochastic constraint minimizes semantic distortion by preserving the information geometry of the message.
The Impact Market Of Neural Signals
The return to structural conservatism suggests a shift in how we view neural computation: as a resource-constrained economy.
The Impact Market Analogy
The Impact Market proposal for scientific publishing seeks to align prestige with verifiable impact using bounded tokens. We can view neural activations as tokens of value.
Unconstrained Economy (HC): In an unconstrained network, inflation occurs. Signals explode, gradients vanish, and the value of a feature becomes noisy and unreliable.
Conservative Economy (mHC): In a structurally conservative network, the value is conserved. The network must invest its signal energy budget wisely, routing it only to the most relevant experts. The doubly stochastic constraint acts as a central bank, enforcing a monetary policy that reduces the risks of hyperinflation.
Toward Safe Foundation Models Via KGs
As AI permeates critical infrastructure, safety can be evaluated as a geometric problem. Alignment is essentially the task of attempting to confine the model’s trajectory to a specific safe subspace of potential outputs.
KGs As Maps: KGs provide the map of this safe territory: the valid relations, ethical constraints, and physical laws.
Constraints As Steering: Manifold constraints provide the vehicle dynamics that prevent the model from steering off the map.
Conclusion: The combination is a blueprint for the next generation of more robust, trustworthy AI. It moves us from probabilistic safety to safety structures, making it less likely for the model to violate the conservation laws of the domain.
Conclusion
The return to structural conservatism embodied by DeepSeek’s mHC is a necessary step for more reliable deep learning. By projecting the chaotic potential of hyper-connected networks onto the ordered manifold of the Birkhoff polytope, mHC restores much of the identity mapping property, the fundamental conservation law that makes deep learning possible.
When viewed through the lens of Information Engineering, this architectural choice mirrors the established principles of data integrity, traceability, and schema consistency. KGs amplify this paradigm by providing the semantic topology, the schema that defines the valid geometric manifolds for learning.
The synthesis of these fields, Geometric Information Engineering, offers an example of a unified path forward. It envisions neural networks that approach:
Mathematical Stability: Constrained by doubly stochastic manifolds to prevent numerical collapse.
Semantic Grounding: Constrained by KG topologies to prevent semantic hallucination.
Structural Conservation: Preserving signal, meaning, and integrity, like a bridge preserves its load paths or a database preserves its records.
In this framework, the geometry of the model and the knowledge of the domain are combined into a single, rigorous structure.
References, Resources, Further Reading, & Works Cited
Identity Mappings in Deep Residual Networks https://arxiv.org/abs/1603.05027
Understanding Deep Residual Networks https://shuzhanfan.github.io/2018/11/ResNet/
Reading Notes: Identity Mappings in Deep Residual Networks https://lzhangstat.medium.com/reading-notes-identity-mappings-in-deep-residual-networks-e1980b3aa753
mHC: Manifold-Constrained Hyper-Connections https://www.arxiv.org/abs/2512.24880
Efficient Manifold-Constrained Neural ODE for High-Dimensional Datasets https://arxiv.org/html/2510.04138v1
Birkhoff polytope https://en.wikipedia.org/wiki/Birkhoff_polytope
Implementing the Sinkhorn-Knopp Algorithm in NumPy https://www.statology.org/implementing-the-sinkhorn-knopp-algorithm-in-numpy/
Understanding Identity Map: A Comprehensive Guide to Its Applications https://www.graphapp.ai/blog/understanding-identity-map-a-comprehensive-guide-to-its-applications
Data Component Method Based on Dual-Factor Ownership Identification with Multimodal Feature Fusion https://pmc.ncbi.nlm.nih.gov/articles/PMC12610704/
Semantic Communication Networks Projects https://phdsolutions.org/blog/semantic-communication-networks-projects
What is Residual Connection? https://towardsdatascience.com/what-is-residual-connection-efb07cab0d55/
Variational Source-Channel Coding for Semantic Communication https://arxiv.org/html/2410.08222v3
Object-Attribute-Relation Representation-Based Video Semantic Communication https://ieeexplore.ieee.org/iel8/49/11039749/10974507.pdf
Parametric CAD/CAE integration using a common data model https://www.researchgate.net/publication/222828790_Parametric_CADCAE_integration_using_a_common_data_model
Parametric CAD/CAE integration using a common data model - University of Alberta https://sites.ualberta.ca/~yongshen/index_files/[Ma%20Y%20S%202011%20JMS].pdf
Graph Edit Distance with General Costs Using Neural Set Divergence https://proceedings.neurips.cc/paper_files/paper/2024/file/860e5b214c842eaedaa6b4026ee91aac-Paper-Conference.pdf
Sinkhorn Distances: Lightspeed Computation of Optimal Transportation Distances https://www.researchgate.net/publication/237053780_Sinkhorn_Distances_Lightspeed_Computation_of_Optimal_Transportation_Distances
Sinkhorn-Knopp-Style Algorithm https://www.emergentmind.com/topics/sinkhorn-knopp-style-algorithm
Towards General Geometries for Embedding Knowledge Graphs https://ml3.leuphana.de/publications/icml_gram24.pdf
A Hyperbolic-to-Hyperbolic Graph Convolutional Network https://openaccess.thecvf.com/content/CVPR2021/papers/Dai_A_Hyperbolic-to-Hyperbolic_Graph_Convolutional_Network_CVPR_2021_paper.pdf
GeOKG: geometry-aware knowledge graph embedding for Gene Ontology and genes https://academic.oup.com/bioinformatics/article/41/4/btaf160/8111648
SKGE: Spherical Knowledge Graph Embedding with Geometric Regularization https://arxiv.org/abs/2511.02460
Local-Curvature-Aware Knowledge Graph Embedding: An Extended Ricci Flow Approach https://arxiv.org/html/2512.07332v1
Motif-Aware Riemannian Graph Neural Network with Generative-Contrastive Learning https://ojs.aaai.org/index.php/AAAI/article/view/28754/29450
Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples https://www.jmlr.org/papers/volume7/belkin06a/belkin06a.pdf
GALOPA: Graph Transport Learning with Optimal Plan Alignment https://proceedings.neurips.cc/paper_files/paper/2023/file/1d35af80e775e342f4cd3792e4405837-Paper-Conference.pdf
Ontology Neural Network and ORTSF: A Framework for Topological Reasoning and Delay-Robust Control https://arxiv.org/pdf/2506.19277
AQI as an Intrinsic Alignment Diagnostic via Latent Geometry, Cluster Divergence, and Layer-wise Pooled Representations https://aclanthology.org/2025.emnlp-main.145.pdf
Fast Kinodynamic Planning on the Constraint Manifold With Deep Neural Networks https://ieeexplore.ieee.org/iel7/8860/4359257/10292912.pdf
Understanding Mixture of Experts (MoE) Neural Networks https://intuitionlabs.ai/articles/mixture-of-experts-moe-models
Selective Sinkhorn Routing for Improved Sparse Mixture of Experts https://arxiv.org/html/2511.08972v1
GraphMoE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism https://arxiv.org/html/2501.07890v1
Modeling Expert Interactions in Sparse Mixture of Experts via Graph Structures https://arxiv.org/html/2510.16411v1
Fine-Tuning Graph Neural Networks via Graph Topology Induced Optimal Transport https://www.ijcai.org/proceedings/2022/0518.pdf
MOT: Masked Optimal Transport for Partial Domain Adaptation https://www.youweiluo.top/Papers/MOT_CVPR2023.pdf
Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models https://openreview.net/forum?id=6embY8aclt
Graph-Constrained Reasoning: A Practical Leap for Trustworthy, KG-Grounded LLMs https://medium.com/@yu-joshua/graph-constrained-reasoning-a-practical-leap-for-trustworthy-kg-grounded-llms-04efd8711e5e
Semantic-Geometric Consistency in AI https://www.emergentmind.com/topics/semantic-geometric-consistency
Rare Text Semantics Were Always There in Your Diffusion Transformer https://www.researchgate.net/publication/396250596_Rare_Text_Semantics_Were_Always_There_in_Your_Diffusion_Transformer
Distance-Based Classification with Lipschitz Functions https://www.jmlr.org/papers/volume5/luxburg04b/luxburg04b.pdf
Interpreting Behaviors and Geometric Constraints as Knowledge Graphs for Robot Manipulation Control https://arxiv.org/html/2310.03932v2
MegSite: an accurate nucleic acid-binding residue prediction method based on multimodal protein language model https://pmc.ncbi.nlm.nih.gov/articles/PMC12496013/
MegSite: an accurate nucleic acid-binding residue prediction method based on multimodal protein language model https://academic.oup.com/bib/article/26/5/bbaf524/8273835
RNAsmc: An integrated tool for comparing RNA secondary structure and evaluating allosteric effects https://pmc.ncbi.nlm.nih.gov/articles/PMC9876829/
Harnessing the Power of Pre-Trained Models for Efficient Semantic Communication of Text and Images https://pmc.ncbi.nlm.nih.gov/articles/PMC12385560/
The Impact Market to Save Conference Peer Review: Decoupling Dissemination and Credentialing https://arxiv.org/html/2512.14104v1
Sinkformers: Transformers with Doubly Stochastic Attention https://proceedings.mlr.press/v151/sander22a/sander22a.pdf
Graph Similarity Computation via Interpretable Neural Node Alignment https://arxiv.org/html/2412.12185


