Article Abstract
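Long Xiaoning, Zhang Fan, Yi Wei. The Measurement and Test of Knowledge Spillovers: Evidence from Generating Patent Text Similarity Based on Machine Learning [J]. 数量经济技术经济研究, 2026, (2): 54-79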
创新知识溢出的测度与检验——基于机器学习生成专利文本相似度的证据
The Measurement and Test of Knowledge Spillovers: Evidence from Generating Patent Text Similarity Based on Machine Learning
  
DOI:
Chinese keywords: 知识溢出  专利文本相似度  专利引用  高速铁路
English keywords: Knowledge Spillovers  Patent Text Similarity  Patent Citations  High-speed Rail
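Funding:
Author  Affiliation
Long Xiaoning (龙小宁)  Intellectual Property Research Institute, Xiamen University; Belt and Road Research Institute, Xiamen University
Zhang Fan (张帆)  School of Logistics and E-commerce, Zhejiang Wanli University
Yi Wei (易巍)  School of Finance and Economics, Jimei University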
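Chinese abstract (translated):
      Knowledge spillovers are key to enabling scientific and technological innovation to radiate outward and lead other sectors, to driving industrial restructuring, and to building a modern industrial system. Yet the difficulty of measuring knowledge spillovers has persisted for a long time and has become a focus of academic research. This paper first reviews China's patent system in detail and identifies the challenges, and their causes, that existing studies face when using patent citations to measure knowledge spillovers. To address the measurement problem, the paper uses a machine learning model to generate patent text vectors from the full text of patents and constructs a patent text similarity indicator. On this basis, using a sample of Chinese invention patents from 1985 to 2023 and drawing on the inherent logic of how knowledge is produced, the paper proposes an approach to measuring knowledge spillovers with patent text similarity and constructs a new indicator of inter-city knowledge spillovers. Finally, building on the research consensus that the opening of high-speed rail promotes knowledge spillovers, the paper applies a difference-in-differences approach to test the validity of the new indicator empirically. The results show that patent text similarity is a reasonable and valid measure of knowledge spillovers. The study provides a new measurement tool and empirical evidence for understanding the state of knowledge spillovers in China, and supports the effective implementation of the innovation-driven development strategy.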
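The similarity step described above can be illustrated with a minimal sketch. It assumes each patent is already represented as a numeric vector and that a city-pair score is the mean of all cross-city pairwise cosine similarities. That averaging rule, the function names, and the toy vectors are illustrative assumptions, not the paper's documented construction.

```python
import math
from itertools import product


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def city_pair_similarity(vectors_a, vectors_b):
    """Mean pairwise cosine similarity between two cities' patent vectors.

    Assumption: simple averaging over all cross-city patent pairs; the
    paper's actual aggregation may differ.
    """
    pairs = list(product(vectors_a, vectors_b))
    if not pairs:
        raise ValueError("both cities need at least one patent vector")
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Hypothetical 3-dimensional patent vectors (real embeddings are much larger).
city_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
city_b = [[1.0, 0.0, 0.0]]
score = city_pair_similarity(city_a, city_b)
```

In this toy case one patent in city A is identical in direction to the patent in city B and the other is orthogonal to it, so the averaged score sits halfway between 0 and 1.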
English abstract:
      Innovation is the fundamental engine of sustained economic growth, and knowledge spillovers serve not only as the primary channel through which innovation exerts its impact but also as a vital source of future technological advancement. As China moves from high-speed growth to high-quality development, technological innovation has become a decisive driver. However, a “productivity paradox” has emerged: both China and advanced economies have experienced declining total factor productivity (TFP). Increasingly, this trend is attributed to a slowdown in knowledge diffusion. The “Proposal for the 15th Five-Year Plan,” recently adopted by the Central Committee, explicitly identifies “steady improvement of TFP” as a strategic objective. Therefore, accurately measuring knowledge spillovers within the broader framework of industrial modernization is both an academic imperative and a matter of pressing practical relevance.
      However, measuring knowledge spillovers in China encounters distinctive institutional constraints. Although patent citations constitute the global standard for tracing knowledge flows, this study reveals systemic limitations in their applicability in the Chinese context. A systematic examination of China’s patent legislation and administrative rules shows that while applicants are required to disclose prior art, non-compliance carries no legal penalties. This absence of enforcement generates a clear conflict of interest, incentivizing applicants to deliberately withhold citations. Thus, citation records are dominated by “examiner citations,” which reflect administrative searches conducted by patent authorities rather than an inventor’s genuine learning or knowledge absorption process. This institutional distortion undermines the reliability of citation-based measures, rendering them inadequate for capturing authentic knowledge spillovers in China.
      This study tackles the measurement challenge by introducing a novel artificial intelligence (AI)-based approach. Using the full text of Chinese invention patents from 1985 to 2023, we adopt high-dimensional semantic vectors trained by the Google Patents team. By computing cosine similarity across these vectors, we develop a new indicator of inter-city knowledge spillovers. Leveraging a panel of Chinese cities from 2002 to 2015, we exploit the staggered opening of high-speed rail (HSR) as an exogenous shock to validate this indicator. The difference-in-differences (DID) estimates demonstrate that the text-similarity measure reliably captures the causal effect of HSR on knowledge spillovers, revealing a statistically significant and robust positive impact.
      Mechanism analysis further corroborates the indicator by identifying specific transmission channels. The evidence shows that spillovers primarily arise from intensified face-to-face academic interactions and expanded cross-regional corporate investments. The indicator’s ability to detect these micro-level mechanisms, which are facilitated by the lower travel costs that HSR brings, indicates that it reflects authentic economic behavior rather than a mere statistical artifact. Additionally, extensive heterogeneity analyses assess the indicator’s validity. The results show that the spillover effects detected by the metric vary systematically across cities with different characteristics, which is fully consistent with theoretical expectations. These robust findings affirm that the text-based similarity measure is a sound tool for quantifying knowledge flows in the Chinese context.
      This study makes three contributions to the literature. First, it addresses the measurement challenge by identifying the institutional roots of citation data failure and introducing semantic vector analysis to circumvent systemic biases. Second, it rigorously validates the new metric using causal inference, offering a verified tool for research. Third, it offers a methodological advance by applying Google Patents vectors to causal inference in a developing country, extending their use from control group construction to direct measurement.
      Based on these findings, we propose the following policy recommendations: promote the “intelligent transformation” of technology evaluation by incorporating AI-based semantic analysis to identify disruptive innovations; deepen the development of patent data as a production factor by cleaning and structuring unstructured texts; and optimize regional planning by prioritizing “innovation connectivity” in transport infrastructure.
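The difference-in-differences logic used to validate the indicator can be sketched in its simplest two-group, two-period form. This is a deliberately reduced illustration: the abstract describes a staggered HSR rollout over a city panel, which would normally call for a regression with fixed effects rather than a comparison of four cell means. The function name and all outcome values below are hypothetical.

```python
from statistics import mean


def did_estimate(rows):
    """Two-by-two difference-in-differences from (treated, post, outcome) rows.

    Returns (treated post - treated pre) - (control post - control pre).
    Assumption: a single treatment date, unlike a staggered rollout.
    """
    def cell_mean(treated, post):
        values = [y for t, p, y in rows if t == treated and p == post]
        if not values:
            raise ValueError("every treated/post cell needs an observation")
        return mean(values)

    treated_change = cell_mean(True, True) - cell_mean(True, False)
    control_change = cell_mean(False, True) - cell_mean(False, False)
    return treated_change - control_change


# Hypothetical similarity scores for city pairs with and without an HSR link.
rows = [
    (True, False, 0.20), (True, False, 0.22),    # linked pairs, before opening
    (True, True, 0.30), (True, True, 0.32),      # linked pairs, after opening
    (False, False, 0.10), (False, False, 0.12),  # unlinked pairs, before
    (False, True, 0.14), (False, True, 0.16),    # unlinked pairs, after
]
effect = did_estimate(rows)
```

Subtracting the change in the unlinked pairs removes any trend common to both groups, so the remaining difference is attributed to the HSR link under the usual parallel-trends assumption.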