Self-citation rates a) in 2016-2020 and b) since 2000. a) Kernel density estimates of the distributions of First Author, Last Author, and Any Author self-citation rates. b) Average self-citation rates for each year since 2000, with 95% confidence intervals calculated by bootstrap resampling.
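The sketch below shows one way to compute per-paper First Author, Last Author, and Any Author self-citation rates. It assumes that each citing paper is an ordered list of disambiguated author identifiers and that each reference is a set of author identifiers. It also assumes that each rate is the share of a paper's references that include the First Author, the Last Author, or any author of the citing paper, extending the single-author definition used for the Scheinost analysis. The function name and data layout are hypothetical; this is not the authors' pipeline.

```python
from typing import Dict, List, Set


def self_citation_rates(authors: List[str], references: List[Set[str]]) -> Dict[str, float]:
    """First/Last/Any Author self-citation rates for one citing paper.

    authors: ordered author IDs of the citing paper (hypothetical disambiguated IDs).
    references: one set of author IDs per cited reference.
    """
    if not references:
        raise ValueError("paper has no references")
    first, last = authors[0], authors[-1]
    citing = set(authors)
    n = len(references)
    return {
        # share of references that list the citing paper's First Author
        "first": sum(first in ref for ref in references) / n,
        # share of references that list the citing paper's Last Author
        "last": sum(last in ref for ref in references) / n,
        # share of references sharing at least one author with the citing paper
        "any": sum(bool(citing & ref) for ref in references) / n,
    }


rates = self_citation_rates(
    ["A", "B", "C"],
    [{"A", "X"}, {"C"}, {"Y", "Z"}, {"B"}],
)
print(rates)  # {'first': 0.25, 'last': 0.25, 'any': 0.75}
```

With a single-author paper, the First Author and Last Author are the same person, so all three rates coincide.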

Self-citation rates in 2016-2020 for First, Last, and Any Authors broken down by field (Neurology, Neuroscience, Psychiatry).

Average self-citation rates by academic age for papers published in 2016-2020. a) Self-citation rate vs. academic age for First and Last Authors. Shaded regions show 95% confidence intervals obtained via bootstrap resampling. b) Comparison of First and Last Author self-citation rates by academic age. Each point corresponds to one academic age a and is plotted at x = First Author self-citation rate and y = Last Author self-citation rate for authors of academic age a. The dashed line is y = x, and points are shaded from dark to light with increasing academic age.

Self-citation rates by country for a) First and b) Last Authors from 2016-2020. Only countries with >50 papers were included in the analysis. Country was assigned from each author's affiliation.

Self-citation rates by topic for First, Last, and Any Authors. Topics were identified with Latent Dirichlet Allocation (LDA). Confidence intervals of the average self-citation rate were computed from 1000 iterations of bootstrap resampling.
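The bootstrap confidence intervals used throughout these figures can be sketched as a percentile bootstrap of the mean with 1000 resamples. Several details are assumptions, because the captions do not specify them: the percentile method, resampling at the paper level, and the fixed seed. The function name is hypothetical.

```python
import random
from statistics import fmean
from typing import List, Tuple


def bootstrap_ci(values: List[float], n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    # Resample papers with replacement and record the mean of each resample
    means = sorted(fmean(rng.choices(values, k=n)) for _ in range(n_boot))
    # Take the alpha/2 and 1 - alpha/2 empirical quantiles of the resampled means
    lo_idx = round(alpha / 2 * n_boot)
    hi_idx = round((1 - alpha / 2) * n_boot) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical per-paper self-citation rates for one topic
rates = [0.0, 0.05, 0.12, 0.02, 0.30, 0.0, 0.08, 0.15]
low, high = bootstrap_ci(rates)
print(f"mean={fmean(rates):.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

Because every resample mean lies between the smallest and largest observed rate, the interval always stays within the range of the data.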

Gender disparities in authorship and self-citation. a) Proportion of papers with men and women as First and Last Authors since 2000. b) Average self-citation rates for men and women First and Last Authors. c) Ratio of average self-citation rates of men to women for First and Last Authors. d) Self-citation rates by academic age for men and women authors; the dashed line represents men and the solid line women. e) Ratio of self-citation rates of men to women by academic age. f) Number of papers by academic age for men and women; the dashed line represents men and the solid line women. g) Ratio of the average number of papers of men to women by academic age. In all panels, 95% confidence intervals of the mean were calculated with 1000 iterations of bootstrap resampling.

Odds ratios (exponentiated coefficients) from logistic regression models of self-citation. Two types of model were fit. 1) Citing/cited pair models: each citing/cited pair has a binary outcome indicating whether the citation is a self-citation. 2) Highly self-citing models: each article is binarized by whether it meets a threshold of high self-citation (at least 25%). For each model type, one column shows a model that includes a [gender × academic age] interaction term and the other shows a model without this term. For positively skewed variables, the inverse hyperbolic sine transform was used in place of the log transform because many of these variables contain a large number of zeros, for which the log is undefined. *P<0.05, **P<1e-5, ***P<1e-10
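The two transformations in this table can be illustrated briefly. The inverse hyperbolic sine, asinh(x) = ln(x + sqrt(x^2 + 1)), is defined at zero and approaches ln(2x) for large x. An odds ratio is the exponential of a logistic regression coefficient. The covariate values and the coefficient below are hypothetical, and no model fitting is shown.

```python
import math

# Hypothetical skewed covariate (e.g. number of previous papers) with many zeros
counts = [0, 0, 0, 1, 4, 12, 250]

# math.log(0) raises ValueError, but asinh(0) == 0, so zero-valued records are kept
transformed = [math.asinh(c) for c in counts]

# For large x, asinh(x) is close to ln(2x) = ln(x) + ln(2),
# so coefficients keep an approximately log-scale interpretation
gap = math.asinh(1000) - math.log(2 * 1000)


def odds_ratio(coef: float) -> float:
    """Logistic regression coefficients are log-odds; exponentiate to get an odds ratio."""
    return math.exp(coef)


# Hypothetical coefficient: OR > 1 means higher odds of self-citation per unit increase
print(transformed, gap, odds_ratio(0.2))
```

An odds ratio of 1 (coefficient 0) means no association. Values above or below 1 indicate higher or lower odds of self-citation.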

All journals included in this analysis by field, sorted alphabetically.

Comparison of manually scored self-citation rates with self-citation rates estimated by Python scripts in five Psychiatry journals: American Journal of Psychiatry, Biological Psychiatry, JAMA Psychiatry, Lancet Psychiatry, and Molecular Psychiatry. In total, 906 articles were manually evaluated: 10 articles per journal per year from 2000-2020 (Lancet Psychiatry began publishing in 2014), with four articles excluded because their very long author lists made manual scoring impractical.

Percentiles of self-citation rates in articles from 2016-2020.

Temporal trends in First Author, Last Author, and Any Author self-citation rates from 2000-2020 in Neurology, Neuroscience, and Psychiatry papers. Shaded regions show 95% confidence intervals calculated with bootstrap resampling.

Correlations between publication year and self-citation rate, with the corresponding slopes, by field.
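As a rough illustration of what this table reports, the sketch below computes a Pearson correlation and an ordinary least-squares slope (change in self-citation rate per year) from paired values. It assumes the inputs are yearly average rates for one field; per-paper values would work the same way. The data and function name are hypothetical.

```python
from math import sqrt
from typing import List, Tuple


def pearson_and_slope(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Pearson correlation r and ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx  # change in self-citation rate per year
    return r, slope


# Hypothetical yearly average self-citation rates for one field
years = list(range(2000, 2021))
avg_rates = [0.08 - 0.001 * (yr - 2000) for yr in years]
r, slope = pearson_and_slope(years, avg_rates)
print(f"r={r:.3f}, slope={slope:.4f} per year")
```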

First Author and Last Author self-citation rates by the author's affiliation country for papers from 2016-2020. 95% confidence intervals obtained via bootstrap resampling are included in parentheses. Only countries with at least 50 papers were included in the analysis.

Self-citation rates by number of papers for women and men. a) Self-citation rates in bins grouped by number of previous papers. Error bars show 95% confidence intervals obtained with bootstrap resampling. Significant differences between women and men (permutation test, corrected P<0.05) are marked with an asterisk. b) Moving average (window size = 5) of self-citation rates for each number of previous papers. Early-career self-citation rates are shown in red.
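Panel b's smoothing can be sketched as a simple moving average with a window of 5. The caption does not say whether the window is centred or trailing, or how the ends are handled. This sketch assumes unpadded windows, each centred on the middle element. The input is a hypothetical list of average rates indexed by number of previous papers.

```python
from typing import List


def moving_average(values: List[float], window: int = 5) -> List[float]:
    """Simple moving average over consecutive windows (no padding).

    The output has len(values) - window + 1 entries; entry i averages
    values[i : i + window], i.e. it is centred on index i + window // 2.
    """
    if window > len(values):
        return []
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value and drop the oldest one
        running += values[i] - values[i - window]
        out.append(running / window)
    return out


# Hypothetical average self-citation rates for 0, 1, 2, ... previous papers
rates_by_n_papers = [0.01, 0.03, 0.04, 0.06, 0.05, 0.07, 0.08, 0.08, 0.09]
print(moving_average(rates_by_n_papers))
```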

LDA perplexity on training and validation data for different numbers of topics. Validation perplexity was lowest for seven topics.
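The model-selection rule in this figure picks the number of topics with the lowest validation perplexity. Perplexity is the exponential of the negative per-token log-likelihood on held-out text. The sketch assumes a held-out log-likelihood (natural log) is already available for each fitted model. The numbers are hypothetical and are not the study's values.

```python
import math
from typing import Dict


def perplexity(log_likelihood: float, n_tokens: int) -> float:
    """Perplexity = exp(-(held-out log-likelihood) / (number of tokens)); lower is better."""
    return math.exp(-log_likelihood / n_tokens)


def best_topic_count(val_loglik: Dict[int, float], n_tokens: int) -> int:
    """Number of topics whose model has the lowest validation perplexity."""
    return min(val_loglik, key=lambda k: perplexity(val_loglik[k], n_tokens))


# Hypothetical validation log-likelihoods for models with k topics
val_loglik = {5: -80_500.0, 10: -79_200.0, 15: -79_900.0, 20: -80_800.0}
n_tokens = 10_000
for k, ll in val_loglik.items():
    print(k, round(perplexity(ll, n_tokens), 1))
print("best k:", best_topic_count(val_loglik, n_tokens))
```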

Topic word clouds for 13 topics. Each cloud shows the most common words in one of our LDA model topics. Based on the word clouds, we assigned each topic an overall theme (topic name).

Topic word clouds for seven topics. Each cloud shows the most common words in one of our LDA model topics. Based on the word clouds, we assigned each topic an overall theme (topic name).

a) First Author, b) Last Author, and c) Any Author self-citation rates for seven topics.

Trajectory of the odds ratio of self-citation for men compared with women when gender interaction terms are included for a) academic age in the citing/cited model, b) number of previous papers in the citing/cited model, c) academic age in the highly self-citing model, and d) number of previous papers in the highly self-citing model.
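A trajectory like this can be derived from two coefficients. Assume gender is coded men = 1 and women = 0, and the linear predictor contains β_gender · gender + β_interaction · gender · x. The men-vs-women odds ratio at covariate value x is then exp(β_gender + β_interaction · x). If the covariate entered the model asinh-transformed, x must be on that scale. The coding and coefficients below are hypothetical.

```python
import math
from typing import List


def odds_ratio_trajectory(beta_gender: float, beta_interaction: float,
                          covariate: List[float]) -> List[float]:
    """Odds ratio of self-citation, men vs. women, at each covariate value x.

    Assumes gender is coded men=1, women=0 and the linear predictor contains
    beta_gender * gender + beta_interaction * gender * x (plus a main effect of x),
    so the men-vs-women log-odds difference at x is beta_gender + beta_interaction * x.
    """
    return [math.exp(beta_gender + beta_interaction * x) for x in covariate]


# Hypothetical coefficients and academic ages (in years)
ages = [0, 5, 10, 20, 30]
for age, ratio in zip(ages, odds_ratio_trajectory(0.05, 0.01, ages)):
    print(age, round(ratio, 3))
```

A positive interaction coefficient makes the gap widen with the covariate. A negative one makes it shrink.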

Single-author self-citation rates for Dustin Scheinost. a) Histogram of Scheinost-Scheinost self-citation rates, computed for each paper as the proportion of references that include Scheinost as an author. b) Scheinost-Scheinost self-citation rate over time. c) Any Author self-citation rates for all papers with Scheinost as an author.

Comparison of self-citation rates in the entire field of Neuroscience and the journal Nature Neuroscience.

P values for all 40 comparisons performed in this study, corrected for multiple comparisons with the Bonferroni correction. P values from permutation tests are based on 10,000 permutations. A statistic more extreme than every value in the null distribution therefore has an uncorrected P<1/10,000, and after correction for 40 comparisons Pcorrected<40 × 0.0001 = 0.004. Significant values (Pcorrected<0.05) are marked with an asterisk in the “Finding” column.
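The testing procedure can be sketched as a two-sided permutation test on a difference in means, followed by Bonferroni correction for 40 comparisons. The add-one convention, (extreme + 1) / (n_perm + 1), is one common choice and an assumption here. Under it, the smallest attainable P with 10,000 permutations is 1/10,001, which becomes just under 0.004 after correction. The function names and example data are hypothetical.

```python
import random
from statistics import fmean
from typing import List


def permutation_p(a: List[float], b: List[float], n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test P value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        # Shuffle group labels and recompute the mean difference under the null
        rng.shuffle(pooled)
        if abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:])) >= observed:
            extreme += 1
    # Add-one convention so that P is never exactly 0
    return (extreme + 1) / (n_perm + 1)


def bonferroni(p: float, n_comparisons: int = 40) -> float:
    """Bonferroni-corrected P value, capped at 1."""
    return min(1.0, p * n_comparisons)


# Hypothetical self-citation rates for two groups in one bin
women = [0.02, 0.05, 0.04, 0.03, 0.06, 0.01]
men = [0.07, 0.09, 0.05, 0.08, 0.10, 0.06]
p = permutation_p(women, men)
print(p, bonferroni(p))
```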