Clojure Open-Source Code Metrics

I have assembled a file called massive.clj. It is 325k lines of concatenated open-source Clojure code: every .clj file from the 50 most-starred Clojure projects on GitHub (excluding Clojure itself).

Let’s roll with some stats to understand some basic metrics of the most-used Clojure code, shall we?

| Metric | Count |
| --- | --- |
| File count | 2,359 |
| Line count | 325,153 |
| Lines of code* | 275,690 |
| Comment lines | 23,266 |
| SLOC | 252,424 |
| Top-level forms | 15,550 |
| Total number of forms | 477,000 |

In general, we have roughly an 11:1 code-to-comment ratio (252,424 SLOC against 23,266 comment lines) and an average of 138 total lines per file, 107 of which are code. The longest file is 8,065 lines (a huge config of UIKit bindings in the Clojure-C project), the longest non-config file is clojure.typed's test.core at 4,573 lines, and the longest non-config, non-test file is charts.clj in Incanter, a JFreeChart wrapper for Clojure's most important data science project. At the short end, there are numerous 1-line files.
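The line metrics above can be approximated with a few lines of Clojure. This is only a sketch, not the script that produced the table: `classify-line`, `line-stats` and `sample` are hypothetical names, and it assumes a comment line is any line whose first non-space character is `;`. The table is consistent with "lines of code" meaning non-blank lines, since 275,690 minus 23,266 gives the reported 252,424 SLOC.

```clojure
(require '[clojure.string :as str])

(defn classify-line
  "Classifies one source line as :blank, :comment or :code.
  Assumption: a comment line starts with `;` after optional whitespace."
  [line]
  (let [trimmed (str/trim line)]
    (cond
      (str/blank? trimmed)           :blank
      (str/starts-with? trimmed ";") :comment
      :else                          :code)))

(defn line-stats
  "Returns a frequency map of line kinds for a string of source text."
  [source]
  (frequencies (map classify-line (str/split-lines source))))

;; A tiny stand-in for massive.clj
(def sample
  ";; a comment\n(defn f [x]\n  (inc x))\n\n(f 1)\n")

(line-stats sample)
;; => {:comment 1, :code 3, :blank 1}

;; Code-to-comment ratio from the table: SLOC / comment lines
(double (/ 252424 23266))
;; about 10.85, i.e. roughly 11:1
```

A classifier like this miscounts `;` characters inside multi-line strings and ignores `#_` and `(comment ...)` forms, so real totals may differ slightly.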

Files and lines aside, let’s focus on what’s ultimately most important, the code itself.

There are 15,550 top-level forms across the entire codebase; surprisingly few if one takes into account that it includes incredibly prolific and complex projects: ClojureScript, Compojure, core.logic, Typed Clojure, Incanter, Leiningen, LightTable, Midje, Pedestal, Quil and 40 more!

Out of these top-level forms:

| Type of form | Count | What's this? |
| --- | --- | --- |
| defn | 4,739 | Function |
| deftest | 2,048 | Test |
| def | 1,609 | Constant |
| ns | 978 | Namespace |
| defn- | 894 | Private function |
| defmacro | 892 | Macro |
| defmethod | 735 | Case of a multimethod |
| clojure.typed/ann | 543 | Typed Clojure type hint |
| defrecord | 154 | Struct |
| Others | 2,958 | Protocols, multimethods etc |
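A breakdown like the one above can be sketched by reading every top-level form and tallying its head symbol. The names `read-forms`, `form-head`, `top-level-counts` and `sample` are made up for this illustration, and the approach is an assumption rather than the method actually used for the table.

```clojure
(import '[java.io PushbackReader StringReader])

(defn read-forms
  "Reads every top-level form from a string of Clojure source.
  *read-eval* is bound to false so that #=(...) is never evaluated."
  [source]
  (binding [*read-eval* false]
    (let [rdr (PushbackReader. (StringReader. source))
          eof (Object.)]                      ; unique end-of-input marker
      (loop [forms []]
        (let [form (read rdr false eof)]
          (if (identical? form eof)
            forms
            (recur (conj forms form))))))))

(defn form-head
  "Returns the head symbol of a list form, or :other for anything else."
  [form]
  (if (and (seq? form) (symbol? (first form)))
    (first form)
    :other))

(defn top-level-counts
  "Tallies top-level forms by their head symbol."
  [source]
  (frequencies (map form-head (read-forms source))))

(def sample
  "(ns demo.core)
   (def answer 42)
   (defn f [x] x)
   (defn- g [] nil)
   (defn h \"doc\" [a b] a)")

(top-level-counts sample)
;; => {ns 1, def 1, defn 2, defn- 1}
```

On a real concatenated file the plain reader can fail on auto-resolved keywords such as `::alias/key` whose alias is unknown, so a tolerant reader or per-file error handling would probably be needed.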

How often are docstrings used? 41% of public functions (1,962 functions) have them, 59% do not. Among private functions, only 28% (253 functions) have docstrings. The average docstring is 116 characters long, with the shortest being only 10 characters and the longest 4,039 (whoa!).
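Docstring coverage of this kind can be estimated by checking whether a string literal follows the function name. The sketch below works on already-read forms; `docstring`, `doc-coverage` and `forms` are hypothetical names, and it assumes that docstrings attached through `^{:doc ...}` metadata are not counted.

```clojure
(defn docstring
  "Returns the docstring of a (defn name \"doc\" ...) form, or nil.
  Assumption: only a string literal directly after the name counts,
  and something must follow it (so a lone string body is not a docstring)."
  [[_ _ maybe-doc & more]]
  (when (and (string? maybe-doc) (seq more))
    maybe-doc))

(defn doc-coverage
  "Fraction of the forms headed by `head` that carry a docstring."
  [head forms]
  (let [selected (filter #(= head (first %)) forms)]
    (/ (count (filter docstring selected))
       (count selected))))

;; Quoted sample forms standing in for the real codebase
(def forms
  '[(defn add "Adds two numbers." [a b] (+ a b))
    (defn sub [a b] (- a b))
    (defn- helper "Internal." [] nil)
    (defn- bare [] nil)])

(doc-coverage 'defn forms)   ;; => 1/2
(doc-coverage 'defn- forms)  ;; => 1/2

;; Docstring lengths in characters, for averages and extremes
(map count (keep docstring forms))
```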

As for argument counts, most functions take the usual one or two arguments, with some notable exceptions: 0-argument functions and a handful of 7-, 8- and even 10-arity functions.

| Argument count | # of functions |
| --- | --- |
| 0 | 288 |
| 1 | 1,888 |
| 2 | 1,198 |
| 3 | 472 |
| 4 | 162 |
| 5 | 87 |
| 6 | 21 |
| 7 | 12 |
| 8 | 2 |
| 10 | 1 |
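An arity histogram like this one can be sketched by locating the parameter vectors of each `defn` form. How multi-arity and variadic functions were counted for the table is not stated, so the choices here are assumptions: every arity of a multi-arity function is counted separately, and `&` is dropped while the rest parameter counts as one argument. `param-vectors`, `arity` and `forms` are illustrative names.

```clojure
(defn param-vectors
  "Finds the parameter vectors of a defn-style form.
  Skips the name, an optional docstring and an optional attribute map."
  [[_ _ & body]]
  (let [body (drop-while #(or (string? %) (map? %)) body)]
    (if (vector? (first body))
      [(first body)]                 ; single arity: (defn f [a b] ...)
      (->> body                      ; multi arity: (defn f ([a] ...) ([a b] ...))
           (filter seq?)
           (map first)
           (filter vector?)))))

(defn arity
  "Counts parameters in a parameter vector, ignoring the `&` marker."
  [params]
  (count (remove #{'&} params)))

;; Quoted sample forms standing in for the real codebase
(def forms
  '[(defn zero [] 0)
    (defn one "doc" [x] x)
    (defn two [a b] a)
    (defn multi ([a] a) ([a b] b))
    (defn variadic [a & more] a)])

(frequencies (mapcat #(map arity (param-vectors %)) forms))
;; => {0 1, 1 2, 2 3}
```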

Out of 477,000 total internal forms, 157,760 are meaningful (that is, headed by a function, a macro or a special form). Among them, the 100 most popular are:

Fn/macro/special form # of occurrences
list 19626
quote 12720
seq 7629
concat 7529
let 5307
defn 5026
= 4710
is 4159
apply 2196
deftest 2050
if 2029
def 1784
fn 1552
str 1186
map 1160
when 1091
fn* 1077
ns 1014
defn- 928
defmacro 910
and 902
defmethod 884
-> 864
:require 808
assoc 678
ann 672
count 649
not 608
testing 569
cond 569
or 550
deref 531
do 509
println 498
== 481
first 478
recur 422
when-not 401
doseq 399
emitln 346
:use 344
is-clj 337
assert 336
is-tc-e 330
run* 320
* 306
if-let 303
update-in 291
:import 282
throw 282
nil? 268
reduce 267
+ 261
loop 259
->> 248
empty? 240
into 238
binding 229
emits 226
- 221
fd/interval 213
try 211
for 211
conj 209
instance? 208
catch 204
swap! 203
fresh 200
next 200
f 196
All 195
merge 193
contains? 191
inc 189
core-run 158
range 183
meta 182
nil 175
when-let 175
declare 174
var 172
set 168
every? 167
get 166
matrix 165
ret 164
defrecord 164
enqueue 158
defalias 157
rest 156
< 155
mapv 148
sel 146
name 146
/ 146
nom/tie 146
test?<- 144
HMap 143
defprotocol 143
is-tc-err 142
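Counts like these can be sketched by walking every nested form and tallying the symbol at the head of each list. `call-heads` is a hypothetical helper, not the author's code. One thing worth keeping in mind when reading the table: the Clojure reader expands syntax-quote into calls to `seq`, `concat` and `list`, turns `#(...)` into `fn*`, `@x` into `deref` and `#'x` into `var`, so if the forms were counted after reading, the high ranks of those entries may partly reflect reader expansion rather than hand-written calls.

```clojure
(defn call-heads
  "Returns the head symbol of every list nested anywhere inside form."
  [form]
  (->> (tree-seq coll? seq form)   ; walk lists, vectors, maps and sets
       (filter seq?)               ; keep only list-like forms
       (map first)
       (filter symbol?)))

(def form (read-string "(defn area [r] (* 3 (* r r)))"))

(frequencies (call-heads form))
;; => {defn 1, * 2}

;; A macro body written with syntax-quote: after reading, it already
;; contains clojure.core/seq, clojure.core/concat and clojure.core/list.
(def macro-form (read-string "(defmacro twice [x] `(do ~x ~x))"))

(def macro-counts (frequencies (call-heads macro-form)))
```

Here the heads stay namespace-qualified as the reader emits them; mapping each symbol through `(symbol (name sym))` would strip the namespace if unqualified names are preferred.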

Do you know any other good statistics to run on this dataset? Tell me by email (zirkonit at gmail.com), or, better yet, fork the repository on GitHub (https://github.com/zirkonit/clamjamfry) and run the stats yourself!
