Clojure Open-Source Code Metrics
I have assembled a file called massive.clj. It is 325k lines of concatenated open-source Clojure code: every .clj file from the 50 most-starred Clojure projects on GitHub (excluding Clojure itself).
Let’s roll with some stats to understand some basic metrics of the most-used Clojure code, shall we?
Metric | Count |
---|---|
File count | 2,359 |
Line count | 325,153 |
Lines of code* | 275,690 |
Comment lines | 23,266 |
SLOC | 252,424 |
Top-level forms | 15,550 |
Total number of forms | 477,000 |
* Excluding whitespace-only and parentheses-only lines. SLOC additionally excludes comment lines (275,690 − 23,266 = 252,424).
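A minimal sketch of how line counts like these could be produced. It is not the script behind the table: the `classify-line` and `line-stats` names are hypothetical, and only whole-line `;` comments are detected (a `;` inside a string or after code is ignored).

```clojure
(require '[clojure.string :as str])

(defn classify-line
  "Classify one source line as :blank, :parens-only, :comment or :code.
  Assumption: a comment line is any line whose first non-space char is ';'."
  [line]
  (let [s (str/trim line)]
    (cond
      (str/blank? s)            :blank
      (re-matches #"[()\s]+" s) :parens-only
      (str/starts-with? s ";")  :comment
      :else                     :code)))

(defn line-stats
  "Count lines of each kind in a string of source text."
  [text]
  (frequencies (map classify-line (str/split-lines text))))

(def sample
  "(ns demo.core)\n\n;; adds two numbers\n(defn add [a b]\n  (+ a b)\n  )\n")

(line-stats sample)
;; => {:code 3, :blank 1, :comment 1, :parens-only 1}
```

Under these definitions, "lines of code" in the table would correspond to `:code` plus `:comment`, and SLOC to `:code` alone.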
In general, we have a code-to-comment ratio of about 11:1 and an average of 138 total lines per file (107 of them code). The longest file is 8,065 lines (a huge config of UIKit bindings in the Clojure-C project); the longest non-config file is clojure.typed’s test.core at 4,573 lines; and the longest non-config, non-test file is charts.clj in Incanter, a JFreeChart wrapper for Clojure’s most important data science project. At the short end, there are numerous 1-line files.
Files and lines aside, let’s focus on what’s ultimately most important, the code itself.
There are 15,550 top-level forms across the entire codebase: surprisingly few, if one takes into account that it includes incredibly prolific and complex projects – ClojureScript, Compojure, core.logic, Typed Clojure, Incanter, Leiningen, LightTable, Midje, Pedestal, Quil and 40 more!
Out of these top-level forms:
Type of form | Count | What’s this? |
---|---|---|
defn | 4,739 | Function |
deftest | 2,048 | Test |
def | 1,609 | Constant |
ns | 978 | Namespace |
defn- | 894 | Private function |
defmacro | 892 | Macro |
defmethod | 735 | Case of a multimethod |
clojure.typed/ann | 543 | Typed Clojure type hint |
defrecord | 154 | Struct |
Others | 2,958 | Protocols, multimethods etc |
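A breakdown like this can be approximated by reading a source file with the Clojure reader and tallying the first symbol of each top-level form. The sketch below is one way to do it, not necessarily how the numbers above were obtained; `read-forms` and `top-level-counts` are hypothetical names. Reading real projects can fail on auto-resolved keywords such as `::alias/key` when the alias is not loaded, which this sketch does not handle.

```clojure
(import '[java.io PushbackReader StringReader])

(defn read-forms
  "Read every top-level form from a string of Clojure source.
  *read-eval* is disabled so that #= forms in untrusted code are rejected."
  [text]
  (binding [*read-eval* false]
    (let [rdr (PushbackReader. (StringReader. text))
          eof (Object.)]
      (loop [forms []]
        (let [form (read rdr false eof)]
          (if (identical? form eof)
            forms
            (recur (conj forms form))))))))

(defn head
  "Return the leading symbol of a list form, or nil for anything else."
  [form]
  (when (and (seq? form) (symbol? (first form)))
    (first form)))

(defn top-level-counts
  "Map from leading symbol to the number of top-level forms starting with it."
  [text]
  (frequencies (keep head (read-forms text))))

(def sample
  (str "(ns demo.core)\n"
       "(def answer 42)\n"
       "(defn add [a b] (+ a b))\n"
       "(defn- helper [x] x)\n"
       "(defn twice [x] (* 2 x))\n"))

(top-level-counts sample)
;; => {ns 1, def 1, defn 2, defn- 1}
```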
How often are docstrings used? 41% of public functions (1,962 functions) use them, 59% do not. For private functions, only 28% (253 functions) had docstrings. The average length of a docstring is 116 characters, with the shortest being only 10 and the longest being 4,039 characters long (whoa!).
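Docstring coverage can be estimated from already-read forms by checking whether the element after the function name is a string. The sketch below works on quoted sample forms; `docstring` and `docstring-coverage` are hypothetical helpers, and treating `:doc` metadata on the name as a docstring is an assumption about what should count.

```clojure
(defn fn-form?
  "True for (defn ...) and (defn- ...) list forms."
  [form]
  (and (seq? form) (contains? #{'defn 'defn-} (first form))))

(defn docstring
  "Return the docstring of a defn/defn- form, or nil if it has none.
  Checks the positional docstring first, then :doc metadata on the name."
  [form]
  (when (fn-form? form)
    (let [third (nth form 2 nil)]
      (if (string? third)
        third
        (:doc (meta (second form)))))))

(defn docstring-coverage
  "Per defining form (defn, defn-): total functions and how many are documented."
  [forms]
  (->> (filter fn-form? forms)
       (group-by first)
       (map (fn [[kind fs]]
              [kind {:total      (count fs)
                     :documented (count (keep docstring fs))}]))
       (into {})))

(def sample-forms
  '[(defn add "Adds two numbers." [a b] (+ a b))
    (defn sub [a b] (- a b))
    (defn- helper "Internal." [x] x)
    (def answer 42)])

(docstring-coverage sample-forms)
;; => {defn {:total 2, :documented 1}, defn- {:total 1, :documented 1}}
```

Mapping `count` over the collected docstrings would give the length statistics (shortest, longest, average).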
As for argument counts, most functions take one or two arguments, with some notable exceptions: 0-argument functions at one end and 7-, 8-, even 10-argument functions at the other.
Argument count | # of functions |
---|---|
0 | 288 |
1 | 1,888 |
2 | 1,198 |
3 | 472 |
4 | 162 |
5 | 87 |
6 | 21 |
7 | 12 |
8 | 2 |
10 | 1 |
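An arity histogram of this kind can be sketched by pulling the parameter vectors out of each `defn` form. How the table above treats multi-arity and variadic functions is not stated, so the choices here are assumptions: every arity of a multi-arity function is counted separately, and only the fixed parameters before `&` are counted. `arglists`, `fixed-arity` and `arity-histogram` are hypothetical names.

```clojure
(defn arglists
  "Return the parameter vectors of a defn/defn- form, skipping the name,
  an optional docstring and an optional attribute map."
  [form]
  (let [body (->> (drop 2 form)
                  (drop-while #(or (string? %) (map? %))))]
    (if (vector? (first body))
      [(first body)]                          ; single arity: (defn f [x] ...)
      (->> body                               ; multi arity: (defn f ([x] ...) ([x y] ...))
           (filter seq?)
           (map first)
           (filter vector?)))))

(defn fixed-arity
  "Number of parameters before any & rest marker."
  [params]
  (count (take-while #(not= '& %) params)))

(defn arity-histogram
  "Sorted map from argument count to number of function arities."
  [forms]
  (->> forms
       (filter #(and (seq? %) (contains? #{'defn 'defn-} (first %))))
       (mapcat arglists)
       (map fixed-arity)
       frequencies
       (into (sorted-map))))

(def sample-forms
  '[(defn zero [] 0)
    (defn add "Adds." [a b] (+ a b))
    (defn greet
      ([name] (greet "Hello" name))
      ([greeting name] (str greeting ", " name)))
    (defn- sum-all [x & more] (apply + x more))])

(arity-histogram sample-forms)
;; => {0 1, 1 2, 2 2}
```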
Out of 477,000 total internal forms, 157,760 are meaningful (calls to a function, a macro or a special form). Among them, the 100 most popular elements are:
Fn/macro/special form | # of occurrences |
---|---|
list | 19626 |
quote | 12720 |
seq | 7629 |
concat | 7529 |
let | 5307 |
defn | 5026 |
= | 4710 |
is | 4159 |
apply | 2196 |
deftest | 2050 |
if | 2029 |
def | 1784 |
fn | 1552 |
str | 1186 |
map | 1160 |
when | 1091 |
fn* | 1077 |
ns | 1014 |
defn- | 928 |
defmacro | 910 |
and | 902 |
defmethod | 884 |
-> | 864 |
:require | 808 |
assoc | 678 |
ann | 672 |
count | 649 |
not | 608 |
testing | 569 |
cond | 569 |
or | 550 |
deref | 531 |
do | 509 |
println | 498 |
== | 481 |
first | 478 |
recur | 422 |
when-not | 401 |
doseq | 399 |
emitln | 346 |
:use | 344 |
is-clj | 337 |
assert | 336 |
is-tc-e | 330 |
run* | 320 |
* | 306 |
if-let | 303 |
update-in | 291 |
:import | 282 |
throw | 282 |
nil? | 268 |
reduce | 267 |
+ | 261 |
loop | 259 |
->> | 248 |
empty? | 240 |
into | 238 |
binding | 229 |
emits | 226 |
- | 221 |
fd/interval | 213 |
try | 211 |
for | 211 |
conj | 209 |
instance? | 208 |
catch | 204 |
swap! | 203 |
fresh | 200 |
next | 200 |
f | 196 |
All | 195 |
merge | 193 |
contains? | 191 |
inc | 189 |
core-run | 158 |
range | 183 |
meta | 182 |
nil | 175 |
when-let | 175 |
declare | 174 |
var | 172 |
set | 168 |
every? | 167 |
get | 166 |
matrix | 165 |
ret | 164 |
defrecord | 164 |
enqueue | 158 |
defalias | 157 |
rest | 156 |
< | 155 |
mapv | 148 |
sel | 146 |
name | 146 |
/ | 146 |
nom/tie | 146 |
test?<- | 144 |
HMap | 143 |
defprotocol | 143 |
is-tc-err | 142 |
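A ranking like this can be sketched by walking every nested form with `tree-seq` and tallying the symbol in head position. One thing worth knowing when reading the table: the Clojure reader expands reader macros before any counting happens, so `@x` arrives as `(clojure.core/deref x)`, `'x` as `(quote x)`, `#(...)` as `(fn* ...)`, and a syntax-quoted list as nested `seq`/`concat`/`list` calls. That plausibly explains why `list`, `quote`, `seq` and `concat` top the table, though this is an inference rather than something the post states. The names `strip-core` and `head-frequencies` are hypothetical, and dropping only the `clojure.core/` prefix is an assumption.

```clojure
(defn strip-core
  "Drop the clojure.core/ prefix that the reader adds when expanding @ and `."
  [sym]
  (if (= "clojure.core" (namespace sym))
    (symbol (name sym))
    sym))

(defn head-frequencies
  "Count how often each symbol appears in head position of any nested list.
  Caveat: quoted data lists such as '(a b) are counted as if they were calls."
  [forms]
  (->> forms
       (mapcat #(tree-seq coll? seq %))   ; every nested collection and leaf
       (filter seq?)                      ; keep list forms only
       (map first)
       (filter symbol?)
       (map strip-core)
       frequencies))

(def sample
  "(defn bump [state] (swap! state inc) @state)\n(def xs '[1 2])")

;; Wrap the text in [ ] so a single read-string call returns all top-level forms.
(def sample-forms
  (binding [*read-eval* false]
    (read-string (str "[" sample "\n]"))))

(head-frequencies sample-forms)
;; => {defn 1, swap! 1, deref 1, def 1, quote 1}
```

A top-100 list then follows from `(take 100 (sort-by val > (head-frequencies forms)))`.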
Do you know any other good statistics to run on this dataset? Tell me by email (zirkonit at gmail.com), or, better yet, fork the repository on GitHub (https://github.com/zirkonit/clamjamfry) and run the stats yourself!