SELECT id, random()
FROM generate_series(0, 1000000) g (id);
There seems to be an equivalent in SQLite: https://sqlite.org/series.html[1] https://www.postgresql.org/docs/current/functions-srf.html
([]x?x:1000000)
Which gives a table of a million rows like: x
------
877095
265141
935540
49015
... ?~1000000
or, if you don't need elements to be unique: ?$~1000000
Though it's just an array, not a persistent table - I don't know much about Jd :( xxd -ps -c 16 -l $bignum < /dev/urandom | (echo '\copy random_data from stdin'; cat) | psql
Depending on how you want your random keys formatted. shuf -i 1-$bignum
or for random numbers with replacement shuf -i 1-$bignum -r -n $bignum
These give you a sample of random integers. `shuf` is part of GNU coreutils, so present in most Linux installs (but not on macOS). WITH RECURSIVE
t AS (
SELECT 0 id
UNION ALL
SELECT id + 1
FROM t
WHERE id < 1000000
)
SELECT id, random() FROM t;
It returns 1000001 rows, but so does cribwi's. SELECT n FROM (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) AS d(n)
for instance: SELECT seq = units.n + tens.n*10 + hundreds.n*100 + thousands.n*1000
FROM (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) AS units(n)
CROSS JOIN (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) AS tens(n)
CROSS JOIN (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) AS hundreds(n)
CROSS JOIN (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) AS thousands(n)
WHERE units.n + tens.n*10 + hundreds.n*100 + thousands.n*1000 BETWEEN 1 AND 1337
You can make it neater still by making the sub-table with a CTE: WITH Digits AS (SELECT n FROM (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) AS d(n))
SELECT seq = units.n + tens.n*10 + hundreds.n*100 + thousands.n*1000
FROM Digits AS units
CROSS JOIN Digits AS tens
CROSS JOIN Digits AS hundreds
CROSS JOIN Digits AS thousands
WHERE units.n + tens.n*10 + hundreds.n*100 + thousands.n*1000 BETWEEN 1 AND 1337
or WITH Digits AS (SELECT n FROM (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) AS d(n))
, Thousands AS (SELECT seq = units.n + tens.n*10 + hundreds.n*100 + thousands.n*1000
FROM Digits AS units
CROSS JOIN Digits AS tens
CROSS JOIN Digits AS hundreds
CROSS JOIN Digits AS thousands
)
SELECT seq FROM thousands ORDER BY seq
WHERE units.n + tens.n*10 + hundreds.n*100 + thousands.n*1000 BETWEEN 1 AND 1337
though at that point you probably have the recursive option so you'd need to test to see which is more efficient. As a further hack you can combine this and the recursive CTE to extend the number of rows you can return when recursion limits would otherwise be a problem.You can also use simple window functions to make a sequence from a system table that you know will have a few thousand rows:
SELECT n=ROW_NUMBER() OVER (ORDER BY object_id) FROM sys.all_columns
Or if you just need a few thousand rows to hang randomly generated data from you don't even need the function.All a bit hacky compared to some DBs that have sequence support, but useful for generating test data or numbers tables where that isn't supported.
If you need to create test data with complex business logic, referential integrity and constraints we've been working on declarative data generator that is build exactly for this: https://github.com/openquery-io/synth.
Needs a little Groovy but very convenient for generating random (or non-random) data.
While this looks like a good way to generate simple data, practical applications are more involved.
E.g. I have a table structure but not alot of rows in it, so i go use this to get alot of rows in to check how fast queries get processed?
You could create a new database and restore a production backup instead of using fake data but that might not be allowed or require some kind of approval due to rules protecting the privacy of customers or employees.
The other is for populating "numbers tables" which can be very useful in reporting contexts and/or dealing with certain gaps/islands type problems.
You can even use the techniques dynamically rather than a stored numbers table, if you need it in a DB that doesn't have one and you don't have schema permissions to create one, though this is generally less efficient.
select top 1000000 ROW_NUMBER() from sys.objects a, sys.objects b
or you can use INFORMATION_SCHEMA which is more portable across different RDBMS engines
Much more performant and naturally relational way of generating data than looping recursively.
Gotta select row number on some big enough table!
A table of which DMBS’s support this would be useful
[1]: http://dcx.sap.com/index.html#1001/en/dbrfen10/rf-select-sta...
[2]: http://dcx.sap.com/index.html#sqla170/en/html/8190aea36ce210...
Microsoft SQL Server, Firebird 2.1, PostgreSQL, SQLite, IBM Informix version, CUBRID, MariaDB and MySQL
spark.sparkContext.range(1_000_000, numSlices=200).take(20) select dbms_random.value from dual connect by level < 1000001;
edit: 1,000,001 as level starts with 1For reference, in J such sequence of integers can be generated with:
1+i.1000000
1 2 3 4 5 6 7 8 9 10 ....
or: 1+i.1000000 1
1
2
3
4
5
6
7
8
9
10
....
To explain the previous examples (and let's use smaller integer for less typing...), the `$` verb is called "shape"/"reshape", and it takes a (list of) values on the right side, and a list of dimensions on the left: 5 2 $ 10
10 10
10 10
10 10
10 10
10 10
if there's not enough values on the right, they are cycled: 5 2 $ 10 11 12
10 11
12 10
11 12
10 11
12 10
which degenerates to repetition if there's only one value on the right. The `~` adjective (called "reflex") modifies a verb to its left in the following way: V~ x NB. same as x V x
so `$~10` is the same as `10$10`, which is a list of ten tens. That list is passed to `?`, which is a verb called "roll", which gives a random integer in the 0-(y-1) range when written as `? y`. `y` here can be a scalar, or a list, in which case the roll is performed for each element of the list: $~ 10
10 10 10 10 10 10 10 10 10 10
? $~ 10
1 6 9 4 6 8 8 7 9 4
The dyadic case, ie. `x ? y` is called "deal", which selects `x` elements from `i. y` list at random, without repetitions. `?~ y`, then, effectively shuffles the `i. y` list: ?~10
6 9 7 3 5 1 8 0 4 2
"deal" can be used to shuffle any list, not only the `i. y` sequence, by using the shuffled list as indexes of another list (using `{` verb, called "from"): 2*i.10
0 2 4 6 8 10 12 14 16 18
(?~10){2*i.10
14 10 18 6 2 12 8 0 16 4
...I know, I know, it is strange. But it's so interestingly mind-bending that I'd be really happy if I had a valid excuse to pour hundreds of hours into learning J properly. Sadly, I don't have anything like that, so I only spread the strangeness from time to time in comments, like I do right now :)All the primitives (words) are described here: https://code.jsoftware.com/wiki/NuVoc