This is not a comparison in order to decide which database is best or has the most valuable features. It is written from a programmer's point of view who often has to work with a certain database just because it can't be changed in the project. Of course between the lines it is also a cry for standards, but as long as the different vendors don't follow the standards it gives a guideline for workarounds in case of compatibility issues.
| ORACLE | MySQL | MS-SQL |
|---|---|---|
to_date('28.01.08','dd.mm.yy') |
str_to_date('28.01.08','%d.%m.%y') |
convert(date,'28.01.2008',104) |
| 'text ' || string || chr(39) | concat('text ',string, char(39)) | 'text ' + string + char(39) |
| instr('long ... text', 'needle') | position('needle' in 'long ... text') | patindex('%needle%', 'long...text') |
| GROUP BY ROLLUP(author_id) | GROUP BY author_id WITH ROLLUP | GROUP BY ROLLUP (author_id) |
| ELSIF | ELSEIF | n.a. (use nested IF ... ELSE) |
| NVL(expr,value) | IFNULL(expr,value) | ISNULL(expr,value) |
CREATE PROCEDURE sp_name( p1 IN NUMBER ) IS |
CREATE PROCEDURE sp_name( IN p1 INTEGER ) |
CREATE PROCEDURE sp_name @p1 SMALLINT = 1 AS |
var_a INTEGER; BEGIN | BEGIN DECLARE var_a INTEGER; |
SET NOCOUNT ON BEGIN TRY DECLARE @var_a INT |
DECLARE EXIT HANDLER FOR NOT FOUND SET var_a = 0; |
||
var_a := 55; |
SET var_a = 55; |
SET @var_a = 55; |
SELECT col_a INTO var_a
FROM a_table
WHERE rownum <= 1; |
SELECT col_a INTO var_a
FROM a_table
LIMIT 1; |
SELECT TOP(1) @var_a = col_a
FROM a_table;
(Note: *) |
EXCEPTION
WHEN NO_DATA_FOUND THEN
var_a := 0; |
END TRY;
BEGIN CATCH
SET @var_a = 0; |
|
END; | END | END CATCH; |
In contrary to the other database engines T-SQL doesn't produce an error if the select fetches multiple rows. It will walk through the result set and will assign the value of current row/record/cursor to the variable again and again. This can lead to logical errors in your procedure that are difficult to find. Essentially the variable will have the column's value of the last processed row. If you want to make sure that you always get the value of the first row add the TOP(1) after the SELECT as demonstrated above.
Starting with release 11g Oracle offers a aggregation function listagg that is the counter part to group_concat of MySQL.
SQL> SELECT g_ID, min(f_ID), max(f_ID),
LISTAGG(f_ID, ',') WITHIN GROUP (ORDER BY f_ID) AS list_of_IDs
FROM belong_to GROUP BY g_ID;
g_ID min(f_ID) max(f_ID) list_of_IDs
----- --------- --------- ------------------------------
cb 69 78 69,70,71,72,73,74,75,76,77,78
ndf 79 87 79,80,81,82,83,84,85,86,87
nv 60 68 60,61,62,63,64,65,66,67,68
suk 1 9 1,2,3,4,5,6,7,8,9
wes 29 37 29,30,31,32,33,34,35,36,37
The MySQL syntax looks more consistent because all specifications for the concatenation process are inside the parentheses for the parameter of the GROUP_CONCAT function. The additional SEPARATOR keyword seems to be more natural than Oracle's WITHIN GROUP keyword.
mysql> SELECT g_ID, min(f_ID), max(f_ID),
GROUP_CONCAT(f_ID ORDER BY f_ID SEPARATOR ',') AS list_of_IDs
FROM belong_to GROUP BY g_ID;
+------+-----------+-----------+-------------------------------+
| g_ID | min(f_ID) | max(f_ID) | list_of_IDs |
+------+-----------+-----------+-------------------------------+
| cb | 69 | 78 | 69,70,71,72,73,74,75,76,77,78 |
| ndf | 79 | 87 | 79,80,81,82,83,84,85,86,87 |
| nv | 60 | 68 | 60,61,62,63,64,65,66,67,68 |
| suk | 1 | 9 | 1,2,3,4,5,6,7,8,9 |
| wes | 29 | 37 | 29,30,31,32,33,34,35,36,37 |
+------+-----------+-----------+-------------------------------+
PostgreSQL achieves the same in a two step process. There is no aggregation function that returns a concatenated string, but PostgreSQL can return an aggregated array. And as it supports join and split functionality to move from arrays to strings and back you only need to chain the array_agg and the array_to_string functions:
SELECT g_ID, min(f_ID), max(f_ID),
array_to_string( array_agg(f_ID), ',') AS list_of_IDs
FROM belong_to GROUP BY g_ID;
With PostgreSQL 9 it it possible to use a short form that is similar to what MySQL and Oracle have. Instead of chaining array_agg and array_to_string you can simply use: string_agg(f_ID,','). It is also possible to specify a order by clause to the aggregate expression of array_agg or string_agg:
array_agg( f_ID ORDER BY f_ID DESC ),
string_agg(f_ID, ',' ORDER BY f_ID )
Oracle has a specific syntax for hierarchical queries
CONNECT BY
with the pseudo column level. In MS-SQL you need to implement this
with a Common Table Expression that references itself. The concept
is analogous to the linked list structures in C where you already use the name
of the structure while you are still defining the structure. Like in in C you
need to have a kind of declaration in advance and can't use the simple
WITH profile_tree AS (...) but need to list the fields of the
CTE explicitly.
WITH profile_tree (level, id, owner_id, path_name) AS
(
SELECT 1 AS level, p.id, p.owner_id,
cast(N'/' COLLATE Latin1_General_CS_AS + p.last_name AS NVARCHAR(60)) AS path_name
FROM profile p
WHERE p.id = 1 /* start at the root node (usually a parameter) */
UNION ALL
SELECT t.level + 1 AS level, p.id, p.owner_id,
cast(t.path_name + N'/' + p.last_name AS NVARCHAR(60)) AS path_name
FROM profile p INNER JOIN profile_tree t ON t.owner_id = p.id
) /* This is the end of the Common Table Expression */
SELECT level, id, owner_id, path_name FROM profile_tree
PostgreSQL doesn't require a formal table declaration in advance, but instead it needs the special recursive keyword to prepare the parser for the case that it will find a reference to the CTE inside the definition of the CTE. And it is not so picky about the type of the combined pathname column. It can just be VARCHAR, but doesn't need a type cast to fix the length of that field.
WITH RECURSIVE profile_tree AS
(
SELECT 1 AS level, p.id, p.owner_id,
'/' || p.last_name AS path_name
FROM profile p
WHERE p.id = 1 /* start at the root node (usually a parameter) */
UNION ALL
SELECT t.level + 1 AS level, p.id, p.owner_id,
t.path_name || '/' || p.last_name AS path_name
FROM profile p INNER JOIN profile_tree t ON t.owner_id = p.id
)
SELECT level, id, owner_id, path_name FROM profile_tree
The classical WHERE id IN (select max(id) from ...) to figure out the latest (or highest) record in a table might be inefficient because it runs two queries, the select for the maximum value in the subselect and the outer select. A SELECT ... FROM ... WHERE ... ORDER BY id DESC LIMIT 1 needs only a single pass through the table and should be substantially faster.
But what if you not only want the absolute maximum, but also the relative maximum for each month or each genre? Instead of comparing a scalar value to a one-dimensional list you have to compare a tupel with 2 elements to a 2 x m result set of the subquery.
SELECT b.g_ID, b.f_ID, f.year, f.director FROM belong_to b INNER JOIN films f ON b.f_ID = f.f_ID WHERE (b.g_ID,b.f_ID) IN (SELECT g_ID,max(f_ID) FROM belong_to GROUP BY g_ID); +------+------+------+-------------------+ | g_ID | f_ID | year | director | +------+------+------+-------------------+ | suk | 9 | 1947 | John Huston | | fn | 28 | 1939 | Raoul Walsh | | wes | 37 | 1993 | Detlev Buck | | sc | 59 | 1985 | Martin Scorsese | | nv | 68 | 1965 | Jean-Luc Godard | | ndf | 87 | 1982 | Michael Verhoeven | +------+------+------+-------------------+
This concept works in MySQL, PostgreSQL and in Oracle, but the MS-SQL server does not support such expressions.
The reason why such a tupel comparison is not allowed in MS-SQL might be that the above query could easily be rewritten into a join query:
SELECT b.g_ID, b.f_ID, f.year, f.director
FROM belong_to b INNER JOIN films f ON b.f_ID = f.f_ID
INNER JOIN (SELECT g_ID,max(f_ID) AS maxid FROM belong_to GROUP BY g_ID) c
ON b.g_ID = c.g_ID AND b.f_ID = c.maxid
GO
g_ID f_ID year director
----- ------- ------ -------------------------
suk 9 1947 John Huston
fn 28 1939 Raoul Walsh
wes 37 1993 Detlev Buck
sc 59 1985 Martin Scorsese
nv 68 1965 Jean-Luc Godard
ndf 87 1982 Michael Verhoeven
The WHERE ... IN is usually used when we want to avoid that
rows in one table appear multiple times because the field or fields in the
join condition appear multiple times in the other table. If you have
IN (1,2,1) you will have only 2 records, but with a join you
will get 3 records. But due to the GROUP BY in this case it is
guaranteed that each combination of tupels appears only once.
But this is the straight forward constellation if the primary key column is used to build the aggregate values. This will be different if we don't want the highest Id, but the latest year.
SELECT x.g_ID, x.f_ID, f.year, f.director, f.title
FROM belong_to x INNER JOIN films f ON x.f_ID = f.f_ID
INNER JOIN
(SELECT x.g_ID, max(x.f_ID) as max_id, max(f.year) AS max_year
FROM belong_to x INNER JOIN films f ON x.f_ID = f.f_ID
GROUP BY x.g_ID) m ON x.g_ID = m.g_id AND f.year = m.max_year
GO
g_ID f_ID year director title
----- ------- ------ -------------------- -----------------
suk 7 1949 Jean Cocteau Orphée
fn 19 1992 Steven Soderbergh Kafka
wes 29 1995 Jim Jarmusch Dead Man
sc 51 1995 Woody Allen Mighty Aphrodite
nv 66 1994 Gianni Amelio Lamerica
cb 78 1995 Danny Boyle Trainspotting
ndf 83 1995 Dieter Wedel Der Schattenmann
out 90 1995 Wayne Wang Blue in the Face
out 102 1995 Wayne Wang Smoke
des 111 1993 Robert Altman Short Cuts
If the data is post-processed it is very likely that the follow-up process is not able to handle double entry for genre out. windowing expressions allow us to use two columns for sorting:
SELECT g_ID, f_ID, year, director, title
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY x.g_ID ORDER BY f.year DESC, f.f_ID DESC) row_num,
x.g_ID, f.*
FROM belong_to x INNER JOIN films f ON x.f_ID = f.f_ID
) src WHERE row_num = 1
ORDER BY 2
GO
g_ID f_ID year director title
----- ------- ------ --------------------- -----------------------
suk 7 1949 Jean Cocteau Orphée
fn 19 1992 Steven Soderbergh Kafka
wes 29 1995 Jim Jarmusch Dead Man
sc 51 1995 Woody Allen Mighty Aphrodite
nv 66 1994 Gianni Amelio Lamerica
cb 78 1995 Danny Boyle Trainspotting
ndf 83 1995 Dieter Wedel Der Schattenmann
out 102 1995 Wayne Wang Smoke
des 111 1993 Robert Altman Short Cuts
Another approach uses CROSS APPLY, but this time the genre out is completely missing from the list:
SELECT g.g_ID, src.f_ID, src.year, src.director, src.title,
FROM genres g CROSS APPLY
(SELECT TOP(1) b.g_ID, b.f_ID, f.year, f.director, f.title
FROM belong_to b INNER JOIN films m ON b.f_ID = f.f_ID
WHERE b.g_ID = g.g_ID ORDER BY f.year DESC, b.f_ID DESC) src
GO
g_ID f_ID year director title
----- ------- ------ --------------------- ------------------------
suk 7 1949 Jean Cocteau Orphée
fn 19 1992 Steven Soderbergh Kafka
wes 29 1995 Jim Jarmusch Dead Man
sc 51 1995 Woody Allen Mighty Aphrodite
nv 66 1994 Gianni Amelio Lamerica
cb 78 1995 Danny Boyle Trainspotting
ndf 83 1995 Dieter Wedel Der Schattenmann
des 111 1993 Robert Altman Short Cuts
Oracle implements dynamic SQL with the help of the PL/SQL procedural language. And it supports two types of dynamic SQL:
BEGIN
v_stmt := 'SELECT id, title, director FROM dvds WHERE year BETWEEN :x AND :y';
h_cursor := DBMS_SQL.OPEN_CURSOR;
DBMS_SQL.PARSE(h_cursor, v_stmt, DBMS_SQL.NATIVE);
-- Input variables
DBMS_SQL.BIND_VARIABLE(h_cursor, ':x', v_oldest);
DBMS_SQL.BIND_VARIABLE(h_cursor, ':y', v_youngest);
-- output variables
DBMS_SQL.DEFINE_COLUMN(h_cursor, 1, v_idr);
DBMS_SQL.DEFINE_COLUMN(h_cursor, 2, v_title, 40);
DBMS_SQL.DEFINE_COLUMN(h_cursor, 3, v_director, 30);
v_ignore := DBMS_SQL.EXECUTE(h_cursor);
LOOP
IF DBMS_SQL.FETCH_ROWS(h_cursor) > 0
THEN
-- get column values of the row
DBMS_SQL.COLUMN_VALUE(h_cursor, 1, v_id);
DBMS_SQL.COLUMN_VALUE(h_cursor, 2, v_title);
DBMS_SQL.COLUMN_VALUE(h_cursor, 3, v_director);
ELSE
EXIT; -- No more rows to copy:
END IF;
END LOOP;
DBMS_SQL.CLOSE_CURSOR(h_cursor);
END;
Using the OPEN FOR statement and a cursor variable can do the same and is a lot easier (The cursor variable is the type of variable for storing result set like data that we were missing in the above approach):
DECLARE
-- Cursor is only needed to easily create a record for the dynamic cursor
CURSOR cur_spec IS SELECT id,title,director FROM dvds;
rec_movie cur_spec%ROWTYPE;
v_cursor SYS_REFCURSOR;
BEGIN
v_from := 1956;
v_until := 2000;
v_stmt := 'SELECT id,title,director FROM dvds WHERE year BETWEEN :x AND :y';
OPEN v_cursor FOR v_stmt USING v_from, v_until;
LOOP
FETCH v_cursor INTO rec_movie;
EXIT WHEN v_cursor%NOTFOUND;
dbms_output.put_line( 'Movie: ' || rec_movie.title);
END LOOP;
CLOSE v_cursor;
END;
MySQL does this directly in the mysql terminal with the help of session variables:
mysql> set @qry = 'SELECT id, title, director, year, length FROM dvds WHERE year BETWEEN ? AND ?'; Query OK, 0 rows affected (0.02 sec) mysql> PREPARE dynsql FROM @qry; Query OK, 0 rows affected (0.01 sec) Statement prepared mysql> set @oldest = 1967; mysql> set @youngest = 1976; mysql> EXECUTE dynsql USING @oldest,@youngest; +----+----------------------------+----------------------+------+--------+ | id | title | director | year | length | +----+----------------------------+----------------------+------+--------+ | 24 | Spiel mir das Lied vom Tod | Sergio Leone | 1968 | 159 | | 27 | Badlands | Terence Malick | 1973 | 90 | | 58 | The Wild Bunch | Sam Peckinpah | 1969 | 139 | | 71 | Hundstage | Sidney Lumet | 1975 | 119 | +----+----------------------------+----------------------+------+--------+ 4 rows in set (0.00 sec) mysql> set @youngest = 1971; mysql> EXECUTE dynsql USING @oldest,@youngest; mysql> DEALLOCATE PREPARE dynsql;
(see also Information for MySQL Migrations: Triggers and Stored Procedures)
Oracle's PL/SQL has a fairly complete set of methods to pass structured data to a procedure: You can either use a cursor variable or a collection of records (known as PL/SQL index-by table) for result set like data. Or you can just pass a record variable for "one row" values. This is handy if you have to specify a lot of parameters for the procedure. You don't need to declare every single parameter (which turns out to be a maintenance nightmare if you have more then 5 or 6 parameters) but you can hide the single parameters as attributes (or members) inside the record variable. You won't need to change the signature of your procedure or function and neither the parameter list at the places where you call the procedure. You only need to add another member to the record definition and recompile your code.
MS-SQL tends to be a nightmare when you need to create a procedure that
depends on several attributes as parameters. In many cases these are some of
the values of a table row. Passing the table row as a record is not possible.
Instead you have to declare a variable for each column value of the table row,
that you want to pass as parameter.
Table variables make your life easier in such a case. But because
MS-SQL doesn't have the concept of packages you can't declare the
record, cursor or table variable close to the procedure in the same package,
but you have to do it on database level:
CREATE TYPE dbo.param_tab AS TABLE
(
some_id BIGINT NOT NULL,
name NVARCHAR(255),
a_date DATETIME
)
CREATE PROCEDURE process_the_data
(
@params dbo.param_tab READONLY
)
AS
BEGIN
SELECT TOP(1) ...
FROM @params p INNER JOIN other_table o ON p.some_id = o.id
WHERE ....
END
The attribute READONLY is required and it shows that in contrary to Oracle's cursor or record variables the table variable can not be changed.
A similar inconvenience appears in MS SQL when you fetch from a cursor.
You can't fetch into a record variable, but have to declare a variable with
the appropriate type for each field that is selected in the cursor.
Maybe this inconvenience is on purpose to remind the people that cursor and
loops are procedural thinking and against the spirit of SQL.
DECLARE @x INT = 1956, @y INT = 2000
DECLARE cur_movies CURSOR FOR
SELECT id, title, director, year FROM dvds WHERE year BETWEEN @x AND @y
BEGIN
DECLARE @aValue INT
DECLARE @id INT, @year INT,
@title NVARCHAR(255), @director NVARCHAR(250)
OPEN cur_movies
FETCH NEXT FROM cur_movies INTO @id, @title, @director, @year
WHILE @@FETCH_STATUS = 0
BEGIN
-- do something with the variables that were filled by the cursor row
SET @aValue = @year + @id
FETCH NEXT FROM cur_movies INTO @id, @title, @director, @year
END
CLOSE cur_movies
DEALLOCATE cur_movies
END
The @@FETCH_STATUS cursor attribute variable is set by the
last FETCH NEXT. If FETCH NEXT would return a
boolean variable it could be used directly as condition for the WHILE.
But because it is only writing into and modifying the global pseudo variable
fetching and testing the result of the fetch is splitted into two statements
and hence we need the FETCH statement twice: one time before the while and
another time at the end of the WHILE's execution block.
And sadly each time we have to give the full list of variables after the INTO.
Better don't ask me how often I have forgotten to add the new variable also
to the second FETCH NEXT statement ...
TABLE(...). PostgreSQL knows about functions that return
a set of records (where the type for these records have to be
predefined as a custom type) or a TABLE(..colspec..) where the
returned composite type is specified in the function signature and can
return n x m structured data.
Both database engines treat such kind of functions as if they were views and
consequently you can select from them like
SELECT * FROM my_func(23)
PostgreSQL has a very similar concept but consequently because the body of
the function doesn't contain any procedural stuff, the language
plpgsql is not needed and SQL as language is good
enough. The following function generates some basic DDL for the available
tables with a query against the information_schema. The sizes, that
are passed as parameter for formatting of the generated SQL have to go deeply
inside the subselect, it is nearly impossible to do this with a simple view.
CREATE OR REPLACE
FUNCTION re_create_tables(p_name_size integer, p_colspec_size integer)
RETURNS TABLE(table_spec text)
AS $$
SELECT concat(a.tab_name, a.column_name, a.column_spec) AS table_spec FROM
(SELECT CASE ordinal_position WHEN 1 THEN concat('CREATE TABLE ', table_name, ' (', chr(10)) ELSE '' END as tab_name,
table_name, ordinal_position, rpad(column_name,p_name_size) as column_name,
rpad(concat(CASE upper(data_type) WHEN 'CHARACTER VARYING' THEN 'VARCHAR' ELSE upper(data_type) END,
CASE WHEN upper(data_type) IN ('CHARACTER VARYING') THEN format('(%s)',character_maximum_length)
WHEN upper(data_type) IN ('BIGINT','INTEGER','SMALLINT') THEN ''
WHEN upper(data_type) IN ('NUMERIC','DECIMAL') THEN format('(%s,%s)',numeric_precision,numeric_scale)
ELSE '' END,
CASE is_nullable WHEN 'NO' THEN ' NOT NULL,' ELSE ',' END),p_colspec_size) AS column_spec
FROM information_schema.columns
WHERE table_schema = 'public') a
ORDER BY table_name, ordinal_position
$$ LANGUAGE SQL;
/* Another function that uses plpgsql as language (which is the wrong language here) */
CREATE OR REPLACE
FUNCTION get_film(p_pattern VARCHAR)
RETURNS TABLE (
film_title VARCHAR,
film_year INT
)
AS $$
BEGIN
RETURN QUERY
SELECT title, cast(release_year as integer)
FROM film WHERE title LIKE p_pattern;
END;
$$ LANGUAGE 'plpgsql';
Oracle and PostgreSQL allow you to pass around cursor variables (that can even be scrollable and are much cleaner as the MS-Sql and MySql approach with simply selecting in the output stream inside stored procedures.
| SQL 92 | Oracle | MySQL | PostgreSQL | MS-SQL |
|---|---|---|---|---|
| coalesce() | Yes since 8i, before nvl() | Yes, since 3.23.3 | Yes | Yes, since 2008, before isnull() |
| case ... when | 9i Yes, before decode() | Yes | Yes | Yes |
| extract(.. from ..) | only a few unit specifiers | Yes | Yes | No, use DATEPART(year, date_expr) |
| position(.. in ..) | No, use instr() | Yes, as a subset of locate() | Yes | No, use patindex() or charindex() |
| auto increment | no (needs sequence/trigger) | INT AUTO_INCREMENT | (BIG)SERIAL | IDENTITY |
| table(nested table) | unnest(array-type) | |||
| INTERSECT | Yes | No, use INNER join (a) | Yes | Yes |
| EXCEPT | No, use MINUS | No, use OUTER join (b) | Yes | Yes |
-- (a)
SELECT DISTINCT a.year AS common_years
FROM films a INNER JOIN films b ON a.year = b.year
INNER JOIN belong_to ab ON a.f_ID = ab.f_ID
INNER JOIN belong_to bb ON b.f_ID = bb.f_ID
WHERE ab.g_ID = 'wes' AND bb.g_ID = 'fn'
+--------------+
| common_years |
+--------------+
| 1992 |
+--------------+
-- (b)
SELECT DISTINCT a.year AS exclusive_years
FROM films a INNER JOIN belong_to ab ON a.f_ID = ab.f_ID
LEFT JOIN
(SELECT b.year FROM films b INNER JOIN belong_to bb ON b.f_ID = bb.f_ID
WHERE bb.g_ID = 'wes') n ON a.year = n.year
WHERE ab.g_ID = 'fn' AND n.year is null
+-----------------+
| exclusive_years |
+-----------------+
| 1995 |
| 1993 |
| 1948 |
| 1943 |
| 1954 |
| 1957 |
| 1950 |
+-----------------+
There are two approaches for this problem:
Oracle and PostgreSQL both support the RETURNING INTO clause to
get hold of the last used ID value. Actually this syntax can also serve for
other purposes and not only for auto increment values.
INSERT INTO persons (firstname,lastname)
VALUES ('Peter','Smith') RETURNING the_id INTO var_id;
MS-SQL uses a different syntax for the same concept:
INSERT INTO persons (firstname,lastname)
OUTPUT inserted.the_id
VALUES ('Peter','Smith')
The MS-SQL commands creates a result set. This means in the host language you have to treat this insert just like you would do with a select statement. For Oracle and PostgreSQL you have to prepare a output variable that you need to bind to a specific host variable (or of course in PL/SQL and PL/pgSQL a variable declaration in the surrounding block is enough).
For all database engines the listed function have a scope of the active session. A parallel insert from a second session won't modify the returned value. So they are safe, except if the insert fires one or more triggers, that are also using auto increment values. Of course the additional inserts made by the triggers are in the same session (they are even in the same transaction) and will affect the returned value of these functions.
LAST_INSERT_ID() Function for handling this in the
database and an analogous function mysql_insert_id() if you
need to retrieve this value from a host language like C, Java or Perl.SCOPE_IDENTITY() function ignores the modification by
triggers in contrary to the older method with a server variable
@@IDENTITY. And surprise, MS-SQL returns the ID of the
last inserted row in case of a multiple row insert.LASTVAL()The MS-SQL method BULK INSERT is somewhere located between Oracle's external tables and the MySQL approach load data [local] infile. Both BULK INSERT and external tables need the datafiles to be located on the server's file system. MySQL is not limited to this if you use the optional keyword local. But the syntax and concept of BULK INSERT are very similar to load data infile as both need an existing "real" table in which the data will be inserted.
But even if you can do it with MySQL ... using a file on the clients file system might be against the spirit of these bulk loading tools. Because it obviously means that the data has to travel somehow over the network in order to get into the database and this includes quite a lot of network protocol overhead.
BULK INSERT an_existing_table FROM 'C:\path\on\the\server\file.csv' WITH ( DATAFILETYPE = 'char', -- single byte encoding CODEPAGE = 'ACP', -- Ansi Code Page (windows-1252) FIELDTERMINATOR = ';' ROWTERMINATOR = '\n' )
LOAD DATA LOCAL INFILE '/home/rolf/data/voicemail-calls.csv' INTO TABLE voicemail FIELDS TERMINATED BY ';' (id,calling_no,called_no,calling_time,week_day,isdn_cause,proto_err);
BULK INSERT has a cousin OPENROWSET .. BULK that
works more like the external tables
of Oracle. It reads the flat datafile every time when it is accessed and keeps
the data in memory.
SELECT m.empid, m.firstname, m.lastname
FROM OPENROWSET( BULK 'C:\path\on\the\server\file.csv',
FORMATFILE = 'C:\path\on\the\server\employess.fmt') AS m;
Both Oracle and MS-SQL have a TRUNCATE TABLE statement. But
there is a crucial difference between them. In Oracle it is a DDL statement and
requires the DROP TABLE privilege and as all DDL statements it
executes an implicit COMMIT and breaks any transaction border of
the surrounding statements.
The MS-SQL server treats a TRUNCATE TABLE statement as a
super DELETE that can't have a filter criterion and
collects no row-based transaction information. But if the TRUNCATE
is inside a transaction there is still the chance to revert the effects of the
TRUNCATE later on with a ROLLBACK.
Although many users won't like it, because they have learned it the hard
way and lost their data when using TRUNCATE in an Oracle database.
Oracle's approach is more consistent, because the keyword TABLE
should alert you that you are not performing normal data manipulation, but
are executing some data definition command.
update test_data set label=description, description = label;
The above statement should simply exchange the data in label with the values in description. This works for PostgreSQL, MS-SQL and ORACLE, but it doesn't work for MySQL. When MySQL is evaluating the description = label it doesn't use the original input value of label, but the one after processing the assignment of description to label. So finally both columns label and description will have the same value from the original description column.
Luckily most of the time you need either min, max or sum as aggregate functions which all behave as expected. But if you use avg in MS SQL you will look at the data and think that the values look strange and wrong.
SELECT avg(n) AS average FROM (VALUES (0),(1),(1),(1)) AS num(n)
average
-----------
0
On second sight you will realize that the SQL-Server works with integer values and that he even applies integer arithmetic when dividing the sum of all through the number of rows. So the correct way to build the average, would require a type cast to float or decimal first:
SELECT avg(cast(n as float)) AS average
FROM (VALUES (0),(1),(1),(1)) AS num(n)
average
------------------------
0.75