Master Table Design

SQL

Introduction

Master table layout and creation. The master table controls all data tables via one to one relationship and carries audit info.

 

Layout

 

Column Type Description
id int (Auto Increment) Primary key for master table. All data tables must include an ID field (NOT auto increment) linked to this field via one to one relationship.
id_group int Version linking ID. Multiple entries share an identical group ID to identify them as a single record with multiple versions. If no previous versions of a new record exist, then this column is seeded from ID field after initial creation.
active bit If TRUE, marks this entry as the active version of a record. Much faster than a date lookup and necessary for soft delete.
create_by int Account creating this version. -1 = Unknown.
create_host varchar(50) Host (usually IP provided by control code) creating entry.
create_time datetype2 Time this entry was created.
create_etime Computed column Elapsed time in seconds since entry was created.
update_by Same as create_x, but updated on every CRUD operation.
update_host
update_time
update_etime

 

Set Up

CREATE TABLE [dbo].[_a_tbl_master](
	id				int IDENTITY(1,1)	NOT NULL,						-- Primary unique key.
	id_group		int					NULL,							-- Primary record key. All versions of a given record will share a single group ID.
	active			bit					NOT NULL,						-- Is this the active version of a record? TRUE = Yes.
		-- Audit info for creating version. A new 
		-- version is created on any CRUD operation 
		-- in the data tables controlled by master.
	create_by		int					NOT NULL,						-- Account creating this version. -1 = Unknown.
	create_host		varchar(50)			NOT NULL,						-- Host (usually IP from control code) creating version. 
	create_time		datetime2			NOT NULL,						-- Time this version was created. 
	create_etime	AS (datediff(second, [create_time], getdate())),	-- Elapsed time in seconds since creation.	
		-- Audit information for updating version.
		-- When any CRUD is performed on a data
		-- table, the previously active version
		-- is marked inactive. Deleting a record
		-- simply marks all versions inactive.
		-- In short, the only updates made to 
		-- a master table are toggling Active
		-- flag.
	update_by		int					NOT NULL,						-- Account updating this version. -1 = Unknown.
	update_host		varchar(50)			NOT NULL,
	update_time		datetime2			NOT NULL,
	update_etime	AS (datediff(second, update_time, getdate())),
 
CONSTRAINT PK__a_tbl_master PRIMARY KEY CLUSTERED 
(
	id ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE _a_tbl_master ADD CONSTRAINT DF__a_tbl_master_active		DEFAULT ((1))			FOR active
GO

ALTER TABLE _a_tbl_master ADD CONSTRAINT DF__a_tbl_master_create_by		DEFAULT ((-1))			FOR create_by
GO

ALTER TABLE _a_tbl_master ADD CONSTRAINT DF__a_tbl_master_create_host	DEFAULT (host_name())	FOR create_host
GO

ALTER TABLE _a_tbl_master ADD CONSTRAINT DF__a_tbl_master_created		DEFAULT (getdate())		FOR create_time
GO

ALTER TABLE _a_tbl_master ADD CONSTRAINT DF__a_tbl_master_update_host	DEFAULT (host_name())	FOR update_host
GO

 

Encapsulated XML ID List Parsing – In Progress

SQL

Introduction

As part of my ongoing MSSQL Versioning Project, it is often times necessary to pass a list of records to stored procedures, either from the controlling application or from a calling procedure. By use of standardized table design, you will normally only need to pass a list of primary keys. The procedure can query for any other information it needs from that point forward, making it fully self contained more maximum encapsulation.

For the list itself there are several options I know of, but only one is fully viable for my own needs. Your mileage may vary of course:

  1. Delimited List: On the surface this is the simplest of means. Just slap together a comma delimited string (“x, y, z”), break it down at the database console and off you go. If only it were actually that simple, and even if it was there’s not a chance on Earth I’m doing that. Neither should you. This is breaking the law of First Normal Form, something you never want to do for reasons well beyond the scope of this article. If you are curious, several (but by no means all) of the pitfalls of this approach are explained quite nicely here.
  2. Table Valued Parameters: TVPs are extremely powerful when used correctly and have their place, but for purposes of acting as lists or caches in a stored procedure, they have two serious drawbacks.
    1. TVPs are assumed by the query optimizer to contain a single row. This is of little to no consequence if only a few records are involved, but it can be disastrous once a threshold of ~100 is reached. Queries with execution times normally in the millisecond range may suddenly balloon into resource hogs requiring several minutes to complete.
    2. It’s rather unusual, but our environment is WIMP (Windows, IIS, MSSQL, PHP). Connectivity is provided by the sqlsrv driver, with an object oriented wrapper penned by yours truly. Unfortunately at time of writing, this otherwise excellent driver set does not support TVP.
  3. Common Language Runtime: Lots of fun to be had here, but like TVPs, it depends on a very specific environment. Otherwise it simply isn’t applicable. Even when it is, the realities of business often mean taking advantage of CLR adds layers of extra management and time wasted on service requests for the simplest of modifications. No thank you.
  4. XML: This is my method of choice. It’s reasonably fast, only giving up speed to CLR and TVP, and eventually surpassing the later as the number of records increases. It’s also T-SQL compliant, and thus quite portable. The downside is there’s more of a learning curve and you’ll want to design carefully to avoid huge strings. Let’s have a closer look at how…

Considerations

  • Efficiency: We want the XML itself and parser to be compact and fast as possible.
  • Scaling: The design should be solid and not break down under a heavy load.
  • Reuse. We need to encapsulate and standardize our code. It won’t do much good if every query or procedure requires an inline rewrite of the XML parsing.

Implementation

There are three basic scenarios where we will need to parse a list of IDs via XML.

Procedure A executes Procedure B, Sending List of IDs

This will be a common occurrence – in my MSSQL versioning design every procedure that updates data must first update the Master Table. Should sub data be involved, then multiple updates to Master table must first take place – one for each row of sub data updated. First Procedure A will establish the list of records to update as a temp table, as in the following example:

id created update_by update_ip active
1 2016-12-28 115 128.163.237.37 False
2 2016-12-28 115 128.163.237.37 False
3 2016-12-28 115 128.163.237.37 False

Once the table is ready, this query is run against it:

SET @xml_string_var = (SELECT id FROM <temp table of items to update> FOR XML RAW, ROOT)

The variable @xml_string_var will be populated with an XML string as follows. Note <root> and row. These are default outputs that we could change these by modifying our SQL above, but I prefer to leave them be. Since this little bit of SQL will be in nearly every data update procedure, let’s keep it simple and reusable as possible.

<root>
<row id=”1″ />
<row id=”2″ />
<row id=”3″ />
</root>

We can now execute Procedure B passing @xml_string_var as an XML string argument.

Procedure B Receives XML From Procedure A

Upon execution, Procedure B will need to break the XML back down into a table. Rather thanĀ Procedure B breaking the XML down inline, let’s outsource the work. We could do this with a stored procedure, but the moment we executed a procedure that in turn executed our XML parser, we would run smack into the irritating limitation of nested executions. For those unfamiliar, MSSQL 2008 and below simply do not allow nested stored procedure execution. Any attempted to do so will produce the following error:

Msg 8164, Level 16, State 1, Procedure <procedure_name>, Line <line> An INSERT EXEC statement cannot be nested.

In short, encapsulation as a stored procedure just won’t work. That really just leaves user defined functions. I personally loathe them for a lot of different reasons. They appeal to the programmer in me, but in SQL tend to cause more trouble than they’re worth. Still, if we want to encapsulate the XML parsing (and we DO), a table valued function is the best way to go. We’ll call it tvf_get_id_list:

-- tvf_get_id_list
-- Caskey, Damon V.
-- 2017-01-25
-- Returns recordset of IDs from xml list
-- 
-- <root>
--		<row id="INT" />
--		... 
-- </root>
			

CREATE FUNCTION tvf_get_id_list (@param_id_list xml)
RETURNS TABLE AS
RETURN (SELECT x.y.value('.','int') AS id	
			FROM @param_id_list.nodes('root/row/@id') AS x(y))

Procedure B will call tvf_get_id_list, passing along the XML. The tvf_get_id_list will break the XML down and produce a record-set of IDs, which we can then insert into temp table:

id
1
2
3

Procedure B will now have a access to record set of IDs that it can use to perform whatever work we need done.

As you can see, the XMl parsing work is fairly simple – we specifically planned the XML markup for easy break down. Even so encapsulating the XML work out to a separate function gives us a couple of advantages over just pasting the XML parsing code inline.

  • Obviously we will use the fastest and best scaled technique for breaking down the XML (see here for examples), but should even better techniques be developed, we only need to modify this one function.

Procedure B and any other procedures that we send our XML list to are simpler and more compact. They need only call tvf_get_id_list to break down the XML list into a usable record set.

Procedure Called By Control Code

This is more or less identical to procedures executing each other, except the procedures are being called by application code. In this case, it is the application’s responsibility to send XML formatted as above. The simplicity of XML makes this rather easy, and the parsing code can be made part of a class file.

foreach($this->id as $key => $id)
{	
					
	if($id == NULL) $id = DB_DEFAULTS::NEW_ID;
							
	$result .= '<row id="'.$id.'">';									
}

MSSQL Relational Record Versioning – In Progress

SQL

Versioning Notes (In Progress)

Master Table Update

The master table controls all data tables in the database, including sub tables containing one to many relational data (ex. One Person -> Many Phone Numbers). This means our master update procedure must be able to handle multiple record updates at once and be modular enough to execute by another update procedure – both for the updating the that procedure’s target data table AND any related sub tables. Otherwise the whole encapsulation concept falls apart.

  1. Populate a temp table with list of update IDs. These are the record IDs that we want modified (or created as the case may be). Ultimately we will be inserting these as new records with new IDs no matter what, but we’ll need to perform versioning if these IDs already exist in the master table.
    ID
    1
    2
    3
  2. Find any records in Master Table that match the Update List, and mark them as inactive. They will be replaced with new inserts.
    UPDATE
    			_a_tbl_master
    		SET
    			active = 0
    		FROM
    			#master_update_source _new
    		WHERE 
    			_a_tbl_master.id = _new.id;
  3. Prepare a list of inserts consisting of records where update list and master table IDs math, AND unmatched items in the Update List. The combined list is used to populate a temp table. This is also where we acquire the group ID for existing records. The group ID will be applied to new inserts (versions) of the existing records.
    INSERT INTO 
    			#master_update_inserts (id, id_group, update_by)
    		SELECT 
    			_current.id, _current.id_group, @update_by
    		FROM #master_update_source _source  
    			LEFT JOIN _a_tbl_master _current ON _source.id = _current.id
    
  4. Apply list of inserts to the Master Table. Use OUTPUT clause to populate a temp table with a list of IDs for each insert.
    INSERT INTO 
    			_a_tbl_master (id_group, update_by)
    		OUTPUT 
    			INSERTED.ID 
    				INTO #master_update_new_id
    		SELECT 
    			id_group, update_by 
    		FROM 
    			#master_update_inserts
  5. Using the list of IDs created when records were inserted to Master Table, we run an UPDATE against Master Table on the list of newly created IDs, where the id_group field is empty. This is to seed new records (not new versions of existing records) with a group ID.
  6. Master Table is now populated. New records will have a group ID identical to their ID, while existing records will have a new ID, but retain their previous group ID.
    ID id_group created update_by update_ip active
    2844 2884 2016-12-28 10:13:45.1900000 115 128.163.237.37 False
    2845 2845 2016-12-28 10:13:45.1900000 115 128.163.237.37 False
    2846 2846 2016-12-28 10:13:45.1900000 115 128.163.237.37 False
    2989 2844 2016-12-28 22:42:14.7930000 115 128.163.237.37 True
    2990 2845 2016-12-28 22:42:14.7930000 115 128.163.237.37 True
    2991 2846 2016-12-28 22:42:14.7930000 115 128.163.237.37 True

Full procedure (in progress)

-- Caskey, Damon V.
-- 2016-12-20
--
-- Update master table. Must be run before
-- any data table controlled by master is
-- updated. Outputs record set containing
-- IDs for the updated master that a
-- calling data update procedure will need.


SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

ALTER PROCEDURE [dbo].[_a_master_update]
	
	-- Parameters
	@arg_id					int				= NULL,		-- Primary key. 			
	@arg_update_by			int				= NULL,		-- ID from account table.
	@arg_update_ip			varchar(50)		= NULL		-- User IP, supplied from application.
			
AS
BEGIN

	-- Let's create the temp tables we'll need.
		
		-- List of update requests. All we
		-- need are IDs. The rest is handled
		-- by parameters or generated by
		-- default data binds in the master
		-- table.
		CREATE TABLE #master_update_source
		(
			id			int
		)

		-- Prepared list of items that
		-- will be inserted into the master
		-- table.
		CREATE TABLE #master_update_inserts
		(
			id			int,
			id_group	int
		)

		-- List of new item IDs created when
		-- inserts are perform on master
		-- table.
		CREATE TABLE #master_update_new_id
		(
			id			int
		)
		

	-- Populate update source (for experiment).
	INSERT INTO #master_update_source (id)
	VALUES (-1), (-1), (2844), (2845), (2846)


	-- Find any records that match our 
	-- update list and mark them as inactive.
		UPDATE
			_a_tbl_master
		SET
			active = 0
		FROM
			#master_update_source _new
		WHERE 
			_a_tbl_master.id = _new.id;

	-- Prepare inserts. Here we are adding inserts for new
	-- records AND for records that already exist. We do the
	-- later so we can get the current group ID and pass it on. 
		INSERT INTO 
			#master_update_inserts (id, id_group)
		SELECT 
			_current.id, _current.id_group
		FROM #master_update_source _source  
			LEFT JOIN _a_tbl_master _current ON _source.id = _current.id

	-- Apply the insert list (insert into master table). New
	-- IDs created by the database are output into
	-- a temp table.
		INSERT INTO 
			_a_tbl_master (id_group, update_by, update_ip)
		OUTPUT 
			INSERTED.ID 
				INTO #master_update_new_id
		SELECT 
			id_group, @arg_update_by, @arg_update_ip 
		FROM 
			#master_update_inserts

	-- For new records, seed the group ID with
	-- new record's ID.
		UPDATE
			_a_tbl_master
		SET
			id_group = _new.id
		FROM
			#master_update_new_id _new
		WHERE 
			_a_tbl_master.id = _new.id AND _a_tbl_master.id_group IS NULL;

END