How to design the database for the following scenario?
I'm working on a regression test project. Let's say the model is R = f(C),
where C is a collection of about 2000 test cases and R is the result of
all test cases. Every daily run produces about 2,000,000 records to insert
into the db. We keep a baseline for R, initialized as the R of the first
daily run. After each daily run, we compare the result with the baseline
(we call a difference between the daily run result and the baseline a
Regression Issue). If there are differences, we investigate them and
update some records of the baseline, or file a bug if necessary.

baselinetable: (casename, resultid) is the primary key.

buildnumber  casename  resultid  result
1            case1     0         abc
1            case1     1         def
1            case2     0         ijk

resulttable's schema is the same as baselinetable's, except that
(buildnumber, casename, resultid) is the primary key.

buildnumber  casename  resultid  result
1            case1     0         abc
1            case1     1         def
1            case2     0         ijk
2            case1     0         abc
2            case1     1         fff
2            case2     0         ijk

The problem is that
the db grows too fast (2,000,000 records per day). So I tried the
following 2 solutions.

Solution A: only keep the records that differ from the current baseline.

resulttable: (buildnumber, casename, resultid) is the primary key.

buildnumber  casename  resultid  result  baseline_buildnumber
1            case1     0         abc     1
1            case1     1         def     1
1            case2     0         ijk     1
2            case1     1         fff     1

Because the baseline can be updated, I added the column
'baseline_buildnumber' to mark which baseline a record is based on.
Accordingly, we need to keep multiple versions of the baseline by adding
buildnumber to the primary key:

baselinetable: (buildnumber, casename, resultid) is the primary key.

buildnumber  casename  resultid  result
1            case1     0         abc
1            case1     1         def
2            case1     1         fff
1            case2     0         ijk
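
To make solution A concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The choice of SQLite is an assumption made only for illustration (the post does not name a database), and the function names create_schema, latest_baseline and insert_daily_run are hypothetical. It also assumes every (casename, resultid) pair shows up in every run, so records that disappear from a run are not handled.

```python
import sqlite3


def create_schema(conn):
    """Create the two tables of solution A (column types are assumed)."""
    conn.executescript("""
        CREATE TABLE baselinetable (
            buildnumber INTEGER NOT NULL,
            casename    TEXT    NOT NULL,
            resultid    INTEGER NOT NULL,
            result      TEXT    NOT NULL,
            PRIMARY KEY (buildnumber, casename, resultid)
        );
        CREATE TABLE resulttable (
            buildnumber          INTEGER NOT NULL,
            casename             TEXT    NOT NULL,
            resultid             INTEGER NOT NULL,
            result               TEXT    NOT NULL,
            baseline_buildnumber INTEGER NOT NULL,
            PRIMARY KEY (buildnumber, casename, resultid)
        );
    """)


def latest_baseline(conn):
    """Return {(casename, resultid): result}, newest baseline row per key."""
    rows = conn.execute("""
        SELECT b.casename, b.resultid, b.result
        FROM baselinetable AS b
        WHERE b.buildnumber = (
            SELECT MAX(buildnumber) FROM baselinetable
            WHERE casename = b.casename AND resultid = b.resultid)
    """)
    return {(case, rid): result for case, rid, result in rows}


def insert_daily_run(conn, buildnumber, results):
    """Store only the records that differ from the latest baseline.

    results: iterable of (casename, resultid, result).
    Returns the number of rows written to resulttable.
    """
    results = list(results)
    baseline = latest_baseline(conn)
    if not baseline:
        # First daily run: it initializes the baseline and is stored in full.
        conn.executemany(
            "INSERT INTO baselinetable VALUES (?, ?, ?, ?)",
            [(buildnumber, c, r, v) for c, r, v in results])
        base_build, diffs = buildnumber, results
    else:
        base_build = conn.execute(
            "SELECT MAX(buildnumber) FROM baselinetable").fetchone()[0]
        diffs = [(c, r, v) for c, r, v in results
                 if baseline.get((c, r)) != v]
    conn.executemany(
        "INSERT INTO resulttable VALUES (?, ?, ?, ?, ?)",
        [(buildnumber, c, r, v, base_build) for c, r, v in diffs])
    conn.commit()
    return len(diffs)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    insert_daily_run(conn, 1, [("case1", 0, "abc"), ("case1", 1, "def"),
                               ("case2", 0, "ijk")])
    # Only the changed record (case1, 1, fff) is stored for build 2.
    print(insert_daily_run(conn, 2, [("case1", 0, "abc"), ("case1", 1, "fff"),
                                     ("case2", 0, "ijk")]))
```

With the example data from the tables above, build 1 is stored in full and build 2 adds a single row, which matches the resulttable shown for solution A.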
This solution can remarkably reduce the size of the db, but it's
complicated.

First, when inserting the daily run results, we need to compare them with
the latest baseline (e.g. the 1st, 3rd and 4th rows of the baselinetable
above).

Second, if we don't investigate and update the baseline before the next
daily run, we get into the following situation: the latest baseline build
is Num1, and the results of build Num2 and build Num3 are both based on
baseline Num1. It's simple to get the Regression Issues of build Num2: the
diff result is based on baseline Num1, and Num1 is also the latest
baseline build, so all the diff results of build Num2 are Regression
Issues. But suppose we then update the baseline to build Num2 and want to
get the Regression Issues of build Num3. We need to compare the diff
result of build Num3 with baseline Num2, and we also need to fetch the
diff between baseline Num1 and baseline Num2, because baseline_buildnumber
was Num1 (which is no longer the latest baseline build) when we inserted
the diff result of build Num3.
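
The second complication can be sketched in plain Python, working on in-memory rows instead of a real database. This is only an illustration under assumptions: the names baseline_as_of and regression_issues are hypothetical, rows are tuples in the column order of the tables above, and a build is assumed to have stored at least one diff row (otherwise the baseline it was based on would have to be recorded somewhere else).

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]                      # (casename, resultid)
BaselineRow = Tuple[int, str, int, str]    # (buildnumber, casename, resultid, result)
DiffRow = Tuple[int, str, int, str, int]   # ... plus baseline_buildnumber


def baseline_as_of(rows: List[BaselineRow], build: int) -> Dict[Key, str]:
    """Baseline in effect at `build`: newest row per key with buildnumber <= build."""
    chosen: Dict[Key, str] = {}
    for b, case, rid, result in sorted(rows):  # ascending buildnumber
        if b <= build:
            chosen[(case, rid)] = result       # later builds overwrite earlier ones
    return chosen


def regression_issues(baseline_rows: List[BaselineRow],
                      diff_rows: List[DiffRow],
                      build: int) -> Dict[Key, str]:
    """Records of `build` that differ from the latest baseline."""
    mine = [r for r in diff_rows if r[0] == build]
    if not mine:
        raise ValueError("no diff rows stored, so the base baseline is unknown")
    based_on = mine[0][4]                      # baseline_buildnumber at insert time
    old = baseline_as_of(baseline_rows, based_on)
    new = baseline_as_of(baseline_rows, max(r[0] for r in baseline_rows))

    issues: Dict[Key, str] = {}
    # 1. Stored diffs that still differ from the latest baseline.
    for _, case, rid, result, _ in mine:
        if new.get((case, rid)) != result:
            issues[(case, rid)] = result
    # 2. Records not stored because they matched the old baseline, where the
    #    baseline has changed since (the diff between the two baselines).
    diff_keys = {(case, rid) for _, case, rid, _, _ in mine}
    for key, value in old.items():
        if key not in diff_keys and new.get(key) != value:
            issues[key] = value
    return issues
```

For example, if the baseline is updated so that (case1, 1) becomes fff at build 2, a build 3 diff row (case1, 1, fff) recorded against baseline 1 is no longer an issue, while a later build that silently went back to def (and so stored no row for that key) is reported through step 2.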
Do you guys have any better ideas? Thanks very much!