
            [Repost] Thinking Set-Based .... or not?

            Reposted from http://weblogs.sqlteam.com/jeffs/archive/2007/04/30/60192.aspx


            Thinking "Set-Based"

            So, I hear you're a "set-based SQL master"!

            As Yoda once said, you've "unlearned what you have learned". You've trained yourself to attack your database code not from a procedural, step-by-step angle, but rather from the set-based "do it all at once" approach. It may have taken weeks, months or even years to finally obtain this enlightened state of "database zen", but it was worth it. Your SQL code is short, fast, and efficient. There is not a cursor in sight. You have reached the point where you can write a single SELECT that replaces hundreds of lines of cursors, temp tables and client-side processing. Life is good.

            As I read somewhere once, you don't tell SQL how to do it, you tell SQL what you want, and that's a great way of thinking about it. A procedural programmer gets bogged down with the details, and has to concentrate on breaking things down into small pieces, explicitly reading and processing one row of data at a time, and figuring out how to combine those results together at the end to make it all work. A set-based SQL programmer worries about none of those things: In the set-based world, you state your relations and join the tables together, add some grouping and criteria, and it is the database engine that worries about the specifics.

            Well, maybe not ... You might not want to abandon all of the things that you learned from your procedural background. There's a danger in misunderstanding that set-based programming means "doing it all at once", and thinking that it forbids processing things "one at a time" or "in steps". Sometimes, when you get too comfortable in the set-based way of thinking, you abandon the good things that you learned as a procedural programmer. The two mindsets aren't as different as you might think!

            Approaching a Problem

            What if I ask you to write a somewhat complicated SELECT, something like this:

            "Write a SELECT that returns, for a given @Year, the total sales by office, and also the office's top salesperson (highest total sales for the year) with their salary (as of the last day of that year), their total bonuses for that year, and their hire date."

            While this isn't rocket science, what makes this request slightly complicated is that it appears there are at least 3 different transactional queries (sales by employee, sales by office, bonus totals by employee) that we need to put together, as well as some point-in-time reporting off of a history table (employee salaries), which can be difficult depending on how the table is structured.

            Now, how does a "set-based" programmer attack this? The schema and the specifics are not important; it is really just the general approach that I am commenting on.

            Do you start by immediately finding all of the necessary tables and put them all into 1 big SELECT by joining everything that matches? Then, from there, you may start adding columns and expressions to your GROUP BY clause, adding in criteria and CASE expressions, maybe a DISTINCT before it all? And then, if that doesn't work, maybe you add some correlated subqueries to your SELECT list, or move things in and out of derived tables? Then more GROUPING, more criteria, more JOINs, more moving things and shifting parts of the SELECT around until it "looks right" and it "seems to work"?

            Well, that does seem to be the set-based approach for many, since you get so trained and so used to thinking of the "big picture", and not worrying about details, that you just assume that you can dive right in and start joining and selecting and eventually you'll get there. We've all done it. That's what you want to do, after all. We don't want to think that we need to break things down into smaller, discrete steps, or that things should be "processed" one step at a time. It goes against everything that we've been trying to train ourselves to do ever since we embraced the concept of relational database programming, right?

            Wrong!

            Thinking in Sets = Thinking in Steps

            It is so important to understand that "thinking set-based" does not conflict with "thinking in steps"! In fact, thinking in steps is more important than ever in some ways, especially as your data and your schemas and your requirements become more complex.

            In the above example, if you "dive right in" and start joining and selecting and grouping and seeing how things work, that is exactly the wrong way to do it! You need to remember that the skill you learned from your procedural world -- breaking larger problems down into smaller parts -- still applies even when writing SQL.

            Looking at the above statement, a really good SQL developer will immediately break the problem down into smaller, completely separate parts:
            a SELECT that returns 1 row per Office, with each Office's total sales for a @Year
            a SELECT that returns 1 row per employee, with their salary as of the last day of a given @Year
            a SELECT that returns 1 row per employee, with their total bonus amount for a given @Year
            a SELECT that returns 1 row per Office, with the top salesperson (Employee) and their sales amount, for a @Year
            Starting with those 4 basic pieces, all of which are completely isolated from the others, is the way to begin to approach the problem. You don't focus on returning employee names, or sorting, or formatting dates -- you focus on the data, and returning it in small parts that will eventually all fit together. For each SELECT, you can test it and optimize it and verify the data, and only at the very end, when all the individual parts are working, do you put them together. This sounds familiar, doesn't it? Much like a procedural programmer who breaks their application down into smaller parts via functions or classes or whatever tools their language provides, I am suggesting that the overall approach is still valid and in fact a great idea even when writing SQL!
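As an illustration of building and checking pieces in isolation, here is a minimal T-SQL sketch of two of them: salary as of the last day of @Year, and bonus totals for @Year. The original post deliberately gives no schema, so the table variables @SalaryHistory and @Bonuses, their columns, and the sample rows are all assumptions made up for this example.

```sql
-- Hypothetical tables and data; the original post does not define a schema.
declare @SalaryHistory table (EmployeeID int, EffectiveDate datetime, Salary decimal(12,2));
declare @Bonuses       table (EmployeeID int, BonusDate datetime, Amount decimal(12,2));

insert into @SalaryHistory values (1, '20050101', 50000);
insert into @SalaryHistory values (1, '20060701', 55000);
insert into @SalaryHistory values (1, '20070301', 60000);
insert into @Bonuses values (1, '20060315', 1000);
insert into @Bonuses values (1, '20061215', 500);
insert into @Bonuses values (1, '20070115', 750);

declare @Year int;
set @Year = 2006;

-- Piece: one row per employee, with the salary in effect on the last day of @Year.
-- Assumes at most one salary row per employee per EffectiveDate.
select s.EmployeeID, s.Salary
from @SalaryHistory s
inner join
    (select EmployeeID, max(EffectiveDate) as EffectiveDate
     from @SalaryHistory
     where year(EffectiveDate) <= @Year      -- ignore changes made after @Year
     group by EmployeeID) latest
    on s.EmployeeID = latest.EmployeeID
   and s.EffectiveDate = latest.EffectiveDate;
-- For the sample data: employee 1, salary 55000.00

-- Piece: one row per employee, with total bonuses for @Year.
select EmployeeID, sum(Amount) as Bonus
from @Bonuses
where year(BonusDate) = @Year
group by EmployeeID;
-- For the sample data: employee 1, bonus 1500.00
```

Each SELECT returns only the key (EmployeeID) and the calculated value, so its result can be checked by hand against a few known rows before it is joined to anything else. On large tables a date-range predicate would likely use an index better than year(...); the simpler form is used here only to keep the sketch short.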

            In fact, when writing a SELECT that requires multiple non-related transactional tables, this is really the only way to go about solving the problem, since each one must be fully grouped and summarized and ready to join on matching key columns before we can begin to even think about combining the results. In this case, it is only at the very end, when all of our individual SELECTs are grouped by Office or Employee, that we join them together as derived tables.
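To see why each transactional table has to be summarized before the join, consider this small sketch. The table variables @Sales and @Bonuses and their rows are hypothetical, invented for this example. Joining the raw rows first pairs every sale with every bonus for the same employee, so both totals come out inflated.

```sql
-- Hypothetical data: one employee with two sales and two bonuses.
declare @Sales   table (EmployeeID int, Amount decimal(12,2));
declare @Bonuses table (EmployeeID int, Amount decimal(12,2));

insert into @Sales values (1, 100);
insert into @Sales values (1, 200);
insert into @Bonuses values (1, 10);
insert into @Bonuses values (1, 20);

-- Wrong: join first, then group. The join produces 2 x 2 = 4 rows,
-- so every sale is counted twice and every bonus is counted twice.
select s.EmployeeID, sum(s.Amount) as TotalSales, sum(b.Amount) as Bonus
from @Sales s
inner join @Bonuses b on s.EmployeeID = b.EmployeeID
group by s.EmployeeID;
-- returns: 1, 600.00, 60.00

-- Right: summarize each table to one row per employee, then join.
select s.EmployeeID, s.TotalSales, b.Bonus
from (select EmployeeID, sum(Amount) as TotalSales
      from @Sales group by EmployeeID) s
inner join
     (select EmployeeID, sum(Amount) as Bonus
      from @Bonuses group by EmployeeID) b
    on s.EmployeeID = b.EmployeeID;
-- returns: 1, 300.00, 30.00
```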

            In addition, the "step-based" approach involves understanding that things like formatting dates, deciding on how to output a name (first/last or last/first, etc), or sorting are irrelevant to the larger problem. In a complicated SELECT with lots of calculations or point-in-time reporting, if you can write a SELECT that returns 1 row per employee (determined by the employee's primary key column, let's say EmployeeID), that is all you need; if you know that the Employee table has first name, last name, hire date, and a simple relation to their Department, then don't worry about any of that until the very last step! Just focus on returning a reference to the entity (EmployeeID) and calculating the results or values that you are trying to return per entity (total sales, salary, bonus), and only when everything is accurate and correct should you dress things up with the other attributes of the entity which are trivial to obtain (employee name, hire date) through simple joins.

            Putting it all Together

            In the end, it really does resemble procedural programming quite a bit, in that each of these little, self-contained parts, all of which are responsible for doing their job accurately and efficiently, is much like a function or a class. And our primary SELECT is like the main program that calls each of them and in the end puts them all together:

            select OfficeSales.OfficeID,
                   OfficeSales.TotalSales as OfficeSales,
                   Offices.OfficeName,
                   TopSalesPerson.EmployeeID,
                   TopSalesPerson.TotalSales as EmployeeSales,
                   Employees.EmployeeName,
                   Employees.HireDate,
                   EmpSalaries.Salary,
                   EmpBonus.Bonus
            from
                ( .... ) OfficeSales
            inner join
                ( .... ) TopSalesPerson on OfficeSales.OfficeID = TopSalesPerson.OfficeID
            inner join
                ( .... ) EmpSalaries on TopSalesPerson.EmployeeID = EmpSalaries.EmployeeID
            inner join
                ( ... ) EmpBonus on TopSalesPerson.EmployeeID = EmpBonus.EmployeeID
            inner join
                Employees on TopSalesPerson.EmployeeID = Employees.EmployeeID
            inner join
                Offices on OfficeSales.OfficeID = Offices.OfficeID

            When all the code is in place, this will probably be a very large, complicated SELECT. But looking at it this way, doesn't it look pretty simple? And each of those derived tables, on its own, will also be quite simple. That's the approach we want to take!

            (note: In addition to using derived tables, you can use Common Table Expressions to facilitate this approach, since they work essentially the same way but are often easier to read and incorporate into your complicated SELECT statements. Views and parameterized User Defined Functions can be useful as well. The same concepts still apply -- divide and conquer!)
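As a rough sketch of the Common Table Expression variant, the example below builds two of the pieces (office totals and the top salesperson per office) as named CTEs and joins them at the end. The @Sales and @Offices table variables, their columns, the sample rows, and the intermediate EmployeeSales step are assumptions for illustration, and ROW_NUMBER() is only one possible way to pick the top salesperson; the original post does not say how its derived tables are written.

```sql
-- Hypothetical tables and data; the original post does not define a schema.
declare @Offices table (OfficeID int, OfficeName varchar(50));
declare @Sales   table (EmployeeID int, OfficeID int, SaleDate datetime, Amount decimal(12,2));

insert into @Offices values (1, 'Boston');
insert into @Offices values (2, 'Chicago');
insert into @Sales values (10, 1, '20060110', 100);
insert into @Sales values (10, 1, '20060920', 200);
insert into @Sales values (11, 1, '20060305', 250);
insert into @Sales values (20, 2, '20060611', 400);
insert into @Sales values (20, 2, '20050611', 999);   -- outside @Year, ignored

declare @Year int;
set @Year = 2006;

with OfficeSales as (
    -- one row per office, total sales for @Year
    select OfficeID, sum(Amount) as TotalSales
    from @Sales
    where year(SaleDate) = @Year
    group by OfficeID
),
EmployeeSales as (
    -- one row per office and employee, total sales for @Year
    select OfficeID, EmployeeID, sum(Amount) as TotalSales
    from @Sales
    where year(SaleDate) = @Year
    group by OfficeID, EmployeeID
),
TopSalesPerson as (
    -- one row per office: the employee with the highest total;
    -- ties are broken arbitrarily by lowest EmployeeID
    select OfficeID, EmployeeID, TotalSales
    from (select OfficeID, EmployeeID, TotalSales,
                 row_number() over (partition by OfficeID
                                    order by TotalSales desc, EmployeeID) as rn
          from EmployeeSales) ranked
    where rn = 1
)
select OfficeSales.OfficeID,
       Offices.OfficeName,
       OfficeSales.TotalSales as OfficeSales,
       TopSalesPerson.EmployeeID,
       TopSalesPerson.TotalSales as EmployeeSales
from OfficeSales
inner join TopSalesPerson on OfficeSales.OfficeID = TopSalesPerson.OfficeID
inner join @Offices Offices on OfficeSales.OfficeID = Offices.OfficeID
order by OfficeSales.OfficeID;
-- returns: 1, Boston,  550.00, 10, 300.00
--          2, Chicago, 400.00, 20, 400.00
```

The final SELECT reads almost exactly like the derived-table skeleton, but each piece now has a name and sits above the query that uses it, which makes it easier to run and verify one piece at a time.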

            Only now, at the very end, do we worry if some of those joins should become LEFT OUTER JOINs, since maybe some employees might not have a bonus for a given year, and so on. Getting the employee's Name and HireDate and the name of each Office is done here, at the very end, where it is very easy and clear since we have just focused on returning the key columns for both of those entities in our derived table results.
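A small sketch of that last adjustment follows. The table variables stand in for the already-verified TopSalesPerson and EmpBonus results, and their contents are hypothetical. With an inner join, an office whose top salesperson received no bonus would silently disappear from the report; a LEFT OUTER JOIN keeps it.

```sql
-- Hypothetical stand-ins for two already-summarized pieces.
declare @TopSalesPerson table (OfficeID int, EmployeeID int, TotalSales decimal(12,2));
declare @EmpBonus       table (EmployeeID int, Bonus decimal(12,2));

insert into @TopSalesPerson values (1, 10, 5000);
insert into @TopSalesPerson values (2, 20, 7000);
insert into @EmpBonus values (10, 250);   -- employee 20 has no bonus row

-- The inner join returns only office 1, because employee 20 has no match.
select t.OfficeID, t.EmployeeID, t.TotalSales, b.Bonus
from @TopSalesPerson t
inner join @EmpBonus b on t.EmployeeID = b.EmployeeID;

-- The left outer join keeps office 2 and reports its missing bonus as zero.
select t.OfficeID, t.EmployeeID, t.TotalSales, coalesce(b.Bonus, 0) as Bonus
from @TopSalesPerson t
left outer join @EmpBonus b on t.EmployeeID = b.EmployeeID
order by t.OfficeID;
```

Whether a missing bonus should appear as zero or as NULL is a reporting decision; coalesce is used here only to make the difference between the two joins visible.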

            Think again!

            So, the next time you dive right in and start joining and selecting because you know that a "set-based master" doesn't worry about breaking down the details, consider instead becoming a "step-based set-based programmer", and break down your large problem into smaller, easily solvable steps. Even in T-SQL, this is the way to go, and it will make your life easier, your code simpler, and often more efficient as well. Don't completely disregard your past experience as you become a relational database programmer; learn how to combine the best of both worlds.

            Posted on 2007-09-28 12:42 by 季陽
