Posted on 2017-09-29 22:20
路緣
Category: C/C++
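The problem statement is clear, so I'll go straight to the Python code. The solution below assumes there is enough memory to hold a dictionary of all n numbers. If there is not, the solution I came up with is not great in terms of time complexity.
When memory is insufficient, my idea is to build on the solution below: cap the number of entries kept in dictCounts and delete the entries for numbers that have appeared only a few times, while making sure that each deleted number either never appears again in the rest of the sequence or that its later occurrences plus its count so far still add up to a small total.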
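This adds quite a few extra passes over the data. Many similar problems and solutions can be found online. Some of them split the numbers into many groups, find the 10 most frequent numbers in each group, and then merge the results. I think that approach has a hole: a number that is not among the leaders in any single group but appears in all of them may be left out.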
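To make that hole concrete, here is a minimal sketch. The helper names `top_k_by_chunks` and `top_k_by_hash_partition` and the toy data are assumptions made up for illustration; they are not taken from any of the online solutions. The first helper splits the data arbitrarily and loses a value that is frequent overall. The second routes every value to a group by its hash, so all occurrences of a value land in the same group, each group's counts are exact, and merging the per-group leaders is safe.

```python
from collections import Counter


def top_k_by_chunks(chunks, k):
    """Flawed: keep only each chunk's local top k, then merge the partial counts."""
    merged = Counter()
    for chunk in chunks:
        for value, count in Counter(chunk).most_common(k):
            merged[value] += count
    return merged.most_common(k)


def top_k_by_hash_partition(values, k, num_groups):
    """Route each value to a group by hash, so its full count lives in one group."""
    groups = [Counter() for _ in range(num_groups)]
    for value in values:
        groups[hash(value) % num_groups][value] += 1
    candidates = []
    for group in groups:
        candidates.extend(group.most_common(k))
    # Sort by count, largest first; every candidate's count is exact.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:k]


# Toy data with k = 1: value 7 is second in every chunk but first overall.
chunks = [[1, 1, 1, 7, 7], [2, 2, 2, 7, 7], [3, 3, 3, 7, 7]]
all_values = [v for chunk in chunks for v in chunk]

chunk_result = top_k_by_chunks(chunks, 1)
partition_result = top_k_by_hash_partition(all_values, 1, 3)
exact_result = Counter(all_values).most_common(1)
```

Here the chunked version never reports 7, while the hash-partitioned version agrees with the exact count. The groups are kept in memory only for brevity; when memory really is the constraint, each group would presumably be written to its own file and counted one file at a time.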
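I ran into this problem in an interview. I overcomplicated it at the time and even dug a hole for myself by asking whether the data set would be large. The interviewer said yes, several million values. In fact, even a few million values are no problem for memory; it is not as if there were hundreds of billions. The solution I managed to give had many holes. Although I have been programming for years, I still lack practice in this area, so I am writing this down as a reflection.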
import copy

import pandas as pd


class BenchMark:
    """Holds the smallest count in the top-10 table and the value that owns it."""

    def __init__(self):
        self.Reset()

    def Reset(self):
        # float('inf') replaces the fixed 10000 sentinel, which breaks once counts exceed 10000
        self.MIN = float('inf')
        self.data = 0


dictCounts = {}                 # value -> total occurrences seen so far
dictTop10_D2C = {}              # value -> count for the current top-10 candidates
BENCH_MARK = BenchMark()
LAST_BENCH_MARK = BenchMark()   # the entry most recently evicted from the top 10
run_count1 = 0
run_count2 = 0


def FindTop10(data):
    global BENCH_MARK, LAST_BENCH_MARK, run_count1, run_count2
    if data in dictCounts:
        dictCounts[data] += 1
    else:
        dictCounts[data] = 1

    temp = dictCounts[data]

    # just record run times
    run_count1 += 1

    # a count below the last evicted minimum cannot enter the top 10
    if LAST_BENCH_MARK.MIN != float('inf') and temp < LAST_BENCH_MARK.MIN:
        return

    dictTop10_D2C[data] = temp

    if len(dictTop10_D2C) > 10:
        # 11 candidates: find the one with the smallest count and evict it
        BENCH_MARK.Reset()
        for item in dictTop10_D2C:

            # just record run times
            run_count2 += 1

            if dictTop10_D2C[item] < BENCH_MARK.MIN:
                BENCH_MARK.MIN = dictTop10_D2C[item]
                BENCH_MARK.data = item
        LAST_BENCH_MARK = copy.deepcopy(BENCH_MARK)
        dictTop10_D2C.pop(BENCH_MARK.data)


def PrintData2Count(aDict):
    for key in aDict:
        print('%.1f:%d' % (key, aDict[key]))


if __name__ == '__main__':
    df = pd.read_csv('D:/data/ctp_data/rb/201709/rb1801_20170905.csv')
    for data in df['LastPx']:
        FindTop10(data)

    PrintData2Count(dictCounts)
    print("==============dictCounts length:", len(dictCounts))
    PrintData2Count(dictTop10_D2C)

    print("run_count1:%d,run_count2:%d" % (run_count1, run_count2))

The output is as follows:
......
4121.0:206
4123.0:278
4124.0:180
4122.0:244
4125.0:118
4126.0:34
4127.0:4
4081.0:1366
4080.0:1073
4077.0:1072
4078.0:1091
4079.0:800
4076.0:874
4075.0:886
4074.0:1108
4071.0:719
4073.0:1281
4072.0:1049
4070.0:567
4069.0:442
4068.0:290
4067.0:199
4066.0:204
4065.0:109
4064.0:60
4063.0:80
4062.0:57
4061.0:70
4060.0:70
4059.0:32
4057.0:6
4058.0:22
4129.0:6
4137.0:2
4135.0:2
4133.0:2
==============dictCounts length: 75
4109.0:2080
4108.0:2047
4095.0:3009
4096.0:2785
4094.0:2265
4099.0:2573
4098.0:2702
4097.0:2491
4100.0:2147
4107.0:1809
run_count1:70684,run_count2:19679
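
When every count fits in memory, as assumed throughout this post, a top-10 table like the one printed above can be cross-checked with the standard library. This is only a minimal sketch: the function name `top10_reference` and the short sample list are made up for illustration, and it is a swapped-in technique (count everything, then select with a heap), not the incremental method used in the code above.

```python
import heapq
from collections import Counter


def top10_reference(values, k=10):
    """Count every value, then pick the k entries with the largest counts."""
    counts = Counter(values)
    # nlargest keeps a heap of size k, so it costs O(m log k) for m distinct values
    return heapq.nlargest(k, counts.items(), key=lambda pair: pair[1])


# Hypothetical sample in the same shape as the LastPx column
sample = [4095.0, 4096.0, 4095.0, 4098.0, 4095.0, 4096.0]
result = top10_reference(sample, k=2)
```

For the real data one could pass `df['LastPx']` as `values` and compare the returned pairs with the contents of dictTop10_D2C. Ties at the tenth place may be resolved differently by the two methods.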