如果这篇博客帮助到你,可以请我喝一杯咖啡~
CC BY 4.0 (除特别声明或转载文章外)
{-
给定一个文本文件,构建文件中所有单词及其出现次数的词频表,并将词频表按照格式要求写到一个文件中。
词频表按照单词出现次数从高到低次序排列,出现次数相同的按照字典序排列。第十四周上课提交。
输入文件样例(见text0.txt ):
Mind-controlled Mouse
The three study participants are part of a clinical trial to test a brain-computer interface (BCI) called BrainGate. BrainGate translates participant’s brain activity into commands that a computer can understand. In the new study, researchers first implanted microelectrode arrays into the area of the brain that governs hand movement. The participants trained the system by thinking about moving their hands, something the BCI learned to translate into actions on the screen.
输出文件样例(见text0Fre.txt ):
the:8
a:3
into:3
bci:2
brain:2
braingate:2
of:2
participants:2
study:2
that:2
to:2
about:1
actions:1
activity:1
are:1
area:1
arrays:1
brain-computer:1
by:1
called:1
can:1
clinical:1
commands:1
computer:1
first:1
governs:1
hand:1
hands:1
implanted:1
in:1
interface:1
learned:1
microelectrode:1
mind-controlled:1
mouse:1
movement:1
moving:1
new:1
on:1
part:1
participant:1
researchers:1
screen:1
something:1
system:1
test:1
their:1
thinking:1
three:1
trained:1
translate:1
translates:1
trial:1
understand:1
建议:
main ::IO
main = do
ls <- readFile inputfile
let result = string2listofpairs ls
let formatedResult = formatting result
writeFile formatedResult
其中
string2listofpairs :: String ->[(String, Int)] -- 计算输入串中所有单词及其频率,并排序
formatting :: [(String, Int)] ->String -- 按照要求将词频列表转换为输出格式串
几点注意事项:
1. 大小写不敏感,所有结果用小写字母;
2. It's 视为单词 it;
3. brain-computer 视为一个单词。
-}
import System.IO
import Text.Printf
import Data.Char
inputfile="text0.txt"
outputfile="text0Fre.txt"
main::IO()
main=do
ls<-readFile inputfile
let result = string2listofpairs ls
let formatedResult=formatting result
writeFile outputfile formatedResult
string2listofpairs::String->[(String,Int)]
string2listofpairs s=sortG(cal(group(sortS(words(tolow s)))))
sortS::[String]->[String]
sortS []=[]
sortS (x:xs)=(sortS[y|y<-xs,y<x])++[x]++(sortS[y|y<-xs,x<=y])
cmp::(String,Int)->(String,Int)->Bool
cmp (a,b) (c,d)=
if b/=d
then b>d
else a<c
sortG::[(String,Int)]->[(String,Int)]
sortG []=[]
sortG (x:xs)=(sortG[y|y<-xs,cmp y x])++[x]++(sortG[y|y<-xs,cmp y x==False])
group::[String]->[[String]]
group []=[]
group (x:xs)=[[x]++[y|y<-xs,y==x]]++(group[y|y<-xs,y/=x])
tolow::String->String
tolow []=[]
tolow (x:xs)=
if (isSpace x)||(isAlphaNum x)||(x=='-')
then ((toLower x):(tolow xs))
else tolow xs
cal::[[String]]->[(String,Int)]
cal []=[]
cal (x:xs)=((x!!0,length x):(cal xs))
formatting :: [(String, Int)] ->String -- 按照要求将词频列表转换为输出格式串
formatting []=[]
formatting ((a,b):xs)=(printf "%s:%d\n" a b)++(formatting xs)