Target
Try to parse result XML of Google Complete Suggestion API with Haskell.Environment
- OS
- Linux 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
- GHC
- 7.4.1
Investigate way to achieve this
Access to the API
We can access the API like following URL.Just HTTP GET is okay.
Network.HTTP is useful to do HTTP GET with Haskell.http://suggestqueries.google.com/complete/search?hl=ja&ie=utf_8&oe=utf_8&client=toolbar&q=[search words]
import Control.Monad.Trans (liftIO)
import Network.HTTP
main = do
resultxml <- liftIO $ (Network.HTTP.simpleHTTP (Network.HTTP.getRequest ("http://suggestqueries.google.com/complete/search?hl=ja&ie=utf_8&oe=utf_8&client=toolbar&q=" ++ urlEncode("haskell cabal"))) >>= getResponseBody)
putStrLn resultxml
This code use following methods.
Prelude> import Network.HTTP
Prelude Network.HTTP> :t Network.HTTP.simpleHTTP
Network.HTTP.simpleHTTP
:: HStream ty =>
Request ty -> IO (Network.Stream.Result (Response ty))
Prelude Network.HTTP> :t Network.HTTP.getRequest
Network.HTTP.getRequest :: String -> Request_String
Prelude Network.HTTP> :t getResponseBody
getResponseBody :: Network.Stream.Result (Response ty) -> IO ty
Prelude Network.HTTP> :t urlEncode
urlEncode :: String -> String
The code’s result is following.
<?xml version="1.0"?>
<toplevel>
<CompleteSuggestion>
<suggestion data="haskell cabal" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal sandbox" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal install" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal ã¢ãããã¼ã" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal windows" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal ä¾åé¢ä¿" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal ä½¿ã æ¹" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal ã¢ã³ã¤ã³ã¹ãã¼ã«" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal proxy" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal github" />
</CompleteSuggestion>
</toplevel>
Well, it’s garbled…Following way to fix it is simple.
Just using Codec.Binary.UTF8.String.
import Control.Monad.Trans (liftIO)
import Network.HTTP
import Codec.Binary.UTF8.String as S
main = do
resultxml <- liftIO $ (Network.HTTP.simpleHTTP (Network.HTTP.getRequest ("http://suggestqueries.google.com/complete/search?hl=ja&ie=utf_8&oe=utf_8&client=toolbar&q=" ++ urlEncode("haskell cabal"))) >>= getResponseBody)
putStrLn $ S.decodeString resultxml
“decodeString” function’s type is following.
The result is following.Prelude> import Codec.Binary.UTF8.String as S Prelude S> :t S.decodeString S.decodeString :: String -> String
<?xml version="1.0"?>
<toplevel>
<CompleteSuggestion>
<suggestion data="haskell cabal" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal sandbox" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal install" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal アップデート" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal windows" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal 依存関係" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal 使い方" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal アンインストール" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal proxy" />
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="haskell cabal github" />
</CompleteSuggestion>
</toplevel>
HTTP GET part was done.
Parse XML
For simplicity, I’ll think following simple XML doc.
<?xml version="1.0"?>
<toplevel>
<CompleteSuggestion>
<suggestion data="haskell cabal" />
</CompleteSuggestion>
</toplevel>
In this part, I’ll use Text.XML.Light module.It provides simple methods. To begin with, I’ll pick up “CompleteSuggestion” element with following code.
import Text.XML.Light
main = do
case parseXMLDoc "<?xml version=\"1.0\"?><toplevel><CompleteSuggestion><suggestion data=\"haskell cabal\" /></CompleteSuggestion></toplevel>" of
Nothing -> putStrLn "Error"
Just root -> putStrLn $ show $ findChildren (unqual "CompleteSuggestion") root
This codes using following methods.
Prelude> import Text.XML.Light
Prelude Text.XML.Light> :t parseXMLDoc
parseXMLDoc
:: Text.XML.Light.Lexer.XmlSource s => s -> Maybe Element
Prelude Text.XML.Light> parseXMLDoc "<?xml version=\"1.0\"?><toplevel><CompleteSuggestion><suggestion data=\"haskell cabal\" /></CompleteSuggestion></toplevel>"
Just (
Element {
elName = QName {
qName = "toplevel",
qURI = Nothing,
qPrefix = Nothing
},
elAttribs = [],
elContent = [
Elem (
Element {
elName = QName {
qName = "CompleteSuggestion",
qURI = Nothing,
qPrefix = Nothing
},
elAttribs = [],
elContent = [
Elem (
Element {
elName = QName {
qName = "suggestion", qURI = Nothing, qPrefix = Nothing
},
elAttribs = [
Attr {
attrKey = QName {
qName = "data",
qURI = Nothing,
qPrefix = Nothing
},
attrVal = "haskell cabal"
}
],
elContent = [],
elLine = Just 1
}
)
],
elLine = Just 1
}
)
],
elLine = Just 1
}
)
Prelude Text.XML.Light> :t unqual
unqual :: String -> QName
Prelude Text.XML.Light> unqual "CompleteSuggestion"
QName {qName = "CompleteSuggestion", qURI = Nothing, qPrefix = Nothing}
Prelude Text.XML.Light> :t findChildren
findChildren :: QName -> Element -> [Element]
The code’s result is following.
[
Element {
elName = QName {
qName = "CompleteSuggestion",
qURI = Nothing,
qPrefix = Nothing
},
elAttribs = [],
elContent = [
Elem (
Element {
elName = QName {
qName = "suggestion",
qURI = Nothing, qPrefix = Nothing
},
elAttribs = [
Attr {
attrKey = QName {
qName = "data",
qURI = Nothing,
qPrefix = Nothing
},
attrVal = "haskell cabal"
}
],
elContent = [],
elLine = Just 1
}
)
],
elLine = Just 1
}
]
Looks success to pick up “CompleteSuggestion” element.
Next, picking up “suggestion” element with following code.
import Text.XML.Light
main = do
case parseXMLDoc "<?xml version=\"1.0\"?><toplevel><CompleteSuggestion><suggestion data=\"haskell cabal\" /></CompleteSuggestion></toplevel>" of
Nothing -> putStrLn "Error"
Just root -> putStrLn $ show $ findChild (unqual "suggestion") (findChildren (unqual "CompleteSuggestion") root !! 0)
“findChild” function’s type is following.
The code’s result is following.Prelude Text.XML.Light> :t findChild findChild :: QName -> Element -> Maybe Element
Just (
Element {
elName = QName {
qName = "suggestion",
qURI = Nothing,
qPrefix = Nothing
},
elAttribs = [
Attr {
attrKey = QName {
qName = "data",
qURI = Nothing,
qPrefix = Nothing
},
attrVal = "haskell cabal"
}
],
elContent = [],
elLine = Just 1
}
)
Looks success to pick up “suggestion” element. And then, I’ll get attributes of “suggestion” element.
import Text.XML.Light
main = do
case parseXMLDoc "<?xml version=\"1.0\"?><toplevel><CompleteSuggestion><suggestion data=\"haskell cabal\" /></CompleteSuggestion></toplevel>" of
Nothing -> putStrLn "Error"
Just root -> putStrLn $ show $ case findChild (unqual "suggestion") (findChildren (unqual "CompleteSuggestion") root !! 0) of
Nothing -> []
Just elem -> elAttribs elem
New function “elAttribs” type is following.
Result is following.Prelude Text.XML.Light> :t elAttribs elAttribs :: Element -> [Attr]
[
Attr {
attrKey = QName {
qName = "data",
qURI = Nothing,
qPrefix = Nothing
},
attrVal = "haskell cabal"
}
]
OK. Looks good. Last step of parsing investigation, Getting “attrVal”.
main = do
case parseXMLDoc "<?xml version=\"1.0\"?><toplevel><CompleteSuggestion><suggestion data=\"haskell cabal\" /></CompleteSuggestion></toplevel>" of
Nothing -> putStrLn "Error"
Just root -> putStrLn $ show $ case findChild (unqual "suggestion") (findChildren (unqual "CompleteSuggestion") root !! 0) of
Nothing -> ""
Just elem ->
let
[Attr key val] = elAttribs elem
in
val
The result is following.
"haskell cabal"
Good!
Complete Code
Actually, “CompleteSuggestion” element is not one.So map function is needed like following.
import Control.Monad.Trans (liftIO)
import Network.HTTP
import Codec.Binary.UTF8.String as S
import Data.List
import Text.XML.Light
getText :: Element -> String -> String
getText parent childName =
case findChild (unqual childName) parent of
Nothing -> ""
Just child ->
let
[Attr key val] = elAttribs child
in
val
parseCompleteSuggestionTag :: Element -> String
parseCompleteSuggestionTag element = getText element "suggestion"
main = do
resultxml <- liftIO $ (Network.HTTP.simpleHTTP (Network.HTTP.getRequest ("http://suggestqueries.google.com/complete/search?hl=ja&ie=utf_8&oe=utf_8&client=toolbar&q=" ++ urlEncode("haskell cabal"))) >>= getResponseBody)
case parseXMLDoc resultxml of
Nothing -> putStrLn "parse error"
Just root ->
putStrLn ("[\"" ++ (Data.List.intercalate "\",\"" $ map S.decodeString $ map (\a -> parseCompleteSuggestionTag a) (findChildren (unqual "CompleteSuggestion") root)) ++ "\"]")
The result is following.
[ "haskell cabal", "haskell cabal sandbox", "haskell cabal install", "haskell cabal アップデート", "haskell cabal windows", "haskell cabal 依存関係", "haskell cabal 使い方", "haskell cabal アンインストール", "haskell cabal proxy", "haskell cabal github" ]
No comments yet.